1

Computer Networking

Lent Term, M/W/F 11-midday, LT1 in Gates Building

Slide Set 5

Andrew W. Moore (andrew.moore@cl.cam.ac.uk)

January 2013

2

Topic 5 – Transport

Our goals:
• understand principles behind transport layer services:
  – multiplexing / demultiplexing
  – reliable data transfer
  – flow control
  – congestion control
• learn about transport layer protocols in the Internet:
  – UDP: connectionless transport
  – TCP: connection-oriented transport
  – TCP congestion control

3

Transport services and protocols
• provide logical communication between app processes running on different hosts
• transport protocols run in end systems:
  – send side: breaks app messages into segments, passes to network layer
  – rcv side: reassembles segments into messages, passes to app layer
• more than one transport protocol available to apps
  – Internet: TCP and UDP

[figure: two end hosts, each with application/transport/network/data link/physical layers, joined by a logical end-end transport path]

4

Transport vs. network layer
• network layer: logical communication between hosts
• transport layer: logical communication between processes
  – relies on, enhances, network layer services

Household analogy: 12 kids sending letters to 12 kids
• processes = kids
• app messages = letters in envelopes
• hosts = houses
• transport protocol = Ann and Bill
• network-layer protocol = postal service

5

Internet transport-layer protocols

• reliable, in-order delivery (TCP)
  – congestion control
  – flow control
  – connection setup
• unreliable, unordered delivery: UDP
  – no-frills extension of "best-effort" IP
• services not available:
  – delay guarantees
  – bandwidth guarantees

[figure: end hosts with full application/transport/network/link/physical stacks, intermediate routers with network/link/physical only, and the logical end-end transport path between the hosts]

6

Multiplexing/demultiplexing (Transport-layer style)

Demultiplexing at rcv host: delivering received segments to the correct socket

Multiplexing at send host: gathering data from multiple sockets, enveloping data with header (later used for demultiplexing)

[figure: three hosts, each with application/transport/network/link/physical stacks; processes P1-P4 bound to sockets on hosts 1, 2 and 3]

7

How transport-layer demultiplexing works
• host receives IP datagrams
  – each datagram has source IP address, destination IP address
  – each datagram carries 1 transport-layer segment
  – each segment has source, destination port number
• host uses IP addresses & port numbers to direct segment to appropriate socket

TCP/UDP segment format (32 bits wide):
  source port | dest port
  other header fields
  application data (message)

8

Connectionless demultiplexing
• Create sockets with port numbers:
    DatagramSocket mySocket1 = new DatagramSocket(12534);
    DatagramSocket mySocket2 = new DatagramSocket(12535);
• UDP socket identified by two-tuple:
    (dest IP address, dest port number)
• When host receives UDP segment:
  – checks destination port number in segment
  – directs UDP segment to socket with that port number
• IP datagrams with different source IP addresses and/or source port numbers are directed to the same socket
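To make the last point concrete, here is a minimal receive loop (a sketch; the port number and buffer size are arbitrary choices, not from the slides). Every datagram addressed to this port is handed to this one socket, whatever its source IP or source port:

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;

    public class UdpDemuxExample {
        public static void main(String[] args) throws Exception {
            DatagramSocket sock = new DatagramSocket(12534);   // bind to local port 12534
            byte[] buf = new byte[1500];
            while (true) {
                DatagramPacket pkt = new DatagramPacket(buf, buf.length);
                sock.receive(pkt);                             // blocks until a segment arrives
                // Segments from any (source IP, source port) land here:
                System.out.println("from " + pkt.getAddress() + ":" + pkt.getPort()
                                   + ", " + pkt.getLength() + " bytes");
            }
        }
    }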

9

Connectionless demux (cont)

DatagramSocket serverSocket = new DatagramSocket(6428);

[figure: two clients (IP A, process P1; IP B, process P2) exchange datagrams with server IP C (process P3)]
  one client sends SP: 9157, DP: 6428 and the server replies SP: 6428, DP: 9157
  the other sends SP: 5775, DP: 6428 and the server replies SP: 6428, DP: 5775

SP provides "return address"

10

Connection-oriented demux
• TCP socket identified by 4-tuple:
  – source IP address
  – source port number
  – dest IP address
  – dest port number
• recv host uses all four values to direct segment to appropriate socket
• Server host may support many simultaneous TCP sockets:
  – each socket identified by its own 4-tuple
• Web servers have different sockets for each connecting client
  – non-persistent HTTP will have different socket for each request

11

Connection-oriented demux (cont)

[figure: clients A and B connect to web server C (port 80); the server demultiplexes on the full 4-tuple]
  from client A: SP: 9157, DP: 80, S-IP: A, D-IP: C
  from client B: SP: 9157, DP: 80, S-IP: B, D-IP: C
  from client B: SP: 5775, DP: 80, S-IP: B, D-IP: C
  three different 4-tuples, so three different sockets at the server

12

Connection-oriented demux: Threaded Web Server

[figure: same scenario as the previous slide, but the server uses one process with a thread per connection; demultiplexing on the 4-tuple is unchanged]
  from client A: SP: 9157, DP: 80, S-IP: A, D-IP: C
  from client B: SP: 9157, DP: 80, S-IP: B, D-IP: C
  from client B: SP: 5775, DP: 80, S-IP: B, D-IP: C

13

UDP: User Datagram Protocol [RFC 768]

• "no frills," "bare bones" Internet transport protocol
• "best effort" service; UDP segments may be:
  – lost
  – delivered out of order to app
• connectionless:
  – no handshaking between UDP sender, receiver
  – each UDP segment handled independently of others

Why is there a UDP?
• no connection establishment (which can add delay)
• simple: no connection state at sender, receiver
• small segment header
• no congestion control: UDP can blast away as fast as desired

14

UDP: more

• often used for streaming multimedia apps
  – loss tolerant
  – rate sensitive
• other UDP uses
  – DNS
  – SNMP
• reliable transfer over UDP: add reliability at application layer
  – application-specific error recovery

UDP segment format (32 bits wide):
  source port | dest port
  length | checksum      (length = length in bytes of UDP segment, including header)
  application data (message)

15

UDP checksum

Goal: detect "errors" (e.g., flipped bits) in transmitted segment

Sender:
• treat segment contents as sequence of 16-bit integers
• checksum: addition (1's complement sum) of segment contents
• sender puts checksum value into UDP checksum field

Receiver:
• compute checksum of received segment
• check if computed checksum equals checksum field value:
  – NO: error detected
  – YES: no error detected. But maybe errors nonetheless? More later …

16

Internet Checksum
(time travel warning – we covered this earlier)

• Note:
  – When adding numbers, a carryout from the most significant bit needs to be added to the result
• Example: add two 16-bit integers

      1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
      1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
    -------------------------------------
    1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1     (carry out: wraparound, add it back in)

    sum:      1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
    checksum: 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
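A minimal sketch of that computation in code (just the 16-bit one's-complement sum with carry wraparound described above; real UDP also covers a pseudo-header, which is omitted here):

    // 1's-complement Internet checksum over 16-bit words (sketch).
    // Assumes data.length is even.
    static int internetChecksum(byte[] data) {
        int sum = 0;
        for (int i = 0; i < data.length; i += 2) {
            int word = ((data[i] & 0xFF) << 8) | (data[i + 1] & 0xFF);
            sum += word;
            if ((sum & 0x10000) != 0) {        // carry out of bit 15?
                sum = (sum & 0xFFFF) + 1;      // wrap it around
            }
        }
        return (~sum) & 0xFFFF;                // checksum = 1's complement of the sum
    }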

17

Principles of Reliable data transfer
• important in app., transport, link layers
• top-10 list of important networking topics!
• characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

18

Principles of Reliable data transfer
• important in app., transport, link layers
• top-10 list of important networking topics!
• characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

19

Principles of Reliable data transfer
• important in app., transport, link layers
• top-10 list of important networking topics!
• characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

[figure labels: rdt_rcv() at the service interface above, udt_rcv() at the channel below]

20

Reliable data transfer: getting started

send side / receive side

rdt_send(): called from above (e.g., by app.); passed data to deliver to receiver upper layer

udt_send(): called by rdt to transfer packet over unreliable channel to receiver

rdt_rcv(): called by rdt to deliver data to upper layer

udt_rcv(): called when packet arrives on rcv-side of channel

21

Reliable data transfer: getting started

We'll:
• incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
• consider only unidirectional data transfer
  – but control info will flow in both directions!
• use finite state machines (FSM) to specify sender, receiver

  state: when in this "state", the next state is uniquely determined by the next event

  [FSM notation: state1 --(event causing state transition / actions taken on state transition)--> state2]

22

KR state machines – a note

Beware: Kurose and Ross has a confusing/confused attitude to state-machines.
I've attempted to normalise the representation.
UPSHOT: these slides have differing information to the K&R book (from which the RDT example is taken).
In K&R, "actions taken" appear wide-ranging; my interpretation is more specific/relevant.

  [FSM notation: state name; transition labelled "relevant event causing state transition / relevant action taken on state transition"; when in a state, the next state is uniquely determined by the next event]

23

Rdt1.0: reliable transfer over a reliable channel

• underlying channel perfectly reliable
  – no bit errors
  – no loss of packets
• separate FSMs for sender, receiver:
  – sender sends data into underlying channel
  – receiver reads data from underlying channel

  sender:   IDLE --(rdt_send(data) / udt_send(packet))--> IDLE
  receiver: IDLE --(udt_rcv(packet) / rdt_rcv(data))--> IDLE

  (label format: Event / Action)

24

Rdt2.0: channel with bit errors

• underlying channel may flip bits in packet
  – checksum to detect bit errors
• the question: how to recover from errors?
  – acknowledgements (ACKs): receiver explicitly tells sender that packet received is OK
  – negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors
  – sender retransmits packet on receipt of NAK
• new mechanisms in rdt2.0 (beyond rdt1.0):
  – error detection
  – receiver feedback: control msgs (ACK, NAK) receiver -> sender

25

rdt2.0: FSM specification

sender:
  IDLE --(rdt_send(data) / udt_send(packet))--> Waiting for reply
  Waiting for reply --(udt_rcv(reply) && isNAK(reply) / udt_send(packet))--> Waiting for reply
  Waiting for reply --(udt_rcv(reply) && isACK(reply) / Λ)--> IDLE

receiver:
  IDLE --(udt_rcv(packet) && notcorrupt(packet) / rdt_rcv(data), udt_send(ACK))--> IDLE
  IDLE --(udt_rcv(packet) && corrupt(packet) / udt_send(NAK))--> IDLE

Note: the sender holds a copy of the packet being sent until the delivery is acknowledged.

26

rdt2.0: operation with no errors

[same FSMs as the previous slide; the highlighted path: sender sends the packet, the receiver finds it not corrupt, delivers the data and returns an ACK, and the sender goes back to IDLE]

27

rdt2.0: error scenario

[same FSMs; the highlighted path: the packet arrives corrupt, the receiver returns a NAK, the sender retransmits the packet, and the retransmission is then ACKed]

28

rdt2.0 has a fatal flaw!

What happens if ACK/NAK corrupted?
• sender doesn't know what happened at receiver!
• can't just retransmit: possible duplicate

Handling duplicates:
• sender retransmits current packet if ACK/NAK garbled
• sender adds sequence number to each packet
• receiver discards (doesn't deliver) duplicate packet

stop and wait: sender sends one packet, then waits for receiver response

29

rdt2.1: sender, handles garbled ACK/NAKs

  IDLE (seq 0) --(rdt_send(data) / sequence=0, udt_send(packet))--> Waiting for reply (0)
  Waiting for reply (0) --(udt_rcv(reply) && (corrupt(reply) || isNAK(reply)) / udt_send(packet))--> same state
  Waiting for reply (0) --(udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ)--> IDLE (seq 1)
  IDLE (seq 1) --(rdt_send(data) / sequence=1, udt_send(packet))--> Waiting for reply (1)
  Waiting for reply (1) --(udt_rcv(reply) && (corrupt(reply) || isNAK(reply)) / udt_send(packet))--> same state
  Waiting for reply (1) --(udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ)--> IDLE (seq 0)

30

rdt2.1: receiver, handles garbled ACK/NAKs

  Wait for 0 from below --(udt_rcv(packet) && notcorrupt(packet) && has_seq0(packet) / rdt_rcv(data), udt_send(ACK))--> Wait for 1 from below
  Wait for 0 from below --(udt_rcv(packet) && corrupt(packet) / udt_send(NAK))--> same state
  Wait for 0 from below --(udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) / udt_send(ACK))--> same state   (duplicate: re-ACK, don't deliver)

  Wait for 1 from below --(udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) / rdt_rcv(data), udt_send(ACK))--> Wait for 0 from below
  Wait for 1 from below --(udt_rcv(packet) && corrupt(packet) / udt_send(NAK))--> same state
  Wait for 1 from below --(udt_rcv(packet) && notcorrupt(packet) && has_seq0(packet) / udt_send(ACK))--> same state   (duplicate: re-ACK, don't deliver)

31

rdt2.1: discussion

Sender:
• seq # added to pkt
• two seq #'s (0, 1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  – state must "remember" whether "current" pkt has a 0 or 1 sequence number

Receiver:
• must check if received packet is duplicate
  – state indicates whether 0 or 1 is expected pkt seq #
• note: receiver can not know if its last ACK/NAK was received OK at sender

32

rdt2.2: a NAK-free protocol

• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
  – receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt

33

rdt2.2: sender, receiver fragments

sender FSM fragment:
  Wait for call 0 from above --(rdt_send(data) / sequence=0, udt_send(packet))--> Wait for ACK 0
  Wait for ACK 0 --(udt_rcv(reply) && (corrupt(reply) || isACK1(reply)) / udt_send(packet))--> same state
  Wait for ACK 0 --(udt_rcv(reply) && notcorrupt(reply) && isACK0(reply) / Λ)--> (continue with seq 1)

receiver FSM fragment:
  Wait for 0 from below --(udt_rcv(packet) && notcorrupt(packet) && has_seq0(packet) / rdt_rcv(data), udt_send(ACK0))--> Wait for 1 from below
  Wait for 0 from below --(udt_rcv(packet) && (corrupt(packet) || has_seq1(packet)) / udt_send(ACK1))--> same state

34

rdt3.0: channels with errors and loss

New assumption: underlying channel can also lose packets (data or ACKs)
  – checksum, seq #, ACKs, retransmissions will be of help, but not enough

Approach: sender waits "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  – retransmission will be duplicate, but use of seq #'s already handles this
  – receiver must specify seq # of pkt being ACKed
• requires countdown timer

35

rdt3.0 sender

  IDLE state 0 --(rdt_send(data) / sequence=0, udt_send(packet))--> Wait for ACK0
  Wait for ACK0 --(udt_rcv(reply) && (corrupt(reply) || isACK(reply,1)) / Λ)--> same state
  Wait for ACK0 --(timeout / udt_send(packet))--> same state
  Wait for ACK0 --(udt_rcv(reply) && notcorrupt(reply) && isACK(reply,0) / Λ)--> IDLE state 1

  IDLE state 1 --(udt_rcv(reply) / Λ)--> same state
  IDLE state 1 --(rdt_send(data) / sequence=1, udt_send(packet))--> Wait for ACK1
  Wait for ACK1 --(udt_rcv(reply) && (corrupt(reply) || isACK(reply,0)) / Λ)--> same state
  Wait for ACK1 --(timeout / udt_send(packet))--> same state
  Wait for ACK1 --(udt_rcv(reply) && notcorrupt(reply) && isACK(reply,1) / Λ)--> IDLE state 0
  IDLE state 0 --(udt_rcv(reply) / Λ)--> same state

36

rdt30 in action

37

rdt30 in action

38

Performance of rdt3.0

• rdt3.0 works, but performance stinks
• ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:

    d_trans = L/R = 8000 bits / (10^9 bits/sec) = 8 microseconds

• U_sender: utilization – fraction of time sender busy sending
• 1 KB pkt every 30 msec -> 33 kB/sec throughput over 1 Gbps link
• network protocol limits use of physical resources!
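Filling in the numbers behind that claim (a quick check using the figures on this slide, with RTT = 30 ms):

    U_sender = (L/R) / (RTT + L/R) = 0.008 ms / (30 ms + 0.008 ms) ≈ 0.00027

so the sender is busy for only about 0.027% of the time, which is where the 33 kB/sec figure comes from (roughly 1 KB delivered per 30 ms).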

39

rdt3.0: stop-and-wait operation

[timeline: sender and receiver]
  first packet bit transmitted, t = 0
  last packet bit transmitted, t = L / R
  first packet bit arrives; last packet bit arrives, send ACK
  ACK arrives, send next packet, t = RTT + L / R

40

Pipelined (Packet-Window) protocols

Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  – range of sequence numbers must be increased
  – buffering at sender and/or receiver

• Two generic forms of pipelined protocols: go-Back-N, selective repeat

41

Pipelining: increased utilization

[timeline: sender transmits three packets back-to-back starting at t = 0; the first ACK arrives at t = RTT + L/R, with the ACKs for the 2nd and 3rd packets just behind it]

Increase utilization by a factor of 3!

42

Pipelining Protocols

Go-back-N: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
  – Doesn't ack packet if there's a gap
• Sender has timer for oldest unacked packet
  – If timer expires, retransmit all unacked packets

Selective Repeat: big pic
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  – When timer expires, retransmit only unacked packet

43

Selective repeat: big picture

• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  – When timer expires, retransmit only unacked packet

44

Go-Back-N

Sender:
• k-bit seq # in pkt header
• "window" of up to N consecutive unack'ed pkts allowed
• ACK(n): ACKs all pkts up to, including, seq # n - "cumulative ACK"
  – may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window

45

GBN: sender extended FSM

  (init) base=1, nextseqnum=1 --> Wait

  rdt_send(data):
    if (nextseqnum < base+N) {
      udt_send(packet[nextseqnum])
      nextseqnum++
    }
    else refuse_data(data)   /* Block? */

  timeout:
    udt_send(packet[base])
    udt_send(packet[base+1])
    …
    udt_send(packet[nextseqnum-1])

  udt_rcv(reply) && notcorrupt(reply):
    base = getacknum(reply)+1

  udt_rcv(reply) && corrupt(reply):
    Λ
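The same sender logic written out as a small class, as a sketch only: Packet handling, udtSend and the timer plumbing are stand-ins of my own, not an API from the course.

    // Go-Back-N sender sketch. Seq numbers grow without bound here for clarity;
    // a real implementation wraps them modulo 2^k.
    class GbnSender {
        private final int N;                 // window size
        private int base = 1;                // oldest unacked seq #
        private int nextSeqNum = 1;
        private final java.util.Map<Integer, byte[]> window = new java.util.HashMap<>();

        GbnSender(int windowSize) { this.N = windowSize; }

        boolean rdtSend(byte[] data) {
            if (nextSeqNum >= base + N) return false;   // window full: refuse data
            window.put(nextSeqNum, data);
            udtSend(nextSeqNum, data);
            if (base == nextSeqNum) startTimer();       // timer is for oldest unacked pkt
            nextSeqNum++;
            return true;
        }

        void onAck(int ackNum) {                        // cumulative ACK
            for (int s = base; s <= ackNum; s++) window.remove(s);
            base = ackNum + 1;
            if (base == nextSeqNum) stopTimer(); else startTimer();
        }

        void onTimeout() {                              // resend base .. nextSeqNum-1
            startTimer();
            for (int s = base; s < nextSeqNum; s++) udtSend(s, window.get(s));
        }

        // Stand-ins for the unreliable channel and the countdown timer:
        private void udtSend(int seq, byte[] data) { /* hand segment to network layer */ }
        private void startTimer() { /* (re)start countdown */ }
        private void stopTimer()  { /* cancel countdown */ }
    }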

46

GBN: receiver extended FSM

ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
  – may generate duplicate ACKs
  – need only remember expectedseqnum
• out-of-order packet:
  – discard (don't buffer) -> no receiver buffering!
  – Re-ACK packet with highest in-order seq #

  (init) expectedseqnum=1 --> Wait
  Wait --(udt_rcv(packet) && notcorrupt(packet) && hasseqnum(rcvpkt,expectedseqnum) / rdt_rcv(data), udt_send(ACK), expectedseqnum++)--> Wait
  Wait --(otherwise / udt_send(reply))--> Wait

47

GBN in action

48

Selective Repeat

• receiver individually acknowledges all correctly received pkts
  – buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  – sender timer for each unACKed pkt
• sender window
  – N consecutive seq #'s
  – again limits seq #s of sent, unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

sender:
  data from above:
  • if next available seq # in window, send pkt
  timeout(n):
  • resend pkt n, restart timer
  ACK(n) in [sendbase, sendbase+N]:
  • mark pkt n as received
  • if n smallest unACKed pkt, advance window base to next unACKed seq #

receiver:
  pkt n in [rcvbase, rcvbase+N-1]:
  • send ACK(n)
  • out-of-order: buffer
  • in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  pkt n in [rcvbase-N, rcvbase-1]:
  • ACK(n)
  otherwise:
  • ignore

51

Selective repeat in action

52

Selective repeat: dilemma

Example:
• seq #'s: 0, 1, 2, 3
• window size = 3

• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)

Q: what relationship between seq # size and window size?
   window size ≤ (½ of seq # size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)
+ Adaptive
+ Flexible

- Slow to start / adapt
  consider high Bandwidth×Delay product

Now let's move from the generic to the specific…

TCP: arguably the most successful protocol in the Internet…
it's an ARQ protocol

54

TCP: Overview   RFCs: 793, 1122, 1323, 2018, 2581, …

• full duplex data:
  – bi-directional data flow in same connection
  – MSS: maximum segment size
• connection-oriented:
  – handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
  – sender will not overwhelm receiver
• point-to-point:
  – one sender, one receiver
• reliable, in-order byte stream:
  – no "message boundaries"
• pipelined:
  – TCP congestion and flow control set window size
• send & receive buffers

[figure: the application writes data into the socket's TCP send buffer; segments carry it to the peer's TCP receive buffer, from which the application reads]

55

TCP segment structure (32 bits wide)

  source port | dest port
  sequence number                      (counting by bytes of data, not segments!)
  acknowledgement number
  head len | not used | U A P R S F | Receive window   (# bytes rcvr willing to accept)
  checksum (as in UDP) | Urg data pnter
  Options (variable length)
  application data (variable length)

  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)

56

TCP seq. #'s and ACKs

Seq. #'s:
  – byte stream "number" of first byte in segment's data
ACKs:
  – seq # of next byte expected from other side
  – cumulative ACK
Q: how receiver handles out-of-order segments
  – A: TCP spec doesn't say, - up to implementor

simple telnet scenario (Host A and Host B):
  User types 'C':                          A -> B  Seq=42, ACK=79, data = 'C'
  host ACKs receipt of 'C', echoes 'C':    B -> A  Seq=79, ACK=43, data = 'C'
  host ACKs receipt of echoed 'C':         A -> B  Seq=43, ACK=80

This has led to a world of hurt…

57

TCP out of order attack

• ARQ with SACK means the recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server, but don't send the first byte
  – Recipient keeps all the subsequent data and waits…
  – Filling buffers
  – Critical buffers…
• Send a legitimate request:
    GET index.html
  this gets through an intrusion-detection system; then send a new segment replacing bytes 4-13 with "password-file"

A dumb example: neither of these attacks would work on a modern system.

58

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss

Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  – ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
  – average several recent measurements, not just current SampleRTT

59

TCP Round Trip Time and Timeout

    EstimatedRTT = (1 - α) · EstimatedRTT + α · SampleRTT

• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125

60

Some RTT estimates are never good

[figure: associating the ACK with (a) the original transmission versus (b) the retransmission]

Karn/Partridge Algorithm – ignore retransmissions in measurements
  (and increase timeout; this makes retransmissions decreasingly aggressive)

61

Example RTT estimation

[plot: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; SampleRTT and EstimatedRTT (milliseconds, roughly 100-350 ms) against time (seconds)]

62

TCP Round Trip Time and Timeout

Setting the timeout
• EstimatedRTT plus "safety margin"
  – large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:

    DevRTT = (1 - β) · DevRTT + β · |SampleRTT - EstimatedRTT|
    (typically, β = 0.25)

Then set timeout interval:

    TimeoutInterval = EstimatedRTT + 4 · DevRTT
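A compact sketch of the two update rules above, with α = 0.125 and β = 0.25 as on these slides; the class and method names are illustrative only:

    // RTT estimation as described on these slides (EWMA plus deviation).
    class RttEstimator {
        private static final double ALPHA = 0.125;   // weight for EstimatedRTT
        private static final double BETA  = 0.25;    // weight for DevRTT
        private double estimatedRtt = 0.1;           // seconds; arbitrary initial guess
        private double devRtt = 0.0;

        // Call once per SampleRTT, and only for segments that were not
        // retransmitted (Karn/Partridge).
        void onSample(double sampleRtt) {
            estimatedRtt = (1 - ALPHA) * estimatedRtt + ALPHA * sampleRtt;
            devRtt = (1 - BETA) * devRtt + BETA * Math.abs(sampleRtt - estimatedRtt);
        }

        double timeoutInterval() {
            return estimatedRtt + 4 * devRtt;
        }
    }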

63

TCP reliable data transfer

• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  – timeout events
  – duplicate acks
• Initially consider simplified TCP sender:
  – ignore duplicate acks
  – ignore flow control, congestion control

64

TCP sender events:

data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval

timeout:
• retransmit segment that caused timeout
• restart timer

Ack rcvd:
• If acknowledges previously unacked segments
  – update what is known to be acked
  – start timer if there are outstanding segments

65

TCP sender (simplified)

  NextSeqNum = InitialSeqNum
  SendBase = InitialSeqNum

  loop (forever) {
    switch(event)

    event: data received from application above
      create TCP segment with sequence number NextSeqNum
      if (timer currently not running)
        start timer
      pass segment to IP
      NextSeqNum = NextSeqNum + length(data)

    event: timer timeout
      retransmit not-yet-acknowledged segment with smallest sequence number
      start timer

    event: ACK received, with ACK field value of y
      if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
          start timer
      }
  } /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked

66

TCP retransmission scenarios

[timeline (a), lost ACK scenario]
  Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X); A times out and
  resends Seq=92, 8 bytes data; B replies ACK=100; SendBase advances 92 -> 100.

[timeline (b), premature timeout]
  Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; B replies ACK=100
  and ACK=120, but A's timer for Seq=92 expires first, so A resends Seq=92, 8 bytes
  data; B replies ACK=120; SendBase moves 100 -> 120.

67

TCP retransmission scenarios (more)

[timeline, cumulative ACK scenario]
  Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; the ACK=100 is lost (X),
  but B's ACK=120 arrives before the timeout; SendBase = 120.

Implicit ACK (e.g., not Go-Back-N): ACK=120 implicitly ACKs 100 too.

68

TCP ACK generation [RFC 1122, RFC 2581]

Event at Receiver: Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed.
  TCP Receiver action: Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK.

Event at Receiver: Arrival of in-order segment with expected seq #. One other segment has ACK pending.
  TCP Receiver action: Immediately send single cumulative ACK, ACKing both in-order segments.

Event at Receiver: Arrival of out-of-order segment, higher-than-expected seq #. Gap detected.
  TCP Receiver action: Immediately send duplicate ACK, indicating seq # of next expected byte.

Event at Receiver: Arrival of segment that partially or completely fills gap.
  TCP Receiver action: Immediately send ACK, provided that segment starts at lower end of gap.

69

Fast Retransmit

• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs.
  – Sender often sends many segments back-to-back
  – If segment is lost, there will likely be many duplicate ACKs.
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  – fast retransmit: resend segment before timer expires

70

[timeline: Host A sends several segments back-to-back; the 2nd is lost (X); Host B's duplicate ACKs trigger A to resend the 2nd segment before its timeout]

Figure 3.37: Resending a segment after triple duplicate ACK

71

Fast retransmit algorithm:

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
    else {                                     /* a duplicate ACK for already ACKed segment */
      increment count of dup ACKs received for y
      if (count of dup ACKs received for y == 3)
        resend segment with sequence number y  /* fast retransmit */
    }

72

Silly Window Syndrome

MSS advertises the amount a receiver can accept.
If a transmitter has something to send, it will.
This means small MSS values may persist - indefinitely.

Solution:
Wait to fill each segment, but don't wait indefinitely.

NAGLE's Algorithm
If we wait too long, interactive traffic is difficult.
If we don't wait, we get silly window syndrome.
Solution: use a timer; when the timer expires, send the (unfilled) segment.
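A sketch of the send-side decision this describes (pseudocode-style Java; the parameter names are mine, not from the slides, and the timer clause follows this slide's description rather than any particular stack's implementation):

    // Coalesce small writes while earlier data is unacked, but never wait forever.
    boolean shouldSendNow(int queuedBytes, int mss,
                          boolean unackedDataInFlight, boolean timerExpired) {
        if (queuedBytes >= mss) return true;     // a full segment: always send
        if (!unackedDataInFlight) return true;   // nothing outstanding: send the small segment
        if (timerExpired) return true;           // waited long enough: send it unfilled
        return false;                            // otherwise keep accumulating data
    }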

73

Flow Control ≠ Congestion Control

• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded

74

Flow Control – (bad old days?)

In-line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)

Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from a memory interface saying slow-down/stop…

75

TCP Flow Control

• receive side of TCP connection has a receive buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
• app process may be slow at reading from buffer

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

76

TCP Flow control: how it works

(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer:
    RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
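The bookkeeping on both sides is small enough to sketch in a few lines (illustrative field names, not TCP source code):

    // Receiver side: compute the window to advertise.
    class FlowControlReceiver {
        long rcvBuffer;        // total buffer size in bytes
        long lastByteRcvd;     // highest in-order byte received
        long lastByteRead;     // highest byte the application has read

        long advertisedWindow() {
            return rcvBuffer - (lastByteRcvd - lastByteRead);   // spare room
        }
    }

    // Sender side: respect the last advertised window.
    class FlowControlSender {
        long lastByteSent;
        long lastByteAcked;
        long rcvWindow;        // most recent RcvWindow from the receiver

        boolean maySend(int segmentLength) {
            // unACKed data after this send must not exceed RcvWindow
            return (lastByteSent + segmentLength - lastByteAcked) <= rcvWindow;
        }
    }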

77

TCP Connection Management

Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g., RcvWindow)
• client: connection initiator
    Socket clientSocket = new Socket("hostname", port number);
• server: contacted by client
    Socket connectionSocket = welcomeSocket.accept();

Three way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data

78

TCP Connection Management (cont.)

Closing a connection:
client closes socket: clientSocket.close();

Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[timeline: client sends FIN (close); server ACKs and sends its own FIN (close); client ACKs, enters timed wait, then closed]

79

TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK
  – Enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.

Note: with small modification, can handle simultaneous FINs.

[timeline: client (closing) sends FIN; server ACKs and sends FIN (closing); client ACKs, timed wait, closed; server closed]

80

TCP Connection Management (cont)

[figure: TCP client lifecycle and TCP server lifecycle state diagrams]

81

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem!

82

Causes/costs of congestion: scenario 1

• two senders, two receivers
• one router, infinite buffers
• no retransmission

• large delays when congested
• maximum achievable throughput

[figure: Host A sends λin (original data) into a shared output link with unlimited buffers; Host B receives λout]

83

Causes/costs of congestion: scenario 2

• one router, finite buffers
• sender retransmission of lost packet

[figure: Host A offers λin (original data) and λ'in (original data plus retransmitted data) into finite shared output link buffers; Host B receives λout]

84

Causes/costs of congestion: scenario 2

• always: λin = λout (goodput)
• "perfect" retransmission, only when loss: λ'in > λout
• retransmission of delayed (not lost) packet makes λ'in larger (than perfect case) for the same λout

[plots (a), (b), (c): λout against λin, each saturating below R/2; with retransmissions, more offered load is needed for the same goodput]

"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt

85

Causes/costs of congestion: scenario 3

• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λin and λ'in increase?

[figure: four hosts sending over multihop paths through finite shared output link buffers]

86

Causes/costs of congestion: scenario 3

[plot: λout collapses as the offered load grows]

Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted!

Congestion Collapse example: Cocktail party effect

87

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at

88

TCP congestion control: additive increase, multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT (for each received ACK: W <- W + 1/W) until loss detected
• multiplicative decrease: cut CongWin in half after loss (W <- W/2)

[plot: congestion window (8, 16, 24 Kbytes) vs. time; saw tooth behavior, probing for bandwidth. SLOW START IS NOT SHOWN]

89

90

TCP Congestion Control: details

• sender limits transmission:
    LastByteSent - LastByteAcked ≤ CongWin
• Roughly,
    rate = CongWin / RTT  Bytes/sec
• CongWin is dynamic, a function of perceived network congestion

How does sender perceive congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event

three mechanisms:
  – AIMD
  – slow start
  – conservative after timeout events

91

AIMD Starts Too Slowly!

[plot: window vs. t; with a purely linear ramp it could take a long time to get started!]

Need to start with a small CWND to avoid overloading the network.

92

TCP Slow Start

• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to respectable rate

When connection begins, increase rate exponentially fast until first loss event

93

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow, but ramps up exponentially fast

[timeline: Host A sends one segment, then two, then four, doubling each RTT as ACKs return from Host B]

94

Slow Start and the TCP Sawtooth

[plot: window vs. t; exponential "slow start" up to the first loss, then the sawtooth]

Why is it called slow-start? Because TCP originally had no congestion control mechanism. The source would just start by sending a whole window's worth of data!

95

Refinement: inferring loss

• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS;
  – window then grows exponentially
  – to a threshold, then grows linearly

Philosophy:
  3 dup ACKs indicates network capable of delivering some segments; timeout indicates a "more alarming" congestion scenario

96

Refinement

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event

97

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.

98

TCP sender congestion control

State: Slow Start (SS) | Event: ACK receipt for previously unacked data
  Action: CongWin = CongWin + MSS; If (CongWin > Threshold) set state to "Congestion Avoidance"
  Commentary: Resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA) | Event: ACK receipt for previously unacked data
  Action: CongWin = CongWin + MSS * (MSS / CongWin)
  Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT

State: SS or CA | Event: Loss event detected by triple duplicate ACK
  Action: Threshold = CongWin/2; CongWin = Threshold; Set state to "Congestion Avoidance"
  Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS

State: SS or CA | Event: Timeout
  Action: Threshold = CongWin/2; CongWin = 1 MSS; Set state to "Slow Start"
  Commentary: Enter slow start

State: SS or CA | Event: Duplicate ACK
  Action: Increment duplicate ACK count for segment being acked
  Commentary: CongWin and Threshold not changed
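The table above maps almost line-for-line onto code. A sketch, with the window kept in units of MSS for readability; names such as onNewAck and the initial ssthresh value are my own choices, not from the slides:

    // TCP congestion window state machine as summarised in the table above.
    class TcpCongestionControl {
        double cwnd = 1.0;          // congestion window, in MSS
        double ssthresh = 64.0;     // initial threshold (implementation-dependent)
        int dupAcks = 0;

        void onNewAck() {                       // ACK for previously unacked data
            dupAcks = 0;
            if (cwnd < ssthresh) {
                cwnd += 1.0;                    // slow start: +1 MSS per ACK => doubles per RTT
            } else {
                cwnd += 1.0 / cwnd;             // congestion avoidance: +1 MSS per RTT
            }
        }

        void onDuplicateAck() {
            dupAcks++;
            if (dupAcks == 3) {                 // triple duplicate ACK: multiplicative decrease
                ssthresh = Math.max(cwnd / 2.0, 1.0);
                cwnd = ssthresh;                // continue in congestion avoidance
            }
        }

        void onTimeout() {                      // the "more alarming" congestion signal
            ssthresh = Math.max(cwnd / 2.0, 1.0);
            cwnd = 1.0;                         // back to slow start
            dupAcks = 0;
        }
    }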

99

Repeating Slow Start After Timeout

[plot: window vs. t; after a timeout (or fast retransmission) the window restarts at 1 MSS; slow start runs until it reaches half of the previous CWND, i.e. SSTHRESH ("SSThresh set to here"), then growth is linear]

Slow-start restart: go back to CWND of 1 MSS, but take advantage of knowing the previous value of CWND.

100

TCP throughput

• What's the average throughput of TCP as a function of window size and RTT?
  – Ignore slow start
• Let W be the window size when loss occurs.
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to (W/2)/RTT.
• Average throughput: 0.75 W/RTT
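To see where the 0.75 comes from: ignoring slow start, the window ramps linearly from W/2 back up to W, so the long-run average window is (W/2 + W)/2 = 3W/4, giving an average rate of 0.75·W/RTT.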

101

TCP Futures: TCP over "long, fat pipes"

• Example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate p:

    Throughput = (MSS/RTT) · sqrt(3/(2p))

• needs L = 2·10^-10. Ouch!
• New versions of TCP for high-speed

Calculation on Simple Model (cwnd in units of MSS)
• Assume loss occurs whenever cwnd reaches W
  – Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … W, W/2, …
  – W/2 RTTs, then drop, then repeat
• Average throughput: 0.75 W (MSS/RTT)
  – One packet dropped out of (W/2)·(3W/4)
  – Packet drop rate p = (8/3)·W^-2
• Throughput = (MSS/RTT) · sqrt(3/(2p))
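Plugging the example numbers into that formula (a quick check): MSS = 1500 bytes = 12,000 bits and RTT = 0.1 s give MSS/RTT = 120 kbit/s, so 10 Gbps needs sqrt(3/(2p)) ≈ 83,333, i.e. 3/(2p) ≈ 6.9·10^9, which works out to p ≈ 2·10^-10 - roughly one loss in every five billion segments.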

102

HINT: KNOW THIS SLIDE

Three Congestion Control Challenges – or Why AIMD?

• Single flow adjusting to bottleneck bandwidth
  – Without any a priori knowledge
  – Could be a Gbps link; could be a modem
• Single flow adjusting to variations in bandwidth
  – When bandwidth decreases, must lower sending rate
  – When bandwidth increases, must increase sending rate
• Multiple flows sharing the bandwidth
  – Must avoid overloading network
  – And share bandwidth "fairly" among the flows

104

Problem 1: Single Flow, Fixed BW

• Want to get a first-order estimate of the available bandwidth
  – Assume bandwidth is fixed
  – Ignore presence of other flows
• Want to start slow, but rapidly increase rate until packet drop occurs ("slow-start")
• Adjustment:
  – cwnd initially set to 1 (MSS)
  – cwnd++ upon receipt of ACK

105

Problems with Slow-Start

• Slow-start can result in many losses
  – Roughly the size of cwnd ~ BW*RTT
• Example:
  – At some point, cwnd is enough to fill the "pipe"
  – After another RTT, cwnd is double its previous value
  – All the excess packets are dropped!
• Need a more gentle adjustment algorithm once we have a rough estimate of bandwidth
  – Rest of design discussion focuses on this

Problem 2: Single Flow, Varying BW

Want to track available bandwidth
• Oscillate around its current value
• If you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease:
    cwnd -> cwnd * a  or  cwnd / a
• Additive increase or decrease:
    cwnd -> cwnd + b  or  cwnd - b

106

Four alternatives

• AIAD: gentle increase, gentle decrease
• AIMD: gentle increase, drastic decrease
• MIAD: drastic increase, gentle decrease
  – too many losses: eliminate
• MIMD: drastic increase and decrease

108

Problem 3: Multiple Flows

• Want steady state to be "fair"
• Many notions of fairness, but here just require two identical flows to end up with the same bandwidth
• This eliminates MIMD and AIAD
  – As we shall see…
• AIMD is the only remaining solution!
  – Not really, but close enough…

109

Recall: Buffer and Window Dynamics

• No congestion -> x increases by one packet/RTT every RTT
• Congestion -> decrease x by factor 2

[plot: A -> router -> B, C = 50 pkts/RTT; rate x (pkts/RTT) and backlog in the router (congested if > 20 pkts) over time, showing the sawtooth]

110

AIMD Sharing Dynamics

• No congestion -> rate increases by one packet/RTT every RTT
• Congestion -> decrease rate by factor 2

[plot: two flows, x1 (A -> D) and x2 (B -> E), sharing a bottleneck; over time the rates equalize: fair share]

111

AIAD Sharing Dynamics

• No congestion -> x increases by one packet/RTT every RTT
• Congestion -> decrease x by 1

[plot: the same two flows, x1 (A -> D) and x2 (B -> E), under AIAD; the gap between the rates persists, so the flows do not converge to a fair share]

112

Simple Model of Congestion Control

• Two TCP connections
  – Rates x1 and x2
• Congestion when sum > 1
• Efficiency: sum near 1
• Fairness: x's converge

[phase plot, 2 user example: bandwidth for user 1 (x1) against bandwidth for user 2 (x2); the efficiency line separates underload from overload]

113

Example

• Total bandwidth 1

[phase plot: x1 vs. x2 with the efficiency line and the fairness line]
  (0.2, 0.5): Inefficient: x1+x2 = 0.7
  (0.7, 0.5): Congested: x1+x2 = 1.2
  (0.7, 0.3): Efficient: x1+x2 = 1. Not fair
  (0.5, 0.5): Efficient: x1+x2 = 1. Fair

114

AIAD

• Increase: x + aI
• Decrease: x - aD
• Does not converge to fairness

[phase plot: from (x1h, x2h) a decrease moves to (x1h - aD, x2h - aD) and an increase to (x1h - aD + aI, x2h - aD + aI); movement stays parallel to the fairness line, so the imbalance persists]

115

MIMD

• Increase: x * bI
• Decrease: x * bD
• Does not converge to fairness

[phase plot: from (x1h, x2h) a decrease moves to (bD·x1h, bD·x2h) and an increase to (bI·bD·x1h, bI·bD·x2h); movement stays on the same ray through the origin, so the ratio x1/x2 never changes]

116

AIMD

• Increase: x + aI
• Decrease: x * bD
• Converges to fairness

[phase plot: from (x1h, x2h) a decrease moves to (bD·x1h, bD·x2h) and an increase to (bD·x1h + aI, bD·x2h + aI); each cycle moves the operating point closer to the fairness line]

117

Why is AIMD fair?
(a pretty animation…)

Two competing sessions:
• Additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally

[phase plot: bandwidth for connection 1 vs. bandwidth for connection 2 (each axis up to R), with the "equal bandwidth share" line; repeated cycles of "congestion avoidance: additive increase" and "loss: decrease window by factor of 2" move the operating point towards the fair share]

118

Fairness (more)

Fairness and UDP
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) Research area: TCP friendly

Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections;
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2!
• Recall: Multiple browser sessions (and the potential for synchronized loss)

119

Synchronized Flows vs. Many TCP Flows

Synchronized flows:
• Aggregate window has same dynamics
• Therefore buffer occupancy has same dynamics
• Rule-of-thumb still holds

Many TCP flows:
• Independent, desynchronized
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases

Some TCP issues outstanding…

[plots: probability distribution of buffer occupancy over time for the synchronized and desynchronized cases]

Page 2: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

2

Topic 5 ndash TransportOur goals bull understand principles

behind transport layer servicesndash multiplexing

demultiplexingndash reliable data transferndash flow controlndash congestion control

bull learn about transport layer protocols in the Internetndash UDP connectionless transportndash TCP connection-oriented

transportndash TCP congestion control

3

Transport services and protocolsbull provide logical communication

between app processes running on different hosts

bull transport protocols run in end systems ndash send side breaks app

messages into segments passes to network layer

ndash rcv side reassembles segments into messages passes to app layer

bull more than one transport protocol available to appsndash Internet TCP and UDP

applicationtransportnetworkdata linkphysical

applicationtransportnetworkdata linkphysical

logical end-end transport

4

Transport vs network layerbull network layer logical communication between hostsbull transport layer logical communication between processes

ndash relies on enhances network layer services

Household analogy12 kids sending letters to 12

kidsbull processes = kidsbull app messages = letters in

envelopesbull hosts = housesbull transport protocol = Ann

and Billbull network-layer protocol =

postal service

5

Internet transport-layer protocols

bull reliable in-order delivery (TCP)ndash congestion control ndash flow controlndash connection setup

bull unreliable unordered delivery UDPndash no-frills extension of ldquobest-

effortrdquo IPbull services not available

ndash delay guaranteesndash bandwidth guarantees

applicationtransportnetworkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

applicationtransportnetworkdata linkphysical

logical end-end transport

6

application

transport

network

link

physical

P1 application

transport

network

link

physical

application

transport

network

link

physical

P2P3 P4P1

host 1 host 2 host 3

= process= socket

delivering received segmentsto correct socket

Demultiplexing at rcv hostgathering data from multiplesockets enveloping data with header (later used for demultiplexing)

Multiplexing at send host

Multiplexingdemultiplexing(Transport-layer style)

7

How transport-layer demultiplexing worksbull host receives IP datagrams

ndash each datagram has source IP address destination IP address

ndash each datagram carries 1 transport-layer segment

ndash each segment has source destination port number

bull host uses IP addresses amp port numbers to direct segment to appropriate socket

source port dest port

32 bits

applicationdata

(message)

other header fields

TCPUDP segment format

8

Connectionless demultiplexingbull Create sockets with port

numbersDatagramSocket mySocket1 = new

DatagramSocket(12534)DatagramSocket mySocket2 = new

DatagramSocket(12535)

bull UDP socket identified by two-tuple

(dest IP address dest port number)

bull When host receives UDP segmentndash checks destination port

number in segmentndash directs UDP segment to socket

with that port numberbull IP datagrams with different

source IP addresses andor source port numbers directed to same socket

9

Connectionless demux (cont)DatagramSocket serverSocket = new DatagramSocket(6428)

ClientIPB

P2

client IP A

P1P1P3

serverIP C

SP 6428DP 9157

SP 9157DP 6428

SP 6428DP 5775

SP 5775DP 6428

SP provides ldquoreturn addressrdquo

10

Connection-oriented demuxbull TCP socket identified by 4-

tuple ndash source IP addressndash source port numberndash dest IP addressndash dest port number

bull recv host uses all four values to direct segment to appropriate socket

bull Server host may support many simultaneous TCP socketsndash each socket identified by its

own 4-tuplebull Web servers have different

sockets for each connecting clientndash non-persistent HTTP will have

different socket for each request

11

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157DP 80

SP 9157DP 80

P5 P6 P3

D-IPCS-IP AD-IPC

S-IP B

SP 5775DP 80

D-IPCS-IP B

12

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157DP 80

SP 9157DP 80

P4 P3

D-IPCS-IP AD-IPC

S-IP B

SP 5775DP 80

D-IPCS-IP B

13

UDP User Datagram Protocol [RFC 768]

bull ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol

bull ldquobest effortrdquo service UDP segments may bendash lostndash delivered out of order to

appbull connectionless

ndash no handshaking between UDP sender receiver

ndash each UDP segment handled independently of others

Why is there a UDPbull no connection establishment

(which can add delay)bull simple no connection state at

sender receiverbull small segment headerbull no congestion control UDP can

blast away as fast as desired

14

UDP more

bull often used for streaming multimedia appsndash loss tolerantndash rate sensitive

bull other UDP usesndash DNSndash SNMP

bull reliable transfer over UDP add reliability at application layerndash application-specific error

recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

15

UDP checksum

Senderbull treat segment contents as

sequence of 16-bit integersbull checksum addition (1rsquos

complement sum) of segment contents

bull sender puts checksum value into UDP checksum field

Receiverbull compute checksum of received

segmentbull check if computed checksum

equals checksum field valuendash NO - error detectedndash YES - no error detected But

maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

16

Internet Checksum(time travel warning ndash we covered this earlier)

bull Notendash When adding numbers a carryout from the

most significant bit needs to be added to the result

bull Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

17

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

18

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

19

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

rdt_rcv()

udt_rcv()

20

Reliable data transfer getting started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over

unreliable channel to receiver

rdt_rcv() called by rdt to deliver data to upper

rdt_rcv()

udt_rcv()

udt_rcv() called when packet arrives on rcv-side of channel

21

Reliable data transfer getting started

Wersquollbull incrementally develop sender receiver sides of

reliable data transfer protocol (rdt)bull consider only unidirectional data transfer

ndash but control info will flow on both directionsbull use finite state machines (FSM) to specify sender

receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely

determined by next event

eventactions

22

KR state machines ndash a note

BewareKurose and Ross has a confusingconfused attitude to

state-machinesIrsquove attempted to normalise the representationUPSHOT these slides have differing information to the

KR book (from which the RDT example is taken)in KR ldquoactions takenrdquo appear wide-ranging my

interpretation is more specificrelevant

Statename

Statename

Relevant event causing state transitionRelevant action taken on state transitionstate when in this ldquostaterdquo

next state uniquely determined by next

event eventactions

23

Rdt10 reliable transfer over a reliable channel

bull underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

IDLE udt_send(packet)rdt_send(data)

rdt_rcv(data)IDLEudt_rcv(packet)

sender receiver

Event

Action

24

Rdt20 channel with bit errors

bull underlying channel may flip bits in packetndash checksum to detect bit errors

bull the question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells sender that

packet received is OKndash negative acknowledgements (NAKs) receiver explicitly tells sender

that packet had errorsndash sender retransmits packet on receipt of NAK

bull new mechanisms in rdt20 (beyond rdt10)ndash error detectionndash receiver feedback control msgs (ACKNAK) receiver-gtsender

25

rdt20 FSM specification

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

Waitingfor reply

IDLE

sender

receiverrdt_send(data)

L

Note the sender holds a copy of the packet being sent until the delivery is acknowledged

26

rdt20 operation with no errors

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

27

rdt20 error scenario

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

28

rdt2.0 has a fatal flaw!
What happens if the ACK/NAK is corrupted?
• sender doesn't know what happened at the receiver!
• can't just retransmit: possible duplicate

Handling duplicates:
• sender retransmits the current packet if the ACK/NAK is garbled
• sender adds a sequence number to each packet
• receiver discards (doesn't deliver up) a duplicate packet

stop and wait: the sender sends one packet, then waits for the receiver's response

29

rdt21 sender handles garbled ACKNAKs

IDLE

sequence=0udt_send(packet)

rdt_send(data)

WaitingFor reply udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

IDLEWaitingfor reply

LL

udt_rcv(packet) ampamp corrupt(packet)

30

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

udt_send(NAK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

udt_send(ACK)rdt_rcv(data)

Wait for 1 from below

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_send(ACK)rdt_rcv(data)

udt_send(ACK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

receive(packet) ampamp corrupt(packet)

udt_send(ACK)

udt_send(NAK)

31

rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq #'s (0, 1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  – state must "remember" whether the "current" pkt has a 0 or 1 sequence number

Receiver:
• must check if received packet is a duplicate
  – state indicates whether 0 or 1 is the expected pkt seq #
• note: receiver can not know if its last ACK/NAK was received OK at the sender

32

rdt22 a NAK-free protocol

• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends an ACK for the last pkt received OK
  – receiver must explicitly include the seq # of the pkt being ACKed
• duplicate ACK at the sender results in the same action as a NAK: retransmit the current pkt

33

rdt22 sender receiver fragments

Wait for call 0 from above

sequence=0udt_send(packet)

rdt_send(data)

udt_send(packet)

rdt_rcv(reply) ampamp ( corrupt(reply) || isACK1(reply) )

udt_rcv(reply) ampamp not corrupt(reply) ampamp isACK0(reply)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet) send(ACK1)rdt_rcv(data)

udt_rcv(packet) ampamp (corrupt(packet) || has_seq1(packet))

udt_send(ACK1)receiver FSM

fragment

L

34

rdt30 channels with errors and loss

New assumption: the underlying channel can also lose packets (data or ACKs)
  – checksum, seq #, ACKs, retransmissions will be of help, but not enough

Approach: sender waits a "reasonable" amount of time for an ACK
• retransmits if no ACK received in this time
• if the pkt (or ACK) was just delayed (not lost):
  – retransmission will be a duplicate, but the use of seq #'s already handles this
  – receiver must specify the seq # of the pkt being ACKed
• requires a countdown timer

udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )

35

rdt30 sender

sequence=0udt_send(packet)

rdt_send(data)

Wait for

ACK0

IDLEstate 1

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)

udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)

LL

udt_send(packet)timeout

udt_send(packet)timeout

udt_rcv(reply)

IDLEstate 0

Wait for

ACK1

Ludt_rcv(reply)

LL

L
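For concreteness, a minimal Python sketch of the stop-and-wait logic in the sender FSM above, assuming hypothetical udt_send(packet) and udt_rcv(timeout) helpers supplied by the caller (it illustrates the alternating-bit-plus-timer idea; it is not the slides' code):

import time

def rdt30_send(data_items, udt_send, udt_rcv, timeout=1.0):
    # Stop-and-wait (rdt3.0-style) sender sketch.
    # udt_send(packet) transmits a packet dict; udt_rcv(remaining) blocks for up to
    # `remaining` seconds and returns a reply dict {"ack": 0 or 1, "corrupt": bool},
    # or None if nothing arrived (both helpers are hypothetical).
    seq = 0                                        # alternating-bit sequence number
    for data in data_items:
        packet = {"seq": seq, "data": data}
        acked = False
        while not acked:
            udt_send(packet)                       # (re)transmit, start countdown timer
            deadline = time.monotonic() + timeout
            while True:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break                          # timer expired -> retransmit
                reply = udt_rcv(remaining)
                if reply is None:
                    break                          # timer expired -> retransmit
                if not reply["corrupt"] and reply["ack"] == seq:
                    acked = True                   # correct ACK for this packet
                    break
                # corrupt reply, or ACK for the other seq #: ignore and keep waiting
        seq = 1 - seq                              # flip 0 <-> 1 for the next packet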

36

rdt30 in action

37

rdt30 in action

38

Performance of rdt30

• rdt3.0 works, but performance stinks
• example: 1 Gbps link, 15 ms one-way propagation delay, 8000-bit packet:

    d_trans = L/R = 8000 bits / 10^9 bits/sec = 8 microseconds

• U_sender: utilization – fraction of time the sender is busy sending
    U_sender = (L/R) / (RTT + L/R) = 0.008 / 30.008 ≈ 0.00027
• 1 KB pkt every 30 msec -> 33 kB/sec throughput over a 1 Gbps link
• the network protocol limits use of physical resources!
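The same arithmetic as a small Python sketch (numbers taken from the example above):

# Stop-and-wait utilization for the slide's example: 1 Gbps link,
# 15 ms one-way propagation delay, 8000-bit (1 KB) packets.
R = 1e9            # link rate, bits/sec
L = 8000           # packet size, bits
RTT = 0.030        # round-trip time, seconds (2 x 15 ms)

d_trans = L / R                       # 8 microseconds to clock the packet onto the link
U_sender = d_trans / (RTT + d_trans)  # fraction of time the sender is busy sending

print(f"d_trans    = {d_trans*1e6:.0f} us")
print(f"U_sender   = {U_sender:.5f}")                        # ~0.00027
print(f"throughput ~ {L/8/(RTT + d_trans)/1e3:.0f} kB/s")    # ~33 kB/s over a 1 Gbps link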

39

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

40

Pipelined (Packet-Window) protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver

bull Two generic forms of pipelined protocols go-Back-N selective repeat

41

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives, send ACK
last bit of 3rd packet arrives, send ACK

U_sender = 3·(L/R) / (RTT + L/R) = 0.024 / 30.008 ≈ 0.0008
Increase utilization by a factor of 3!
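A related back-of-envelope check, sketched in Python: how many packets must be in flight (the bandwidth-delay product) before utilization approaches 1, using the same assumed numbers as before:

# How many packets must be "in flight" to keep the link busy?
R = 1e9          # bits/sec
L = 8000         # bits per packet
RTT = 0.030      # seconds

d_trans = L / R
U_per_packet = d_trans / (RTT + d_trans)

# N pipelined packets give roughly N times the stop-and-wait utilization,
# until the window covers the whole bandwidth-delay product.
N_full = (RTT + d_trans) / d_trans
print(f"3 packets -> U ~ {3 * U_per_packet:.4f}")            # the factor-of-3 case above
print(f"need ~{N_full:.0f} packets in flight for U ~ 1")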

42

Pipelining Protocols

Go-Back-N: big picture
• Sender can have up to N unacked packets in the pipeline
• Rcvr only sends cumulative ACKs
  – Doesn't ACK a packet if there's a gap
• Sender has a timer for the oldest unacked packet
  – If the timer expires, retransmit all unacked packets

Selective Repeat: big picture
• Sender can have up to N unacked packets in the pipeline
• Rcvr ACKs individual packets
• Sender maintains a timer for each unacked packet
  – When a timer expires, retransmit only that unacked packet

43

Selective repeat big picture

bull Sender can have up to N unacked packets in pipeline

bull Rcvr acks individual packetsbull Sender maintains timer for each unacked

packetndash When timer expires retransmit only unack packet

44

Go-Back-N
Sender:
• k-bit seq # in pkt header
• "window" of up to N consecutive unack'ed pkts allowed
• ACK(n): ACKs all pkts up to and including seq # n – "cumulative ACK"
  – may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher-seq-# pkts in the window

45

GBN sender extended FSM

base = 1
nextseqnum = 1

rdt_send(data):
    if (nextseqnum < base + N) {
        udt_send(packet[nextseqnum])
        nextseqnum++
    } else
        refuse_data(data)            /* block the application */

timeout:
    udt_send(packet[base]); udt_send(packet[base+1]); … ; udt_send(packet[nextseqnum-1])

udt_rcv(reply) && notcorrupt(reply):
    base = getacknum(reply) + 1

udt_rcv(reply) && corrupt(reply):
    (ignore)
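The sender FSM above rendered as a Python sketch; udt_send and the timer callbacks are hypothetical helpers, and the timer handling follows the usual Go-Back-N treatment (one timer, conceptually for the oldest unACKed packet):

class GBNSender:
    """Go-Back-N sender sketch following the extended FSM above."""
    def __init__(self, N, udt_send, start_timer, stop_timer):
        self.N = N
        self.base = 1
        self.nextseqnum = 1
        self.packets = {}                 # seq # -> packet, kept for retransmission
        self.udt_send = udt_send
        self.start_timer = start_timer
        self.stop_timer = stop_timer

    def rdt_send(self, data):
        if self.nextseqnum < self.base + self.N:          # window not full
            pkt = {"seq": self.nextseqnum, "data": data}
            self.packets[self.nextseqnum] = pkt
            self.udt_send(pkt)
            if self.base == self.nextseqnum:
                self.start_timer()                        # timer for oldest unacked pkt
            self.nextseqnum += 1
            return True
        return False                                      # refuse data (caller blocks)

    def timeout(self):
        self.start_timer()
        for seq in range(self.base, self.nextseqnum):     # resend every unacked packet
            self.udt_send(self.packets[seq])

    def ack_received(self, acknum):                       # cumulative ACK
        self.base = acknum + 1
        if self.base == self.nextseqnum:
            self.stop_timer()                             # nothing outstanding
        else:
            self.start_timer()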

46

GBN receiver extended FSM

ACK-only: always send an ACK for a correctly-received packet with the highest in-order seq #
  – may generate duplicate ACKs
  – need only remember expectedseqnum
• out-of-order packet:
  – discard (don't buffer) -> no receiver buffering!
  – re-ACK the packet with the highest in-order seq #

expectedseqnum = 1

udt_rcv(packet) && notcorrupt(packet) && hasseqnum(rcvpkt, expectedseqnum):
    rdt_rcv(data); udt_send(ACK); expectedseqnum++

otherwise (default):
    udt_send(reply)          /* re-ACK the highest in-order seq # */

47

GBN in action

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

sender:
  data from above:
    • if the next available seq # is in the window, send the pkt
  timeout(n):
    • resend pkt n, restart its timer
  ACK(n) in [sendbase, sendbase+N]:
    • mark pkt n as received
    • if n is the smallest unACKed pkt, advance the window base to the next unACKed seq #

receiver:
  pkt n in [rcvbase, rcvbase+N-1]:
    • send ACK(n)
    • out-of-order: buffer
    • in-order: deliver (also deliver buffered, in-order pkts), advance the window to the next not-yet-received pkt
  pkt n in [rcvbase-N, rcvbase-1]:
    • ACK(n)
  otherwise:
    • ignore
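A small Python sketch of the receiver-side events above; udt_send_ack and deliver are hypothetical helpers, and seq #s are left unbounded so the wraparound dilemma of the next slide does not arise in this toy version:

class SRReceiver:
    """Selective Repeat receiver sketch for the event list above."""
    def __init__(self, N, udt_send_ack, deliver):
        self.N = N
        self.rcvbase = 1
        self.buffer = {}                  # out-of-order packets awaiting delivery
        self.udt_send_ack = udt_send_ack
        self.deliver = deliver

    def packet_received(self, seq, data):
        if self.rcvbase <= seq < self.rcvbase + self.N:
            self.udt_send_ack(seq)                       # ACK this packet individually
            self.buffer[seq] = data
            while self.rcvbase in self.buffer:           # deliver in-order data, slide window
                self.deliver(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
        elif self.rcvbase - self.N <= seq < self.rcvbase:
            self.udt_send_ack(seq)                       # already delivered: re-ACK anyway
        # otherwise: ignore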

51

Selective repeat in action

52

Selective repeat dilemma

Example:
• seq #'s: 0, 1, 2, 3
• window size = 3

• receiver sees no difference in the two scenarios!
• incorrectly passes duplicate data as new in (a)

Q: what relationship between seq # space size and window size?
A: window size ≤ (½ of the seq # space size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out-of-order attack
• ARQ with SACK means the recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server, but don't send the first byte
  – the recipient keeps all the subsequent data and waits…
  – filling buffers
  – critical buffers…
• Evil attack two: send a legitimate request
      GET index.html
  which gets through an intrusion-detection system, then send a new segment replacing bytes 4–13 with "password-file"

A dumb example: neither of these attacks would work on a modern system.

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1 – α)·EstimatedRTT + α·SampleRTT

• Exponential weighted moving average
• influence of a past sample decreases exponentially fast
• typical value: α = 0.125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimation
RTT: gaia.cs.umass.edu to fantasia.eurecom.fr

[Plot: RTT (milliseconds) vs. time (seconds), showing SampleRTT and EstimatedRTT; samples vary roughly between 100 and 350 ms while EstimatedRTT tracks them smoothly]

62

TCP Round Trip Time and Timeout

Setting the timeout
• EstimatedRTT plus a "safety margin"
  – large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:

    DevRTT = (1 – β)·DevRTT + β·|SampleRTT – EstimatedRTT|
    (typically, β = 0.25)

Then set the timeout interval:

    TimeoutInterval = EstimatedRTT + 4·DevRTT
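As a sketch, the two formulas above wired together in Python (α = 0.125 and β = 0.25 as quoted; the initial DevRTT choice is an assumption of mine):

class RTTEstimator:
    """EWMA RTT estimate and timeout interval as in the formulas above."""
    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2          # a common initial choice (assumption)

    def update(self, sample_rtt):
        # Karn/Partridge: the caller should skip samples from retransmitted segments.
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + \
                       self.beta * abs(sample_rtt - self.estimated_rtt)
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + \
                             self.alpha * sample_rtt

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt

# usage: feed in measured SampleRTTs (seconds) as ACKs arrive
est = RTTEstimator(first_sample=0.120)
for s in (0.110, 0.150, 0.130):
    est.update(s)
print(f"EstimatedRTT={est.estimated_rtt:.3f}s  Timeout={est.timeout_interval():.3f}s")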

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
      create TCP segment with sequence number NextSeqNum
      if (timer currently not running)
          start timer
      pass segment to IP
      NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
      retransmit not-yet-acknowledged segment with smallest sequence number
      start timer

  event: ACK received, with ACK field value of y
      if (y > SendBase) {
          SendBase = y
          if (there are currently not-yet-acknowledged segments)
              start timer
      }
} /* end of loop (forever) */

Comment:
• SendBase – 1: last cumulatively ACKed byte
Example:
• SendBase – 1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so new data is ACKed

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {                                   /* a duplicate ACK for already-ACKed data */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3)
            resend segment with sequence number y      /* fast retransmit */
    }

72

Silly Window Syndrome
The receiver's advertised window tells a sender how much it can accept; if a transmitter has something to send – it will.
This means small segments may persist – indefinitely.

Solution: wait to fill each segment, but don't wait indefinitely – NAGLE's Algorithm.

If we wait too long, interactive traffic is difficult.
If we don't wait, we get silly window syndrome.

Solution: use a timer; when the timer expires – send the (unfilled) segment.
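For reference, a sketch of the usual statement of Nagle's decision rule (my paraphrase, not the slides'; the parameter names are mine):

def nagle_should_send_now(segment_len, mss, unacked_bytes_in_flight, flush=False):
    """Sketch of Nagle's algorithm as usually stated.

    Send immediately if we can fill a segment (or are flushing); otherwise,
    if any earlier data is still unacknowledged, hold the small segment back
    and coalesce it with later writes (an ACK arrival or a timer releases it).
    """
    if segment_len >= mss or flush:
        return True
    return unacked_bytes_in_flight == 0

# e.g. a 20-byte write while 1000 bytes are unacked is buffered, not sent:
print(nagle_should_send_now(20, mss=1460, unacked_bytes_in_flight=1000))  # False
print(nagle_should_send_now(20, mss=1460, unacked_bytes_in_flight=0))     # True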

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control: how it works
(Suppose the TCP receiver discards out-of-order segments)
• spare room in buffer:
    RcvWindow = RcvBuffer – [LastByteRcvd – LastByteRead]
• Rcvr advertises spare room by including the value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees the receive buffer doesn't overflow
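A tiny numeric sketch of the bookkeeping above; the byte counts are made-up example values:

# Receiver-side bookkeeping from the slide, with example numbers.
RcvBuffer    = 64 * 1024     # bytes allocated to the receive buffer
LastByteRcvd = 150_000       # highest byte number placed in the buffer
LastByteRead = 140_000       # highest byte number read by the application

RcvWindow = RcvBuffer - (LastByteRcvd - LastByteRead)   # spare room advertised to the sender
print(f"advertised RcvWindow = {RcvWindow} bytes")

# Sender-side rule: unACKed data must not exceed the advertised window.
LastByteSent, LastByteAcked = 150_000, 148_000
assert LastByteSent - LastByteAcked <= RcvWindow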

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control!
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem!

82

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission

• large delays when congested
• maximum achievable throughput

[Figure: Host A offers λ_in (original data) into a router with unlimited shared output link buffers; Host B receives λ_out]

83

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packets

[Figure: Host A offers λ_in (original data) and λ'_in (original data plus retransmitted data) into a router with finite shared output link buffers; Host B receives λ_out]

84

Causes/costs of congestion: scenario 2
• always: λ_in = λ_out (goodput)
• "perfect" retransmission only when loss: λ'_in > λ_out
• retransmission of a delayed (not lost) packet makes λ'_in larger (than the perfect case) for the same λ_out

"costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: the link carries multiple copies of a pkt

[Graphs (a), (b), (c): λ_out vs. λ_in, each with capacity R/2 – goodput falls increasingly short of R/2 as retransmissions (needed and unneeded) consume capacity]

85

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λ_in and λ'_in increase?

[Figure: Host A offers λ_in (original data) and λ'_in (original plus retransmitted data) across routers with finite shared output link buffers; Host B receives λ_out]

86

Causes/costs of congestion: scenario 3
Another "cost" of congestion: when a packet is dropped, any upstream transmission capacity used for that packet was wasted.

Congestion Collapse example: the cocktail party effect

[Figure: λ_out collapses as the offered load grows]

87

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
• no explicit feedback from the network
• congestion inferred from end-system observed loss and delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate the sender should send at

88

TCP congestion control additive increase multiplicative decrease

[Figure: congestion window (8, 16, 24 Kbytes) vs. time]

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT until loss is detected (W ← W + 1/W on each received ACK)
• multiplicative decrease: cut CongWin in half after loss (W ← W/2)

[Figure: congestion window size vs. time – saw tooth behaviour, probing for bandwidth]

89
(SLOW START IS NOT SHOWN)

90

TCP Congestion Control: details
• sender limits transmission: LastByteSent – LastByteAcked ≤ CongWin
• roughly:  rate = CongWin / RTT  bytes/sec
• CongWin is a dynamic function of perceived network congestion

How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event

three mechanisms:
  – AIMD
  – slow start
  – conservative after timeout events

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Start
• When a connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to a respectable rate

When a connection begins, increase rate exponentially fast until the first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement: inferring loss
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after a timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Philosophy: 3 dup ACKs indicate the network is capable of delivering some segments; a timeout indicates a "more alarming" congestion scenario

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS
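A sketch of this summary as Python, assuming CongWin and Threshold are kept in units of MSS and that a caller invokes the event handlers; real TCP stacks add details (e.g., fast recovery) not modelled here:

class TCPCongestionControl:
    """Slow start, congestion avoidance, triple-dup-ACK and timeout reactions
    per the summary above (windows in MSS units)."""
    def __init__(self, init_threshold=64):
        self.congwin = 1.0
        self.threshold = float(init_threshold)

    def on_new_ack(self):
        if self.congwin < self.threshold:
            self.congwin += 1                       # slow start: +1 MSS per ACK (doubles per RTT)
        else:
            self.congwin += 1.0 / self.congwin      # congestion avoidance: ~+1 MSS per RTT

    def on_triple_dup_ack(self):
        self.threshold = self.congwin / 2
        self.congwin = self.threshold               # multiplicative decrease

    def on_timeout(self):
        self.threshold = self.congwin / 2
        self.congwin = 1.0                          # back to slow start

cc = TCPCongestionControl()
for _ in range(20):
    cc.on_new_ack()
cc.on_triple_dup_ack()
print(cc.congwin, cc.threshold)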

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

• What's the average throughput of TCP as a function of window size and RTT?
  – ignore slow start
• Let W be the window size when loss occurs
• When the window is W, throughput is W/RTT
• Just after a loss, the window drops to W/2, throughput to W/(2·RTT)
• Average throughput: 0.75·W/RTT

101

TCP Futures: TCP over "long, fat pipes"
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:  throughput ≈ 1.22·MSS / (RTT·√L)
• => L = 2·10^-10  Ouch!
• New versions of TCP for high-speed links

Calculation on a Simple Model (cwnd in units of MSS)
• Assume loss occurs whenever cwnd reaches W
  – Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … W, W/2, …
  – W/2 RTTs, then a drop, then repeat
• Average throughput: 0.75·W·(MSS/RTT)
  – One packet dropped out of (W/2)·(3W/4) packets
  – Packet drop rate p = (8/3)·W⁻²
• Throughput = (MSS/RTT)·sqrt(3/(2p))
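Putting numbers to the two slides above, a small Python check of the window and loss-rate requirements (values from the 10 Gbps example):

from math import sqrt

MSS = 1500 * 8        # bits
RTT = 0.100           # seconds
target = 10e9         # bits/sec

W = target * RTT / MSS                        # in-flight segments needed
p = (3 / 2) * (MSS / (RTT * target)) ** 2     # invert throughput = (MSS/RTT)*sqrt(3/(2p))

print(f"window    W ~ {W:,.0f} segments")     # ~83,333
print(f"loss rate p ~ {p:.1e}")               # ~2e-10
# sanity check against the throughput formula
assert abs((MSS / RTT) * sqrt(3 / (2 * p)) - target) < 1e-3 * target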

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

• No congestion: x increases by one packet/RTT every RTT; Congestion: decrease x by a factor of 2

[Figure: sender A -> router -> B, link capacity C = 50 pkts/RTT; plot of the rate x (pkts/RTT) and the backlog in the router (pkts, congested if > 20) over time]

110

AIMD Sharing Dynamics
• two flows, A->B (rate x1) and D->E (rate x2), share a bottleneck
• No congestion: rate increases by one packet/RTT every RTT; Congestion: decrease rate by a factor of 2

[Figure: the two rates oscillate under AIMD but converge – rates equalize at the fair share]

111

AIAD Sharing Dynamics

• two flows, A->B (rate x1) and D->E (rate x2), share a bottleneck
• No congestion: x increases by one packet/RTT every RTT; Congestion: decrease x by 1

[Figure: under AIAD the two rates oscillate but the gap between them persists – no convergence to a fair share]

112

Simple Model of Congestion Control

• Two TCP connections
  – Rates x1 and x2
• Congestion when the sum > 1
• Efficiency: sum near 1
• Fairness: x's converge

[Figure: 2-user example – bandwidth for user 1 (x1) vs. bandwidth for user 2 (x2); the efficiency line separates underload from overload]

113

Example

[Figure: bandwidth for user 1 (x1) vs. user 2 (x2), with fairness line and efficiency line; total bandwidth 1]

• (0.2, 0.5): inefficient – x1 + x2 = 0.7
• (0.7, 0.5): congested – x1 + x2 = 1.2
• (0.7, 0.3): efficient (x1 + x2 = 1), not fair
• (0.5, 0.5): efficient (x1 + x2 = 1), fair

114

AIAD

[Figure: bandwidth for user 1 (x1) vs. user 2 (x2), with fairness and efficiency lines; trajectory (x1h, x2h) -> (x1h–aD, x2h–aD) -> (x1h–aD+aI, x2h–aD+aI)]

• Increase: x + aI
• Decrease: x – aD
• Does not converge to fairness

115

MIMD

[Figure: bandwidth for user 1 (x1) vs. user 2 (x2), with fairness and efficiency lines; trajectory (x1h, x2h) -> (bD·x1h, bD·x2h) -> (bI·bD·x1h, bI·bD·x2h)]

• Increase: x·bI
• Decrease: x·bD
• Does not converge to fairness

116

AIMD
[Figure: bandwidth for user 1 (x1) vs. user 2 (x2), with fairness and efficiency lines; trajectory (x1h, x2h) -> (bD·x1h, bD·x2h) -> (bD·x1h+aI, bD·x2h+aI)]

• Increase: x + aI
• Decrease: x·bD
• Converges to fairness

117

Why is AIMD fair? (a pretty animation…)
Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally

[Figure: bandwidth for connection 1 vs. bandwidth for connection 2 (both limited to R), with the equal-bandwidth-share line; repeated rounds of congestion avoidance (additive increase) and loss (window cut by a factor of 2) move the operating point toward the equal share]

118

Fairness (more)
Fairness and UDP:
• Multimedia apps often do not use TCP
  – do not want the rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at a constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP-friendly rate control

Fairness and parallel TCP connections:
• nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2!
• Recall: multiple browser sessions (and the potential for synchronized loss)

119

Synchronized Flows vs. Many TCP Flows
• Synchronized flows:
  – the aggregate window has the same dynamics
  – therefore buffer occupancy has the same dynamics
  – the rule-of-thumb still holds
• Independent, desynchronized flows:
  – the central limit theorem says the aggregate becomes Gaussian
  – variance (buffer size) decreases as N increases

Some TCP issues outstanding…

[Figure: probability distribution of buffer occupancy, and buffer size over time]

Page 3: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

3

Transport services and protocolsbull provide logical communication

between app processes running on different hosts

bull transport protocols run in end systems ndash send side breaks app

messages into segments passes to network layer

ndash rcv side reassembles segments into messages passes to app layer

bull more than one transport protocol available to appsndash Internet TCP and UDP

applicationtransportnetworkdata linkphysical

applicationtransportnetworkdata linkphysical

logical end-end transport

4

Transport vs network layerbull network layer logical communication between hostsbull transport layer logical communication between processes

ndash relies on enhances network layer services

Household analogy12 kids sending letters to 12

kidsbull processes = kidsbull app messages = letters in

envelopesbull hosts = housesbull transport protocol = Ann

and Billbull network-layer protocol =

postal service

5

Internet transport-layer protocols

bull reliable in-order delivery (TCP)ndash congestion control ndash flow controlndash connection setup

bull unreliable unordered delivery UDPndash no-frills extension of ldquobest-

effortrdquo IPbull services not available

ndash delay guaranteesndash bandwidth guarantees

applicationtransportnetworkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

applicationtransportnetworkdata linkphysical

logical end-end transport

6

application

transport

network

link

physical

P1 application

transport

network

link

physical

application

transport

network

link

physical

P2P3 P4P1

host 1 host 2 host 3

= process= socket

delivering received segmentsto correct socket

Demultiplexing at rcv hostgathering data from multiplesockets enveloping data with header (later used for demultiplexing)

Multiplexing at send host

Multiplexingdemultiplexing(Transport-layer style)

7

How transport-layer demultiplexing worksbull host receives IP datagrams

ndash each datagram has source IP address destination IP address

ndash each datagram carries 1 transport-layer segment

ndash each segment has source destination port number

bull host uses IP addresses amp port numbers to direct segment to appropriate socket

source port dest port

32 bits

applicationdata

(message)

other header fields

TCPUDP segment format

8

Connectionless demultiplexingbull Create sockets with port

numbersDatagramSocket mySocket1 = new

DatagramSocket(12534)DatagramSocket mySocket2 = new

DatagramSocket(12535)

bull UDP socket identified by two-tuple

(dest IP address dest port number)

bull When host receives UDP segmentndash checks destination port

number in segmentndash directs UDP segment to socket

with that port numberbull IP datagrams with different

source IP addresses andor source port numbers directed to same socket

9

Connectionless demux (cont)DatagramSocket serverSocket = new DatagramSocket(6428)

ClientIPB

P2

client IP A

P1P1P3

serverIP C

SP 6428DP 9157

SP 9157DP 6428

SP 6428DP 5775

SP 5775DP 6428

SP provides ldquoreturn addressrdquo

10

Connection-oriented demuxbull TCP socket identified by 4-

tuple ndash source IP addressndash source port numberndash dest IP addressndash dest port number

bull recv host uses all four values to direct segment to appropriate socket

bull Server host may support many simultaneous TCP socketsndash each socket identified by its

own 4-tuplebull Web servers have different

sockets for each connecting clientndash non-persistent HTTP will have

different socket for each request

11

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157DP 80

SP 9157DP 80

P5 P6 P3

D-IPCS-IP AD-IPC

S-IP B

SP 5775DP 80

D-IPCS-IP B

12

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157DP 80

SP 9157DP 80

P4 P3

D-IPCS-IP AD-IPC

S-IP B

SP 5775DP 80

D-IPCS-IP B

13

UDP User Datagram Protocol [RFC 768]

bull ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol

bull ldquobest effortrdquo service UDP segments may bendash lostndash delivered out of order to

appbull connectionless

ndash no handshaking between UDP sender receiver

ndash each UDP segment handled independently of others

Why is there a UDPbull no connection establishment

(which can add delay)bull simple no connection state at

sender receiverbull small segment headerbull no congestion control UDP can

blast away as fast as desired

14

UDP more

bull often used for streaming multimedia appsndash loss tolerantndash rate sensitive

bull other UDP usesndash DNSndash SNMP

bull reliable transfer over UDP add reliability at application layerndash application-specific error

recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

15

UDP checksum

Senderbull treat segment contents as

sequence of 16-bit integersbull checksum addition (1rsquos

complement sum) of segment contents

bull sender puts checksum value into UDP checksum field

Receiverbull compute checksum of received

segmentbull check if computed checksum

equals checksum field valuendash NO - error detectedndash YES - no error detected But

maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

16

Internet Checksum(time travel warning ndash we covered this earlier)

bull Notendash When adding numbers a carryout from the

most significant bit needs to be added to the result

bull Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

17

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

18

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

19

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

rdt_rcv()

udt_rcv()

20

Reliable data transfer getting started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over

unreliable channel to receiver

rdt_rcv() called by rdt to deliver data to upper

rdt_rcv()

udt_rcv()

udt_rcv() called when packet arrives on rcv-side of channel

21

Reliable data transfer getting started

Wersquollbull incrementally develop sender receiver sides of

reliable data transfer protocol (rdt)bull consider only unidirectional data transfer

ndash but control info will flow on both directionsbull use finite state machines (FSM) to specify sender

receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely

determined by next event

eventactions

22

KR state machines ndash a note

BewareKurose and Ross has a confusingconfused attitude to

state-machinesIrsquove attempted to normalise the representationUPSHOT these slides have differing information to the

KR book (from which the RDT example is taken)in KR ldquoactions takenrdquo appear wide-ranging my

interpretation is more specificrelevant

Statename

Statename

Relevant event causing state transitionRelevant action taken on state transitionstate when in this ldquostaterdquo

next state uniquely determined by next

event eventactions

23

Rdt10 reliable transfer over a reliable channel

bull underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

IDLE udt_send(packet)rdt_send(data)

rdt_rcv(data)IDLEudt_rcv(packet)

sender receiver

Event

Action

24

Rdt20 channel with bit errors

bull underlying channel may flip bits in packetndash checksum to detect bit errors

bull the question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells sender that

packet received is OKndash negative acknowledgements (NAKs) receiver explicitly tells sender

that packet had errorsndash sender retransmits packet on receipt of NAK

bull new mechanisms in rdt20 (beyond rdt10)ndash error detectionndash receiver feedback control msgs (ACKNAK) receiver-gtsender

25

rdt20 FSM specification

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

Waitingfor reply

IDLE

sender

receiverrdt_send(data)

L

Note the sender holds a copy of the packet being sent until the delivery is acknowledged

26

rdt20 operation with no errors

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

27

rdt20 error scenario

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

28

rdt20 has a fatal flawWhat happens if ACKNAK corruptedbull sender doesnrsquot know what happened at receiverbull canrsquot just retransmit possible duplicate

Handling duplicates bull sender retransmits current

packet if ACKNAK garbledbull sender adds sequence number

to each packetbull receiver discards (doesnrsquot

deliver) duplicate packet

Sender sends one packet then waits for receiver response

stop and wait

29

rdt21 sender handles garbled ACKNAKs

IDLE

sequence=0udt_send(packet)

rdt_send(data)

WaitingFor reply udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

IDLEWaitingfor reply

LL

udt_rcv(packet) ampamp corrupt(packet)

30

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

udt_send(NAK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

udt_send(ACK)rdt_rcv(data)

Wait for 1 from below

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_send(ACK)rdt_rcv(data)

udt_send(ACK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

receive(packet) ampamp corrupt(packet)

udt_send(ACK)

udt_send(NAK)

31

rdt21 discussionSenderbull seq added to pktbull two seq rsquos (01) will suffice Whybull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has a0 or 1 sequence number

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq bull note receiver can not know if its last ACKNAK received OK at

sender

32

rdt22 a NAK-free protocol

bull same functionality as rdt21 using ACKs onlybull instead of NAK receiver sends ACK for last pkt received OK

ndash receiver must explicitly include seq of pkt being ACKed bull duplicate ACK at sender results in same action as NAK

retransmit current pkt

33

rdt22 sender receiver fragments

Wait for call 0 from above

sequence=0udt_send(packet)

rdt_send(data)

udt_send(packet)

rdt_rcv(reply) ampamp ( corrupt(reply) || isACK1(reply) )

udt_rcv(reply) ampamp not corrupt(reply) ampamp isACK0(reply)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet) send(ACK1)rdt_rcv(data)

udt_rcv(packet) ampamp (corrupt(packet) || has_seq1(packet))

udt_send(ACK1)receiver FSM

fragment

L

34

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of help but not

enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull retransmits if no ACK received in this time

bull if pkt (or ACK) just delayed (not lost)ndash retransmission will be

duplicate but use of seq rsquos already handles this

ndash receiver must specify seq of pkt being ACKed

bull requires countdown timer

udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )

35

rdt30 sender

sequence=0udt_send(packet)

rdt_send(data)

Wait for

ACK0

IDLEstate 1

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)

udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)

LL

udt_send(packet)timeout

udt_send(packet)timeout

udt_rcv(reply)

IDLEstate 0

Wait for

ACK1

Ludt_rcv(reply)

LL

L

36

rdt30 in action

37

rdt30 in action

38

Performance of rdt30

bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet

U sender utilization ndash fraction of time sender busy sending

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources

dsmicrosecon8bps10bits8000

9 RLdtrans

39

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

40

Pipelined (Packet-Window) protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver

bull Two generic forms of pipelined protocols go-Back-N selective repeat

41

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

42

Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks

ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet

ndash If timer expires retransmit all unacked packets

Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet

ndash When timer expires retransmit only unack packet

43

Selective repeat big picture

bull Sender can have up to N unacked packets in pipeline

bull Rcvr acks individual packetsbull Sender maintains timer for each unacked

packetndash When timer expires retransmit only unack packet

44

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L

46

GBN receiver extended FSM

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN inaction

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq #'s and ACKs

Seq #'s:
– byte stream "number" of first byte in segment's data
ACKs:
– seq # of next byte expected from other side
– cumulative ACK
Q: how receiver handles out-of-order segments?
– A: TCP spec doesn't say - up to implementor

[Figure: simple telnet scenario. User types 'C'; host A sends Seq=42, ACK=79, data='C'; host B ACKs receipt of 'C' and echoes it back (Seq=79, ACK=43, data='C'); host A ACKs receipt of the echoed 'C' (Seq=43, ACK=80).]

This has led to a world of hurt…

57

TCP out-of-order attack
• ARQ with SACK means the recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server, but don't send the first byte
• Recipient keeps all the subsequent data and waits…
  – Filling buffers
• Critical buffers…
• Evil attack two: send a legitimate request
    GET index.html
  this gets through an intrusion-detection system;
  then send a new segment replacing bytes 4-13 with "password-file"

(A dumb example: neither of these attacks would work on a modern system.)

58

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss

Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  – ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
  – average several recent measurements, not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - α)·EstimatedRTT + α·SampleRTT

• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

Karn/Partridge Algorithm – ignore retransmissions in measurements
(and increase the timeout on retransmission: this makes retransmissions decreasingly aggressive)

61

Example RTT estimation
RTT: gaia.cs.umass.edu to fantasia.eurecom.fr

[Figure: SampleRTT and EstimatedRTT (milliseconds, roughly 100-350 ms) plotted against time (seconds); EstimatedRTT tracks a smoothed version of the noisy SampleRTT.]

62

TCP Round Trip Time and Timeout

Setting the timeout
• EstimatedRTT plus "safety margin"
  – large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:

DevRTT = (1 - β)·DevRTT + β·|SampleRTT - EstimatedRTT|
(typically, β = 0.25)

Then set timeout interval:
TimeoutInterval = EstimatedRTT + 4·DevRTT
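A compact sketch of the two estimators above; ALPHA and BETA use the typical values quoted, and (per Karn/Partridge) callers should skip samples from retransmitted segments.

ALPHA, BETA = 0.125, 0.25

class RttEstimator:
    def __init__(self, first_sample):
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2

    def update(self, sample_rtt):
        # EWMA of the deviation, then EWMA of the RTT itself
        self.dev_rtt = (1 - BETA) * self.dev_rtt + BETA * abs(sample_rtt - self.estimated_rtt)
        self.estimated_rtt = (1 - ALPHA) * self.estimated_rtt + ALPHA * sample_rtt
        return self.estimated_rtt + 4 * self.dev_rtt   # TimeoutInterval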

63

TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  – timeout events
  – duplicate acks
• Initially, consider simplified TCP sender:
  – ignore duplicate acks
  – ignore flow control, congestion control

64

TCP sender events

data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval

timeout:
• retransmit segment that caused timeout
• restart timer

Ack rcvd:
• If acknowledges previously unacked segments:
  – update what is known to be acked
  – start timer if there are outstanding segments

65

TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event) {

    event: data received from application above
      create TCP segment with sequence number NextSeqNum
      if (timer currently not running)
        start timer
      pass segment to IP
      NextSeqNum = NextSeqNum + length(data)

    event: timer timeout
      retransmit not-yet-acknowledged segment with smallest sequence number
      start timer

    event: ACK received, with ACK field value of y
      if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
          start timer
      }
  }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked

66

TCP retransmission scenarios

[Figure, lost ACK scenario: Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost; A's timeout for Seq=92 expires (SendBase still 100 pending), A retransmits Seq=92, 8 bytes data; B re-sends ACK=100.]

[Figure, premature timeout scenario: A sends Seq=92 (8 bytes) then Seq=100 (20 bytes); B replies ACK=100 then ACK=120, but A's timer for Seq=92 fires first, so A needlessly retransmits Seq=92; B replies ACK=120; SendBase advances from 100 to 120.]

67

TCP retransmission scenarios (more)

[Figure, cumulative-ACK scenario: A sends Seq=92 (8 bytes) and Seq=100 (20 bytes); ACK=100 is lost, but ACK=120 arrives before A's timeout, so nothing is retransmitted; SendBase = 120.]

Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACK's 100 too.

68

TCP ACK generation [RFC 1122, RFC 2581]

Event at Receiver -> TCP Receiver action

Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  -> Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK.

Arrival of in-order segment with expected seq #; one other segment has ACK pending
  -> Immediately send single cumulative ACK, ACKing both in-order segments.

Arrival of out-of-order segment with higher-than-expected seq #; gap detected
  -> Immediately send duplicate ACK, indicating seq # of next expected byte.

Arrival of segment that partially or completely fills gap
  -> Immediately send ACK, provided that segment starts at lower end of gap.
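A small, hypothetical sketch mapping an arriving segment onto the four rows above; expected_seq, ack_pending and buffered_above are assumed bookkeeping, and a real receiver would also run the 500 ms delayed-ACK timer.

def ack_action(seq, expected_seq, ack_pending, buffered_above):
    # Rows 1/2: in-order arrival with no gap outstanding
    if seq == expected_seq and not buffered_above:
        if ack_pending:
            return "send one cumulative ACK now (covers both in-order segments)"
        return "delayed ACK: wait up to 500 ms for the next in-order segment"
    # Row 3: arrival above the expected byte, gap detected
    if seq > expected_seq:
        return "immediately send duplicate ACK for expected_seq"
    # Row 4: segment starts at the lower end of the gap and (partly) fills it
    if seq == expected_seq and buffered_above:
        return "immediately send ACK for the new cumulative edge"
    return "old data: re-send cumulative ACK"   # case not listed in the table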

69

Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  – fast retransmit: resend segment before timer expires

70

[Figure 3.37: Resending a segment after triple duplicate ACK. Host A sends several segments back-to-back; the 2nd is lost (X); Host B's repeated duplicate ACKs trigger Host A to resend the 2nd segment before its timeout expires.]

71

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {                                      /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3)
      resend segment with sequence number y   /* fast retransmit */
  }

72

Silly Window Syndrome
The receiver's advertised window tells the sender how much it can accept.
If a transmitter has something to send – it will.
This means tiny segments may persist – indefinitely.

Solution:
Wait to fill each segment, but don't wait indefinitely:
if we wait too long, interactive traffic is difficult;
if we don't wait, we get silly window syndrome.

NAGLE's Algorithm
Solution: Use a timer; when the timer expires – send the (unfilled) segment.
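A hedged sketch of the Nagle-style batching described above: hold small writes while earlier data is unACKed, and flush on a full segment, on everything being ACKed, or when a timer expires. send_segment is injected and MAX_WAIT is an assumed value.

import time

MSS, MAX_WAIT = 1460, 0.2

class NagleBuffer:
    def __init__(self, send_segment):
        self.send_segment = send_segment
        self.buf = bytearray()
        self.last_flush = time.monotonic()

    def write(self, data: bytes, unacked_bytes: int):
        self.buf.extend(data)
        expired = time.monotonic() - self.last_flush > MAX_WAIT
        if len(self.buf) >= MSS or unacked_bytes == 0 or expired:
            self.send_segment(bytes(self.buf[:MSS]))   # may still be an unfilled segment
            del self.buf[:MSS]
            self.last_flush = time.monotonic()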

73

Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)

Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop…

75

TCP Flow Control
• receive side of TCP connection has a receive buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
• app process may be slow at reading from buffer

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

• spare room in buffer:
  RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
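A minimal sketch of the bookkeeping above; the buffer size and byte counters are assumptions.

RCV_BUFFER = 64 * 1024          # bytes, assumed

def advertised_window(last_byte_rcvd, last_byte_read):
    # Spare room the receiver advertises as RcvWindow
    return RCV_BUFFER - (last_byte_rcvd - last_byte_read)

def sender_may_send(last_byte_sent, last_byte_acked, rcv_window, nbytes):
    # Sender-side check: unACKed data (plus this send) must fit in RcvWindow
    return (last_byte_sent - last_byte_acked) + nbytes <= rcv_window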

77

TCP Connection Management

Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    Socket clientSocket = new Socket("hostname", port number);
• server: contacted by client
    Socket connectionSocket = welcomeSocket.accept();

Three way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data

78

TCP Connection Management (cont)

Closing a connection:

client closes socket: clientSocket.close();

Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client sends FIN and waits; server ACKs, then sends its own FIN; client ACKs and enters "timed wait" before the connection is finally closed.]

79

TCP Connection Management (cont)

Step 3: client receives FIN, replies with ACK
  – Enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.

Note: with small modification, can handle simultaneous FINs.

[Figure: both sides exchange FIN/ACK; the client passes through "closing" and "timed wait" before "closed"; the server closes once its FIN is ACKed.]

80

TCP Connection Management (cont)

TCP client lifecycle / TCP server lifecycle
[Figure: the client-side and server-side TCP state machines for connection setup and teardown.]

81

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem

82

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput

[Figure: Hosts A and B send original data at rate λin into a router with unlimited shared output link buffers; λout is the delivered throughput.]

83

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet

[Figure: Host A offers original data at rate λin, plus retransmitted data (λ'in), into a router with finite shared output link buffers; λout is the throughput delivered to Host B.]

84

Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• "perfect" retransmission only when loss: λ'in > λout
• retransmission of delayed (not lost) packet makes λ'in larger (than the perfect case) for the same λout

"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt

[Figure: three plots of λout against λin, each levelling off at or below the link capacity R/2: (a) the ideal case reaches R/2; (b) retransmitting only genuinely lost packets keeps goodput below R/2; (c) unneeded retransmissions of delayed packets push goodput lower still (e.g. toward R/3, R/4).]

85

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λin and λ'in increase?

[Figure: four hosts on multihop paths share routers with finite shared output link buffers; each offers original data λin plus retransmitted data λ'in; λout is the delivered throughput.]

86

Causes/costs of congestion: scenario 3

Another "cost" of congestion: when a packet is dropped, any "upstream" transmission capacity used for that packet was wasted!

[Figure: λout collapses toward zero as offered load keeps increasing.]

Congestion Collapse example: Cocktail party effect

87

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at

88

TCP congestion control: additive increase, multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT until loss detected; for each received ACK, W ← W + 1/W
• multiplicative decrease: cut CongWin in half after loss (W ← W/2)

[Figure: congestion window (8, 16, 24 Kbytes) against time: the "saw tooth" behaviour, probing for bandwidth.]
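A two-line sketch of the per-ACK / per-loss rules above (congestion window measured in MSS units):

def on_ack(cwnd):
    return cwnd + 1.0 / cwnd        # additive increase: roughly +1 MSS per RTT

def on_loss(cwnd):
    return max(cwnd / 2.0, 1.0)     # multiplicative decrease: halve the window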

89

SLOW START IS NOT SHOWN


90

TCP Congestion Control: details
• sender limits transmission: LastByteSent - LastByteAcked ≤ CongWin
• Roughly: rate ≈ CongWin / RTT  Bytes/sec
• CongWin is dynamic, a function of perceived network congestion

How does sender perceive congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event

three mechanisms:
  – AIMD
  – slow start
  – conservative after timeout events

91

AIMD Starts Too Slowly

[Figure: window against time t; purely additive increase from a small window takes many RTTs to ramp up.]

It could take a long time to get started!
Need to start with a small CWND to avoid overloading the network.

92

TCP Slow Start
• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to respectable rate

When connection begins, increase rate exponentially fast until first loss event

93

TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow, but ramps up exponentially fast

[Figure: Host A sends one segment, then two, then four in successive RTTs as ACKs arrive from Host B.]

94

Slow Start and the TCP Sawtooth

[Figure: window against time t; exponential "slow start" up to the first loss, then the sawtooth.]

Why is it called slow-start? Because TCP originally had no congestion control mechanism. The source would just start by sending a whole window's worth of data!

95

Refinement: inferring loss
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Philosophy: 3 dup ACKs indicates network capable of delivering some segments; timeout indicates a "more alarming" congestion scenario

96

Refinement
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before the loss event

97

Summary TCP Congestion Control

• When CongWin is below Threshold, sender in slow-start phase, window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.
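A hedged sketch tying the four rules above together; cwnd and ssthresh are in MSS units and the initial ssthresh of 64 is an assumption.

class TcpCongestionControl:
    def __init__(self):
        self.cwnd, self.ssthresh = 1.0, 64.0     # initial ssthresh assumed

    def on_new_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0                     # slow start: doubles every RTT
        else:
            self.cwnd += 1.0 / self.cwnd         # congestion avoidance: +1 MSS per RTT

    def on_triple_dup_ack(self):
        self.ssthresh = max(self.cwnd / 2.0, 1.0)
        self.cwnd = self.ssthresh                # fast recovery / multiplicative decrease

    def on_timeout(self):
        self.ssthresh = max(self.cwnd / 2.0, 1.0)
        self.cwnd = 1.0                          # back to slow start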

98

TCP sender congestion control

State: Slow Start (SS)
  Event: ACK receipt for previously unacked data
  Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
  Commentary: Resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
  Event: ACK receipt for previously unacked data
  Action: CongWin = CongWin + MSS·(MSS/CongWin)
  Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT

State: SS or CA
  Event: Loss event detected by triple duplicate ACK
  Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
  Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

State: SS or CA
  Event: Timeout
  Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
  Commentary: Enter slow start

State: SS or CA
  Event: Duplicate ACK
  Action: Increment duplicate ACK count for segment being acked
  Commentary: CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

[Figure: window against time t; after a timeout (contrast with fast retransmission), CWND drops to 1 MSS and slow start runs until it reaches SSTHRESH, which was set to half the previous CWND.]

Slow-start restart: go back to CWND of 1 MSS, but take advantage of knowing the previous value of CWND.
Slow start in operation until it reaches half of the previous CWND, i.e. SSTHRESH.

100

TCP throughput

• What's the average throughput of TCP as a function of window size and RTT?
  – Ignore slow start
• Let W be the window size when loss occurs.
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to (W/2)/RTT
• Average throughput: 0.75·W/RTT
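The 0.75 factor is simply the time-average of the sawtooth, whose window ramps linearly from W/2 back up to W; in LaTeX notation:

\bar{T} \;=\; \frac{1}{\mathrm{RTT}} \cdot \frac{\tfrac{W}{2} + W}{2} \;=\; \frac{3}{4}\,\frac{W}{\mathrm{RTT}}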

101

TCP Futures: TCP over "long, fat pipes"

• Example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L: throughput = 1.22·MSS / (RTT·√L)
• -> L = 2·10⁻¹⁰  Ouch!
• New versions of TCP for high-speed

Calculation on Simple Model (cwnd in units of MSS)
• Assume loss occurs whenever cwnd reaches W
  – Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … W, W/2, …
  – W/2 RTTs, then drop, then repeat
• Average throughput: 0.75·W·(MSS/RTT)
  – One packet dropped out of (W/2)·(3W/4)
  – Packet drop rate p = (8/3)·W⁻²
• Throughput = (MSS/RTT)·sqrt(3/(2p))
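A quick numeric check of the example above, using the loss-throughput relation just quoted (a sketch, not a derivation):

from math import sqrt

MSS = 1500 * 8          # bits per segment
RTT = 0.100             # seconds
target = 10e9           # 10 Gbps

W = target * RTT / MSS                  # about 83,333 in-flight segments
L = (1.22 * MSS / (RTT * target)) ** 2  # required loss rate, about 2e-10
print(round(W), L)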

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges – or Why AIMD?
• Single flow adjusting to bottleneck bandwidth
  – Without any a priori knowledge
  – Could be a Gbps link, could be a modem
• Single flow adjusting to variations in bandwidth
  – When bandwidth decreases, must lower sending rate
  – When bandwidth increases, must increase sending rate
• Multiple flows sharing the bandwidth
  – Must avoid overloading network
  – And share bandwidth "fairly" among the flows

103

104

Problem 1: Single Flow, Fixed BW

• Want to get a first-order estimate of the available bandwidth
  – Assume bandwidth is fixed
  – Ignore presence of other flows
• Want to start slow, but rapidly increase rate until packet drop occurs ("slow-start")
• Adjustment:
  – cwnd initially set to 1 (MSS)
  – cwnd++ upon receipt of ACK

105

Problems with Slow-Start
• Slow-start can result in many losses
  – Roughly the size of cwnd ~ BW·RTT
• Example:
  – At some point, cwnd is enough to fill the "pipe"
  – After another RTT, cwnd is double its previous value
  – All the excess packets are dropped!
• Need a more gentle adjustment algorithm once we have a rough estimate of bandwidth
  – Rest of design discussion focuses on this

Problem 2: Single Flow, Varying BW

Want to track available bandwidth
• Oscillate around its current value
• If you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease:
    cwnd ← cwnd ·/÷ a
• Additive increase or decrease:
    cwnd ← cwnd +/- b

Four alternatives

• AIAD: gentle increase, gentle decrease
• AIMD: gentle increase, drastic decrease
• MIAD: drastic increase, gentle decrease
  – too many losses: eliminate
• MIMD: drastic increase and decrease

107

108

Problem 3: Multiple Flows

• Want steady state to be "fair"
• Many notions of fairness, but here just require two identical flows to end up with the same bandwidth
• This eliminates MIMD and AIAD
  – As we shall see…
• AIMD is the only remaining solution
  – Not really, but close enough…

109

Recall: Buffer and Window Dynamics

• No congestion: x increases by one packet/RTT every RTT
• Congestion: decrease x by factor 2

[Figure: a single flow from A to B through a link of capacity C = 50 pkts/RTT; rate x (pkts/RTT) and router backlog (pkts; congested if > 20) plotted over roughly 500 RTTs, giving the familiar sawtooth.]

110

AIMD Sharing Dynamics

[Figure: two flows, A→B at rate x1 and D→E at rate x2, share one bottleneck; their rates are plotted over roughly 500 RTTs.]

No congestion: rate increases by one packet/RTT every RTT; congestion: decrease rate by factor 2.
Rates equalize: fair share.

111

AIAD Sharing Dynamics

[Figure: two flows, A→B at rate x1 and D→E at rate x2, share one bottleneck; their rates are plotted over roughly 500 RTTs.]

No congestion: x increases by one packet/RTT every RTT; congestion: decrease x by 1.
The gap between x1 and x2 never closes, so AIAD does not converge to a fair share.

112

Simple Model of Congestion Control

• Two TCP connections
  – Rates x1 and x2
• Congestion when sum > 1
• Efficiency: sum near 1
• Fairness: x's converge

[Figure: 2-user example in the (x1, x2) plane, bandwidth for User 1 on one axis and for User 2 on the other; the efficiency line x1 + x2 = 1 separates underload from overload.]

113

Example

[Figure: (x1, x2) plane showing the fairness line x1 = x2 and the efficiency line x1 + x2 = 1.]

• Total bandwidth: 1
• Inefficient: x1 + x2 = 0.7, e.g. (0.2, 0.5)
• Congested: x1 + x2 = 1.2, e.g. (0.7, 0.5)
• Efficient (x1 + x2 = 1) but not fair: e.g. (0.7, 0.3)
• Efficient (x1 + x2 = 1) and fair: (0.5, 0.5)

114

AIAD

• Increase: x ← x + aI
• Decrease: x ← x - aD
• Does not converge to fairness

[Figure: in the (x1, x2) plane, starting from (x1h, x2h), a decrease moves to (x1h - aD, x2h - aD) and the next increase to (x1h - aD + aI, x2h - aD + aI); equal additive steps keep the trajectory parallel to the fairness line, so it never approaches it.]

115

MIMD

• Increase: x ← x·bI
• Decrease: x ← x·bD
• Does not converge to fairness

[Figure: in the (x1, x2) plane, starting from (x1h, x2h), a decrease moves to (bD·x1h, bD·x2h) and an increase to (bI·bD·x1h, bI·bD·x2h); every step stays on the same ray through the origin, so the ratio x1/x2 never changes.]

116

AIMD
• Increase: x ← x + aI
• Decrease: x ← x·bD
• Converges to fairness

[Figure: in the (x1, x2) plane, starting from (x1h, x2h), a multiplicative decrease moves to (bD·x1h, bD·x2h) and an additive increase to (bD·x1h + aI, bD·x2h + aI); each cycle brings the operating point closer to the fairness line.]

117

Why is AIMD fair?
(a pretty animation…)

Two competing sessions:
• Additive increase gives slope of 1, as throughput increases
• Multiplicative decrease decreases throughput proportionally

[Figure: bandwidth for Connection 1 against Connection 2, both bounded by R; the trajectory alternates "congestion avoidance: additive increase" and "loss: decrease window by factor of 2", spiralling in toward the equal-bandwidth-share line.]
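A toy simulation of the two-session picture above (the capacity and starting rates are arbitrary assumptions): both flows add 1 unit per RTT and halve when the shared link is exceeded, so even a very unequal start converges towards the equal-share line.

CAPACITY = 50.0

def aimd_two_flows(x1=1.0, x2=40.0, rounds=200):
    for _ in range(rounds):
        x1 += 1.0                    # additive increase for both flows
        x2 += 1.0
        if x1 + x2 > CAPACITY:       # congestion: multiplicative decrease
            x1 /= 2.0
            x2 /= 2.0
    return x1, x2                    # rates end up roughly equal

print(aimd_two_flows())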

118

Fairness (more)

Fairness and UDP
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) Research area: TCP friendly

Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections;
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2!
• Recall: Multiple browser sessions (and the potential for synchronized loss)

119

Synchronized Flows vs. Many TCP Flows

Synchronized flows:
• Aggregate window has same dynamics
• Therefore buffer occupancy has same dynamics
• Rule-of-thumb still holds

Independent, desynchronized flows:
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases

[Figure: buffer size over time t and its probability distribution.]

Some TCP issues outstanding…

Page 4: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

4

Transport vs network layerbull network layer logical communication between hostsbull transport layer logical communication between processes

ndash relies on enhances network layer services

Household analogy12 kids sending letters to 12

kidsbull processes = kidsbull app messages = letters in

envelopesbull hosts = housesbull transport protocol = Ann

and Billbull network-layer protocol =

postal service

5

Internet transport-layer protocols

bull reliable in-order delivery (TCP)ndash congestion control ndash flow controlndash connection setup

bull unreliable unordered delivery UDPndash no-frills extension of ldquobest-

effortrdquo IPbull services not available

ndash delay guaranteesndash bandwidth guarantees

applicationtransportnetworkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

applicationtransportnetworkdata linkphysical

logical end-end transport

6

application

transport

network

link

physical

P1 application

transport

network

link

physical

application

transport

network

link

physical

P2P3 P4P1

host 1 host 2 host 3

= process= socket

delivering received segmentsto correct socket

Demultiplexing at rcv hostgathering data from multiplesockets enveloping data with header (later used for demultiplexing)

Multiplexing at send host

Multiplexingdemultiplexing(Transport-layer style)

7

How transport-layer demultiplexing worksbull host receives IP datagrams

ndash each datagram has source IP address destination IP address

ndash each datagram carries 1 transport-layer segment

ndash each segment has source destination port number

bull host uses IP addresses amp port numbers to direct segment to appropriate socket

source port dest port

32 bits

applicationdata

(message)

other header fields

TCPUDP segment format

8

Connectionless demultiplexingbull Create sockets with port

numbersDatagramSocket mySocket1 = new

DatagramSocket(12534)DatagramSocket mySocket2 = new

DatagramSocket(12535)

bull UDP socket identified by two-tuple

(dest IP address dest port number)

bull When host receives UDP segmentndash checks destination port

number in segmentndash directs UDP segment to socket

with that port numberbull IP datagrams with different

source IP addresses andor source port numbers directed to same socket

9

Connectionless demux (cont)DatagramSocket serverSocket = new DatagramSocket(6428)

ClientIPB

P2

client IP A

P1P1P3

serverIP C

SP 6428DP 9157

SP 9157DP 6428

SP 6428DP 5775

SP 5775DP 6428

SP provides ldquoreturn addressrdquo

10

Connection-oriented demuxbull TCP socket identified by 4-

tuple ndash source IP addressndash source port numberndash dest IP addressndash dest port number

bull recv host uses all four values to direct segment to appropriate socket

bull Server host may support many simultaneous TCP socketsndash each socket identified by its

own 4-tuplebull Web servers have different

sockets for each connecting clientndash non-persistent HTTP will have

different socket for each request

11

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157DP 80

SP 9157DP 80

P5 P6 P3

D-IPCS-IP AD-IPC

S-IP B

SP 5775DP 80

D-IPCS-IP B

12

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157DP 80

SP 9157DP 80

P4 P3

D-IPCS-IP AD-IPC

S-IP B

SP 5775DP 80

D-IPCS-IP B

13

UDP User Datagram Protocol [RFC 768]

bull ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol

bull ldquobest effortrdquo service UDP segments may bendash lostndash delivered out of order to

appbull connectionless

ndash no handshaking between UDP sender receiver

ndash each UDP segment handled independently of others

Why is there a UDPbull no connection establishment

(which can add delay)bull simple no connection state at

sender receiverbull small segment headerbull no congestion control UDP can

blast away as fast as desired

14

UDP more

bull often used for streaming multimedia appsndash loss tolerantndash rate sensitive

bull other UDP usesndash DNSndash SNMP

bull reliable transfer over UDP add reliability at application layerndash application-specific error

recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

15

UDP checksum

Senderbull treat segment contents as

sequence of 16-bit integersbull checksum addition (1rsquos

complement sum) of segment contents

bull sender puts checksum value into UDP checksum field

Receiverbull compute checksum of received

segmentbull check if computed checksum

equals checksum field valuendash NO - error detectedndash YES - no error detected But

maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

16

Internet Checksum(time travel warning ndash we covered this earlier)

bull Notendash When adding numbers a carryout from the

most significant bit needs to be added to the result

bull Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

17

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

18

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

19

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

rdt_rcv()

udt_rcv()

20

Reliable data transfer getting started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over

unreliable channel to receiver

rdt_rcv() called by rdt to deliver data to upper

rdt_rcv()

udt_rcv()

udt_rcv() called when packet arrives on rcv-side of channel

21

Reliable data transfer getting started

Wersquollbull incrementally develop sender receiver sides of

reliable data transfer protocol (rdt)bull consider only unidirectional data transfer

ndash but control info will flow on both directionsbull use finite state machines (FSM) to specify sender

receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely

determined by next event

eventactions

22

KR state machines ndash a note

BewareKurose and Ross has a confusingconfused attitude to

state-machinesIrsquove attempted to normalise the representationUPSHOT these slides have differing information to the

KR book (from which the RDT example is taken)in KR ldquoactions takenrdquo appear wide-ranging my

interpretation is more specificrelevant

Statename

Statename

Relevant event causing state transitionRelevant action taken on state transitionstate when in this ldquostaterdquo

next state uniquely determined by next

event eventactions

23

Rdt10 reliable transfer over a reliable channel

bull underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

IDLE udt_send(packet)rdt_send(data)

rdt_rcv(data)IDLEudt_rcv(packet)

sender receiver

Event

Action

24

Rdt20 channel with bit errors

bull underlying channel may flip bits in packetndash checksum to detect bit errors

bull the question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells sender that

packet received is OKndash negative acknowledgements (NAKs) receiver explicitly tells sender

that packet had errorsndash sender retransmits packet on receipt of NAK

bull new mechanisms in rdt20 (beyond rdt10)ndash error detectionndash receiver feedback control msgs (ACKNAK) receiver-gtsender

25

rdt20 FSM specification

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

Waitingfor reply

IDLE

sender

receiverrdt_send(data)

L

Note the sender holds a copy of the packet being sent until the delivery is acknowledged

26

rdt20 operation with no errors

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

27

rdt20 error scenario

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

28

rdt20 has a fatal flawWhat happens if ACKNAK corruptedbull sender doesnrsquot know what happened at receiverbull canrsquot just retransmit possible duplicate

Handling duplicates bull sender retransmits current

packet if ACKNAK garbledbull sender adds sequence number

to each packetbull receiver discards (doesnrsquot

deliver) duplicate packet

Sender sends one packet then waits for receiver response

stop and wait

29

rdt21 sender handles garbled ACKNAKs

IDLE

sequence=0udt_send(packet)

rdt_send(data)

WaitingFor reply udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

IDLEWaitingfor reply

LL

udt_rcv(packet) ampamp corrupt(packet)

30

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

udt_send(NAK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

udt_send(ACK)rdt_rcv(data)

Wait for 1 from below

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_send(ACK)rdt_rcv(data)

udt_send(ACK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

receive(packet) ampamp corrupt(packet)

udt_send(ACK)

udt_send(NAK)

31

rdt21 discussionSenderbull seq added to pktbull two seq rsquos (01) will suffice Whybull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has a0 or 1 sequence number

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq bull note receiver can not know if its last ACKNAK received OK at

sender

32

rdt22 a NAK-free protocol

bull same functionality as rdt21 using ACKs onlybull instead of NAK receiver sends ACK for last pkt received OK

ndash receiver must explicitly include seq of pkt being ACKed bull duplicate ACK at sender results in same action as NAK

retransmit current pkt

33

rdt22 sender receiver fragments

Wait for call 0 from above

sequence=0udt_send(packet)

rdt_send(data)

udt_send(packet)

rdt_rcv(reply) ampamp ( corrupt(reply) || isACK1(reply) )

udt_rcv(reply) ampamp not corrupt(reply) ampamp isACK0(reply)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet) send(ACK1)rdt_rcv(data)

udt_rcv(packet) ampamp (corrupt(packet) || has_seq1(packet))

udt_send(ACK1)receiver FSM

fragment

L

34

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of help but not

enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull retransmits if no ACK received in this time

bull if pkt (or ACK) just delayed (not lost)ndash retransmission will be

duplicate but use of seq rsquos already handles this

ndash receiver must specify seq of pkt being ACKed

bull requires countdown timer

udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )

35

rdt30 sender

sequence=0udt_send(packet)

rdt_send(data)

Wait for

ACK0

IDLEstate 1

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)

udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)

LL

udt_send(packet)timeout

udt_send(packet)timeout

udt_rcv(reply)

IDLEstate 0

Wait for

ACK1

Ludt_rcv(reply)

LL

L

36

rdt30 in action

37

rdt30 in action

38

Performance of rdt30

bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet

U sender utilization ndash fraction of time sender busy sending

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources

dsmicrosecon8bps10bits8000

9 RLdtrans

39

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

40

Pipelined (Packet-Window) protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver

bull Two generic forms of pipelined protocols go-Back-N selective repeat

41

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

42

Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks

ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet

ndash If timer expires retransmit all unacked packets

Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet

ndash When timer expires retransmit only unack packet

43

Selective repeat big picture

bull Sender can have up to N unacked packets in pipeline

bull Rcvr acks individual packetsbull Sender maintains timer for each unacked

packetndash When timer expires retransmit only unack packet

44

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L

46

GBN receiver extended FSM

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN inaction

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
• no explicit feedback from the network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at

88

TCP congestion control: additive increase, multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT until loss detected – implemented as W ← W + 1/W for each received ACK
• multiplicative decrease: cut CongWin in half after loss (W ← W/2)

[Figure: congestion window (8, 16, 24 Kbytes) versus time – the saw-tooth behaviour of probing for bandwidth]

89 (SLOW START IS NOT SHOWN)
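As a rough sketch of the two rules above (my own illustration, with the window kept in units of MSS rather than bytes):

    # The two AIMD rules, congestion window in MSS units (an assumption for clarity).
    def on_ack(cwnd):
        # additive increase: +1/cwnd per ACK, i.e. +1 MSS per window's worth of ACKs (one RTT)
        return cwnd + 1.0 / cwnd

    def on_loss(cwnd):
        # multiplicative decrease: halve the window, but never below 1 MSS
        return max(cwnd / 2.0, 1.0)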


90

TCP Congestion Control: details
• sender limits transmission: LastByteSent − LastByteAcked ≤ CongWin
• roughly,  rate ≈ CongWin / RTT  bytes/sec
• CongWin is a dynamic function of perceived network congestion

How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event

three mechanisms:
  – AIMD
  – slow start
  – conservative after timeout events

91

AIMD Starts Too Slowly!

[Figure: window versus time t – purely additive increase grows the window only linearly from its small initial value]

It could take a long time to get started!
Need to start with a small CWND to avoid overloading the network.

92

TCP Slow Start
• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to a respectable rate

When connection begins, increase rate exponentially fast until first loss event
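Checking the initial-rate figure (a throwaway calculation using only the numbers on the slide):

    mss_bits = 500 * 8        # one MSS sent per RTT to start with
    rtt = 0.200               # seconds
    print(mss_bits / rtt)     # 20000.0 bits/sec, i.e. 20 kbps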

93

TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A sends one segment, then two, then four in successive RTTs as the ACKs return from Host B]

94

Slow Start and the TCP Sawtooth

[Figure: window versus time t – exponential "slow start" up to the first loss, then the AIMD sawtooth]

Why is it called slow-start? Because TCP originally had no congestion control mechanism. The source would just start by sending a whole window's worth of data!

95

Refinement: inferring loss
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Philosophy:
  3 dup ACKs indicates the network is capable of delivering some segments; a timeout indicates a "more alarming" congestion scenario

96

Refinement
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before the loss event

97

Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.

98

TCP sender congestion control (State / Event / TCP Sender Action / Commentary)

State: Slow Start (SS)
  Event: ACK receipt for previously unACKed data
  Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
  Commentary: resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
  Event: ACK receipt for previously unACKed data
  Action: CongWin = CongWin + MSS·(MSS/CongWin)
  Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
  Event: loss event detected by triple duplicate ACK
  Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
  Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

State: SS or CA
  Event: timeout
  Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
  Commentary: enter slow start

State: SS or CA
  Event: duplicate ACK
  Action: increment duplicate ACK count for segment being ACKed
  Commentary: CongWin and Threshold not changed
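A compact sketch pulling the table above together into one sender-side routine (my own Python illustration; real implementations track the window in bytes and add fast-recovery details omitted here):

    # Window and threshold kept in MSS units; the initial ssthresh of 64 MSS is an assumed value.
    class TcpCongestionControl:
        def __init__(self):
            self.cwnd = 1.0         # CongWin
            self.ssthresh = 64.0    # Threshold
            self.dup_acks = 0

        def on_new_ack(self):                   # ACK for previously unACKed data
            self.dup_acks = 0
            if self.cwnd < self.ssthresh:       # slow start: +1 MSS per ACK, doubling per RTT
                self.cwnd += 1.0
            else:                               # congestion avoidance: ~+1 MSS per RTT
                self.cwnd += 1.0 / self.cwnd

        def on_dup_ack(self):
            self.dup_acks += 1
            if self.dup_acks == 3:              # triple duplicate ACK: multiplicative decrease
                self.ssthresh = max(self.cwnd / 2.0, 1.0)
                self.cwnd = self.ssthresh       # continue in congestion avoidance

        def on_timeout(self):                   # timeout: back to slow start from 1 MSS
            self.ssthresh = max(self.cwnd / 2.0, 1.0)
            self.cwnd = 1.0
            self.dup_acks = 0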

99

Repeating Slow Start After Timeout

[Figure: window versus time t – after a timeout (or fast retransmission) the window drops to 1 MSS and slow start runs again until it reaches SSTHRESH, set to half the previous CWND]

Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND.
Slow start is in operation until the window reaches half of the previous CWND, i.e. SSTHRESH.

100

TCP throughput
• What's the average throughput of TCP as a function of window size and RTT?
  – Ignore slow start
• Let W be the window size when loss occurs
• When the window is W, throughput is W/RTT
• Just after loss, the window drops to W/2, throughput to W/(2·RTT)
• Average throughput: 0.75·W/RTT

101

TCP Futures: TCP over "long, fat pipes"
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L: throughput ≈ 1.22·MSS / (RTT·sqrt(L))
• ⇒ L = 2·10^-10   Ouch!
• New versions of TCP for high-speed needed!

Calculation on Simple Model (cwnd in units of MSS)
• Assume loss occurs whenever cwnd reaches W
  – recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … W, W/2, …
  – W/2 RTTs, then drop, then repeat
• Average throughput: 0.75·W·(MSS/RTT)
  – one packet dropped out of (W/2)·(3W/4)
  – packet drop rate p = (8/3)·W^-2
• ⇒ Throughput = (MSS/RTT)·sqrt(3/(2p))
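Plugging the example numbers into this model (a quick check in Python; nothing here goes beyond the figures already on the slide):

    from math import sqrt

    mss = 1500 * 8                  # bits per segment
    rtt = 0.100                     # seconds
    target = 10e9                   # desired throughput: 10 Gbps

    inflight = target * rtt / mss   # average in-flight segments needed
    print(round(inflight))          # ~83,333 segments

    # Invert throughput = (MSS/RTT) * sqrt(3/(2p)) to find the tolerable loss rate p:
    p = 1.5 * (mss / (rtt * target)) ** 2
    print(p)                        # ~2e-10: roughly one loss in five billion segments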

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges – or Why AIMD?
• Single flow adjusting to bottleneck bandwidth
  – without any a priori knowledge
  – could be a Gbps link, could be a modem
• Single flow adjusting to variations in bandwidth
  – when bandwidth decreases, must lower sending rate
  – when bandwidth increases, must increase sending rate
• Multiple flows sharing the bandwidth
  – must avoid overloading the network
  – and share bandwidth "fairly" among the flows

103

104

Problem 1: Single Flow, Fixed BW
• Want to get a first-order estimate of the available bandwidth
  – assume bandwidth is fixed
  – ignore presence of other flows
• Want to start slow, but rapidly increase rate until packet drop occurs ("slow-start")
• Adjustment:
  – cwnd initially set to 1 (MSS)
  – cwnd++ upon receipt of ACK

105

Problems with Slow-Start
• Slow-start can result in many losses
  – roughly the size of cwnd ~ BW·RTT
• Example:
  – at some point cwnd is enough to fill the "pipe"
  – after another RTT, cwnd is double its previous value
  – all the excess packets are dropped!
• Need a more gentle adjustment algorithm once we have a rough estimate of the bandwidth
  – rest of the design discussion focuses on this

Problem 2: Single Flow, Varying BW

Want to track available bandwidth
• oscillate around its current value
• if you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• multiplicative increase or decrease:  cwnd ← cwnd ·/÷ a
• additive increase or decrease:  cwnd ← cwnd +/− b

106

Four alternatives
• AIAD: gentle increase, gentle decrease
• AIMD: gentle increase, drastic decrease
• MIAD: drastic increase, gentle decrease
  – too many losses: eliminate
• MIMD: drastic increase and decrease

107

108

Problem 3: Multiple Flows
• Want steady state to be "fair"
• Many notions of fairness, but here just require two identical flows to end up with the same bandwidth
• This eliminates MIMD and AIAD
  – as we shall see…
• AIMD is the only remaining solution
  – not really, but close enough…

109

Recall: Buffer and Window Dynamics
• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by factor 2

[Figure: A → router → B with C = 50 pkts/RTT; plot of rate x (pkts/RTT) and backlog in the router (pkts, congested if > 20) over time, showing the resulting sawtooth]

110

AIMD Sharing Dynamics

[Figure: two flows, x1 (A→B) and x2 (D→E), share a bottleneck; no congestion → rate increases by one packet/RTT every RTT, congestion → decrease rate by factor 2]

Rates equalize → fair share

111

AIAD Sharing Dynamics

[Figure: two flows, x1 (A→B) and x2 (D→E), share a bottleneck; no congestion → x increases by one packet/RTT every RTT, congestion → decrease x by 1. The gap between the two rates never closes]

112

Simple Model of Congestion Control
• Two TCP connections
  – rates x1 and x2
• Congestion when sum > 1
• Efficiency: sum near 1
• Fairness: x's converge

[Figure: 2-user example – bandwidth for user 1 (x1) versus bandwidth for user 2 (x2); the efficiency line x1 + x2 = 1 separates underload from overload]

113

Example
• Total bandwidth 1

[Figure: user 1 (x1) versus user 2 (x2), both axes from 0 to 1, with the fairness line x1 = x2 and the efficiency line x1 + x2 = 1]

  (0.2, 0.5): inefficient, x1 + x2 = 0.7
  (0.7, 0.5): congested, x1 + x2 = 1.2
  (0.7, 0.3): efficient (x1 + x2 = 1), not fair
  (0.5, 0.5): efficient (x1 + x2 = 1), fair
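The same four points, classified mechanically (a small check of the model; the capacity of 1 is the total bandwidth from the slide):

    def classify(x1, x2, capacity=1.0, tol=1e-9):
        total = x1 + x2
        if total > capacity + tol:
            return "congested"
        efficiency = "efficient" if abs(total - capacity) <= tol else "inefficient"
        fairness = "fair" if abs(x1 - x2) <= tol else "not fair"
        return f"{efficiency}, {fairness}"

    for point in [(0.2, 0.5), (0.7, 0.5), (0.7, 0.3), (0.5, 0.5)]:
        print(point, classify(*point))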

114

AIAD
• Increase: x + aI
• Decrease: x − aD
• Does not converge to fairness

[Figure: bandwidth for user 1 (x1) versus user 2 (x2) with fairness and efficiency lines; from (x1h, x2h) a decrease step moves to (x1h − aD, x2h − aD) and the next increase to (x1h − aD + aI, x2h − aD + aI) – the trajectory stays parallel to the fairness line, so the initial unfairness persists]

115

MIMD
• Increase: x · bI
• Decrease: x · bD
• Does not converge to fairness

[Figure: bandwidth for user 1 (x1) versus user 2 (x2) with fairness and efficiency lines; from (x1h, x2h) a decrease moves to (bD·x1h, bD·x2h) and the next increase to (bI·bD·x1h, bI·bD·x2h) – the trajectory stays on a line through the origin, so the ratio x1/x2 never changes]

116

AIMD
• Increase: x + aI
• Decrease: x · bD
• Converges to fairness

[Figure: bandwidth for user 1 (x1) versus user 2 (x2) with fairness and efficiency lines; from (x1h, x2h) a multiplicative decrease moves to (bD·x1h, bD·x2h) and the next additive increase to (bD·x1h + aI, bD·x2h + aI) – each decrease shrinks the difference between the rates, each increase preserves it, so the trajectory moves towards the fairness line]
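A small simulation of the two-flow model (my own sketch; the capacity of 1.0 and the step sizes are arbitrary illustration values) showing AIMD pulling the rates together while additive decrease preserves the initial gap:

    # Two flows sharing capacity 1.0: increase each RTT until the sum exceeds
    # capacity, then both flows see the "congestion" signal and decrease.
    def simulate(decrease, x1=0.1, x2=0.5, a_i=0.01, capacity=1.0, rounds=2000):
        for _ in range(rounds):
            if x1 + x2 > capacity:              # congestion: both back off
                x1, x2 = decrease(x1), decrease(x2)
            else:                               # no congestion: additive increase
                x1, x2 = x1 + a_i, x2 + a_i
        return x1, x2

    aimd = simulate(lambda x: x / 2.0)             # multiplicative decrease
    aiad = simulate(lambda x: max(x - 0.05, 0.0))  # additive decrease
    print("AIMD:", aimd)    # rates end up (roughly) equal
    print("AIAD:", aiad)    # the initial 0.4 gap is still there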

117

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: bandwidth for connection 1 versus connection 2, both axes up to R, with the "equal bandwidth share" line; repeated cycles of congestion-avoidance additive increase and loss-triggered halving of the window move the operating point towards the fair share]

118

Fairness (more)

Fairness and UDP:
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient, yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections:
• nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R already supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets ~R/2
• Recall: multiple browser sessions (and the potential for synchronized loss)
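Working the R/2 figure through (just the arithmetic behind the bullet above): with 9 existing connections, one new connection is 1 of 10 equal shares; opening 11 new connections makes 20 in total, of which the new application owns 11.

    R = 1.0                          # link rate, normalised
    existing = 9

    one = R * 1 / (existing + 1)     # 0.1  -> R/10
    eleven = R * 11 / (existing + 11)
    print(one, eleven)               # 0.1 and 0.55 -- roughly R/2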

119

Synchronized Flows vs. Many TCP Flows

Synchronized flows:
• aggregate window has the same dynamics
• therefore buffer occupancy has the same dynamics
• rule-of-thumb still holds

Independent, desynchronized flows:
• central limit theorem says the aggregate becomes Gaussian
• variance (buffer size) decreases as N increases

[Figures: buffer size over time t, and the resulting probability distribution]

Some TCP issues outstanding…

Page 5: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

5

Internet transport-layer protocols

bull reliable in-order delivery (TCP)ndash congestion control ndash flow controlndash connection setup

bull unreliable unordered delivery UDPndash no-frills extension of ldquobest-

effortrdquo IPbull services not available

ndash delay guaranteesndash bandwidth guarantees

applicationtransportnetworkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

applicationtransportnetworkdata linkphysical

logical end-end transport

6

application

transport

network

link

physical

P1 application

transport

network

link

physical

application

transport

network

link

physical

P2P3 P4P1

host 1 host 2 host 3

= process= socket

delivering received segmentsto correct socket

Demultiplexing at rcv hostgathering data from multiplesockets enveloping data with header (later used for demultiplexing)

Multiplexing at send host

Multiplexingdemultiplexing(Transport-layer style)

7

How transport-layer demultiplexing worksbull host receives IP datagrams

ndash each datagram has source IP address destination IP address

ndash each datagram carries 1 transport-layer segment

ndash each segment has source destination port number

bull host uses IP addresses amp port numbers to direct segment to appropriate socket

source port dest port

32 bits

applicationdata

(message)

other header fields

TCPUDP segment format

8

Connectionless demultiplexingbull Create sockets with port

numbersDatagramSocket mySocket1 = new

DatagramSocket(12534)DatagramSocket mySocket2 = new

DatagramSocket(12535)

bull UDP socket identified by two-tuple

(dest IP address dest port number)

bull When host receives UDP segmentndash checks destination port

number in segmentndash directs UDP segment to socket

with that port numberbull IP datagrams with different

source IP addresses andor source port numbers directed to same socket

9

Connectionless demux (cont)DatagramSocket serverSocket = new DatagramSocket(6428)

ClientIPB

P2

client IP A

P1P1P3

serverIP C

SP 6428DP 9157

SP 9157DP 6428

SP 6428DP 5775

SP 5775DP 6428

SP provides ldquoreturn addressrdquo

10

Connection-oriented demuxbull TCP socket identified by 4-

tuple ndash source IP addressndash source port numberndash dest IP addressndash dest port number

bull recv host uses all four values to direct segment to appropriate socket

bull Server host may support many simultaneous TCP socketsndash each socket identified by its

own 4-tuplebull Web servers have different

sockets for each connecting clientndash non-persistent HTTP will have

different socket for each request

11

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157DP 80

SP 9157DP 80

P5 P6 P3

D-IPCS-IP AD-IPC

S-IP B

SP 5775DP 80

D-IPCS-IP B

12

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157DP 80

SP 9157DP 80

P4 P3

D-IPCS-IP AD-IPC

S-IP B

SP 5775DP 80

D-IPCS-IP B

13

UDP User Datagram Protocol [RFC 768]

bull ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol

bull ldquobest effortrdquo service UDP segments may bendash lostndash delivered out of order to

appbull connectionless

ndash no handshaking between UDP sender receiver

ndash each UDP segment handled independently of others

Why is there a UDPbull no connection establishment

(which can add delay)bull simple no connection state at

sender receiverbull small segment headerbull no congestion control UDP can

blast away as fast as desired

14

UDP more

bull often used for streaming multimedia appsndash loss tolerantndash rate sensitive

bull other UDP usesndash DNSndash SNMP

bull reliable transfer over UDP add reliability at application layerndash application-specific error

recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

15

UDP checksum

Senderbull treat segment contents as

sequence of 16-bit integersbull checksum addition (1rsquos

complement sum) of segment contents

bull sender puts checksum value into UDP checksum field

Receiverbull compute checksum of received

segmentbull check if computed checksum

equals checksum field valuendash NO - error detectedndash YES - no error detected But

maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

16

Internet Checksum(time travel warning ndash we covered this earlier)

bull Notendash When adding numbers a carryout from the

most significant bit needs to be added to the result

bull Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

17

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

18

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

19

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

rdt_rcv()

udt_rcv()

20

Reliable data transfer getting started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over

unreliable channel to receiver

rdt_rcv() called by rdt to deliver data to upper

rdt_rcv()

udt_rcv()

udt_rcv() called when packet arrives on rcv-side of channel

21

Reliable data transfer getting started

Wersquollbull incrementally develop sender receiver sides of

reliable data transfer protocol (rdt)bull consider only unidirectional data transfer

ndash but control info will flow on both directionsbull use finite state machines (FSM) to specify sender

receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely

determined by next event

eventactions

22

KR state machines ndash a note

BewareKurose and Ross has a confusingconfused attitude to

state-machinesIrsquove attempted to normalise the representationUPSHOT these slides have differing information to the

KR book (from which the RDT example is taken)in KR ldquoactions takenrdquo appear wide-ranging my

interpretation is more specificrelevant

Statename

Statename

Relevant event causing state transitionRelevant action taken on state transitionstate when in this ldquostaterdquo

next state uniquely determined by next

event eventactions

23

Rdt10 reliable transfer over a reliable channel

bull underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

IDLE udt_send(packet)rdt_send(data)

rdt_rcv(data)IDLEudt_rcv(packet)

sender receiver

Event

Action

24

Rdt20 channel with bit errors

bull underlying channel may flip bits in packetndash checksum to detect bit errors

bull the question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells sender that

packet received is OKndash negative acknowledgements (NAKs) receiver explicitly tells sender

that packet had errorsndash sender retransmits packet on receipt of NAK

bull new mechanisms in rdt20 (beyond rdt10)ndash error detectionndash receiver feedback control msgs (ACKNAK) receiver-gtsender

25

rdt20 FSM specification

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

Waitingfor reply

IDLE

sender

receiverrdt_send(data)

L

Note the sender holds a copy of the packet being sent until the delivery is acknowledged

26

rdt20 operation with no errors

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

27

rdt20 error scenario

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

28

rdt20 has a fatal flawWhat happens if ACKNAK corruptedbull sender doesnrsquot know what happened at receiverbull canrsquot just retransmit possible duplicate

Handling duplicates bull sender retransmits current

packet if ACKNAK garbledbull sender adds sequence number

to each packetbull receiver discards (doesnrsquot

deliver) duplicate packet

Sender sends one packet then waits for receiver response

stop and wait

29

rdt21 sender handles garbled ACKNAKs

IDLE

sequence=0udt_send(packet)

rdt_send(data)

WaitingFor reply udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

IDLEWaitingfor reply

LL

udt_rcv(packet) ampamp corrupt(packet)

30

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

udt_send(NAK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

udt_send(ACK)rdt_rcv(data)

Wait for 1 from below

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_send(ACK)rdt_rcv(data)

udt_send(ACK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

receive(packet) ampamp corrupt(packet)

udt_send(ACK)

udt_send(NAK)

31

rdt21 discussionSenderbull seq added to pktbull two seq rsquos (01) will suffice Whybull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has a0 or 1 sequence number

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq bull note receiver can not know if its last ACKNAK received OK at

sender

32

rdt22 a NAK-free protocol

bull same functionality as rdt21 using ACKs onlybull instead of NAK receiver sends ACK for last pkt received OK

ndash receiver must explicitly include seq of pkt being ACKed bull duplicate ACK at sender results in same action as NAK

retransmit current pkt

33

rdt22 sender receiver fragments

Wait for call 0 from above

sequence=0udt_send(packet)

rdt_send(data)

udt_send(packet)

rdt_rcv(reply) ampamp ( corrupt(reply) || isACK1(reply) )

udt_rcv(reply) ampamp not corrupt(reply) ampamp isACK0(reply)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet) send(ACK1)rdt_rcv(data)

udt_rcv(packet) ampamp (corrupt(packet) || has_seq1(packet))

udt_send(ACK1)receiver FSM

fragment

L

34

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of help but not

enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull retransmits if no ACK received in this time

bull if pkt (or ACK) just delayed (not lost)ndash retransmission will be

duplicate but use of seq rsquos already handles this

ndash receiver must specify seq of pkt being ACKed

bull requires countdown timer

udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )

35

rdt30 sender

sequence=0udt_send(packet)

rdt_send(data)

Wait for

ACK0

IDLEstate 1

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)

udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)

LL

udt_send(packet)timeout

udt_send(packet)timeout

udt_rcv(reply)

IDLEstate 0

Wait for

ACK1

Ludt_rcv(reply)

LL

L

36

rdt30 in action

37

rdt30 in action

38

Performance of rdt30

bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet

U sender utilization ndash fraction of time sender busy sending

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources

dsmicrosecon8bps10bits8000

9 RLdtrans

39

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

40

Pipelined (Packet-Window) protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver

bull Two generic forms of pipelined protocols go-Back-N selective repeat

41

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

42

Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks

ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet

ndash If timer expires retransmit all unacked packets

Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet

ndash When timer expires retransmit only unack packet

43

Selective repeat big picture

bull Sender can have up to N unacked packets in pipeline

bull Rcvr acks individual packetsbull Sender maintains timer for each unacked

packetndash When timer expires retransmit only unack packet

44

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L

46

GBN receiver extended FSM

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN inaction

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

Page 7: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

7

How transport-layer demultiplexing worksbull host receives IP datagrams

ndash each datagram has source IP address destination IP address

ndash each datagram carries 1 transport-layer segment

ndash each segment has source destination port number

bull host uses IP addresses amp port numbers to direct segment to appropriate socket

source port dest port

32 bits

applicationdata

(message)

other header fields

TCPUDP segment format

8

Connectionless demultiplexingbull Create sockets with port

numbersDatagramSocket mySocket1 = new

DatagramSocket(12534)DatagramSocket mySocket2 = new

DatagramSocket(12535)

bull UDP socket identified by two-tuple

(dest IP address dest port number)

bull When host receives UDP segmentndash checks destination port

number in segmentndash directs UDP segment to socket

with that port numberbull IP datagrams with different

source IP addresses andor source port numbers directed to same socket

9

Connectionless demux (cont)DatagramSocket serverSocket = new DatagramSocket(6428)

ClientIPB

P2

client IP A

P1P1P3

serverIP C

SP 6428DP 9157

SP 9157DP 6428

SP 6428DP 5775

SP 5775DP 6428

SP provides ldquoreturn addressrdquo

10

Connection-oriented demuxbull TCP socket identified by 4-

tuple ndash source IP addressndash source port numberndash dest IP addressndash dest port number

bull recv host uses all four values to direct segment to appropriate socket

bull Server host may support many simultaneous TCP socketsndash each socket identified by its

own 4-tuplebull Web servers have different

sockets for each connecting clientndash non-persistent HTTP will have

different socket for each request

11

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157DP 80

SP 9157DP 80

P5 P6 P3

D-IPCS-IP AD-IPC

S-IP B

SP 5775DP 80

D-IPCS-IP B

12

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157DP 80

SP 9157DP 80

P4 P3

D-IPCS-IP AD-IPC

S-IP B

SP 5775DP 80

D-IPCS-IP B

13

UDP User Datagram Protocol [RFC 768]

bull ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol

bull ldquobest effortrdquo service UDP segments may bendash lostndash delivered out of order to

appbull connectionless

ndash no handshaking between UDP sender receiver

ndash each UDP segment handled independently of others

Why is there a UDPbull no connection establishment

(which can add delay)bull simple no connection state at

sender receiverbull small segment headerbull no congestion control UDP can

blast away as fast as desired

14

UDP more

bull often used for streaming multimedia appsndash loss tolerantndash rate sensitive

bull other UDP usesndash DNSndash SNMP

bull reliable transfer over UDP add reliability at application layerndash application-specific error

recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

15

UDP checksum

Senderbull treat segment contents as

sequence of 16-bit integersbull checksum addition (1rsquos

complement sum) of segment contents

bull sender puts checksum value into UDP checksum field

Receiverbull compute checksum of received

segmentbull check if computed checksum

equals checksum field valuendash NO - error detectedndash YES - no error detected But

maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

16

Internet Checksum(time travel warning ndash we covered this earlier)

bull Notendash When adding numbers a carryout from the

most significant bit needs to be added to the result

bull Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

17

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

18

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

19

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

rdt_rcv()

udt_rcv()

20

Reliable data transfer getting started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over

unreliable channel to receiver

rdt_rcv() called by rdt to deliver data to upper

rdt_rcv()

udt_rcv()

udt_rcv() called when packet arrives on rcv-side of channel

21

Reliable data transfer getting started

Wersquollbull incrementally develop sender receiver sides of

reliable data transfer protocol (rdt)bull consider only unidirectional data transfer

ndash but control info will flow on both directionsbull use finite state machines (FSM) to specify sender

receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely

determined by next event

eventactions

22

KR state machines ndash a note

BewareKurose and Ross has a confusingconfused attitude to

state-machinesIrsquove attempted to normalise the representationUPSHOT these slides have differing information to the

KR book (from which the RDT example is taken)in KR ldquoactions takenrdquo appear wide-ranging my

interpretation is more specificrelevant

Statename

Statename

Relevant event causing state transitionRelevant action taken on state transitionstate when in this ldquostaterdquo

next state uniquely determined by next

event eventactions

23

Rdt10 reliable transfer over a reliable channel

bull underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

IDLE udt_send(packet)rdt_send(data)

rdt_rcv(data)IDLEudt_rcv(packet)

sender receiver

Event

Action

24

Rdt20 channel with bit errors

bull underlying channel may flip bits in packetndash checksum to detect bit errors

bull the question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells sender that

packet received is OKndash negative acknowledgements (NAKs) receiver explicitly tells sender

that packet had errorsndash sender retransmits packet on receipt of NAK

bull new mechanisms in rdt20 (beyond rdt10)ndash error detectionndash receiver feedback control msgs (ACKNAK) receiver-gtsender

25

rdt20 FSM specification

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

Waitingfor reply

IDLE

sender

receiverrdt_send(data)

L

Note the sender holds a copy of the packet being sent until the delivery is acknowledged

26

rdt20 operation with no errors

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

27

rdt20 error scenario

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

28

rdt20 has a fatal flawWhat happens if ACKNAK corruptedbull sender doesnrsquot know what happened at receiverbull canrsquot just retransmit possible duplicate

Handling duplicates bull sender retransmits current

packet if ACKNAK garbledbull sender adds sequence number

to each packetbull receiver discards (doesnrsquot

deliver) duplicate packet

Sender sends one packet then waits for receiver response

stop and wait

29

rdt21 sender handles garbled ACKNAKs

IDLE

sequence=0udt_send(packet)

rdt_send(data)

WaitingFor reply udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

IDLEWaitingfor reply

LL

udt_rcv(packet) ampamp corrupt(packet)

30

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

udt_send(NAK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

udt_send(ACK)rdt_rcv(data)

Wait for 1 from below

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_send(ACK)rdt_rcv(data)

udt_send(ACK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

receive(packet) ampamp corrupt(packet)

udt_send(ACK)

udt_send(NAK)

31

rdt21 discussionSenderbull seq added to pktbull two seq rsquos (01) will suffice Whybull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has a0 or 1 sequence number

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq bull note receiver can not know if its last ACKNAK received OK at

sender

32

rdt22 a NAK-free protocol

bull same functionality as rdt21 using ACKs onlybull instead of NAK receiver sends ACK for last pkt received OK

ndash receiver must explicitly include seq of pkt being ACKed bull duplicate ACK at sender results in same action as NAK

retransmit current pkt

33

rdt22 sender receiver fragments

Wait for call 0 from above

sequence=0udt_send(packet)

rdt_send(data)

udt_send(packet)

rdt_rcv(reply) ampamp ( corrupt(reply) || isACK1(reply) )

udt_rcv(reply) ampamp not corrupt(reply) ampamp isACK0(reply)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet) send(ACK1)rdt_rcv(data)

udt_rcv(packet) ampamp (corrupt(packet) || has_seq1(packet))

udt_send(ACK1)receiver FSM

fragment

L

34

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of help but not

enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull retransmits if no ACK received in this time

bull if pkt (or ACK) just delayed (not lost)ndash retransmission will be

duplicate but use of seq rsquos already handles this

ndash receiver must specify seq of pkt being ACKed

bull requires countdown timer

udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )

35

rdt30 sender

sequence=0udt_send(packet)

rdt_send(data)

Wait for

ACK0

IDLEstate 1

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)

udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)

LL

udt_send(packet)timeout

udt_send(packet)timeout

udt_rcv(reply)

IDLEstate 0

Wait for

ACK1

Ludt_rcv(reply)

LL

L
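To ground the FSM in something concrete, here is a minimal stop-and-wait (rdt3.0-style) sender sketch over UDP. The packet format (checksum | seq | payload) and the helper names are illustrative assumptions, not part of the slides.

import socket, struct, zlib

TIMEOUT = 1.0  # "reasonable" time to wait for an ACK, in seconds

def make_packet(seq: int, payload: bytes) -> bytes:
    body = bytes([seq]) + payload
    return struct.pack("!I", zlib.crc32(body)) + body   # checksum | seq | data

def is_ok_ack(pkt: bytes, seq: int) -> bool:
    if len(pkt) < 5:
        return False
    (csum,) = struct.unpack("!I", pkt[:4])
    return csum == zlib.crc32(pkt[4:]) and pkt[4] == seq and pkt[5:] == b"ACK"

def stop_and_wait_send(sock: socket.socket, dest, data: bytes, seq: int) -> int:
    pkt = make_packet(seq, data)
    sock.settimeout(TIMEOUT)
    while True:
        sock.sendto(pkt, dest)               # (re)transmit current packet
        try:
            reply, _ = sock.recvfrom(2048)
        except socket.timeout:
            continue                         # timer expired: retransmit
        if is_ok_ack(reply, seq):
            return 1 - seq                   # alternate 0/1 sequence numbers
        # corrupted reply, or ACK for the other seq #: stay and retransmit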

36

rdt30 in action

37

rdt30 in action

38

Performance of rdt30

• rdt3.0 works, but performance stinks
• example: 1 Gbps link, 15 ms propagation delay, 8000-bit packet

d_trans = L/R = 8000 bits / 10^9 bps = 8 microseconds

U_sender (utilization – fraction of time sender busy sending)
        = (L/R) / (RTT + L/R) = 0.008 / 30.008 ≈ 0.00027

• 1 KB packet every 30 msec -> 33 kB/sec throughput over a 1 Gbps link
• the network protocol limits use of the physical resources!
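A quick back-of-envelope check of these numbers (a minimal sketch, not from the slides):

# Stop-and-wait utilization for the example above:
# 1 Gbps link, 15 ms one-way propagation delay, 8000-bit packet.
R = 1e9             # link rate, bits/sec
L = 8000            # packet size, bits
RTT = 0.030         # round-trip time, seconds (2 x 15 ms)

d_trans = L / R                          # 8 microseconds to clock the packet out
U_sender = d_trans / (RTT + d_trans)     # ~0.00027
throughput = L / (RTT + d_trans)         # ~267 kbit/s, i.e. ~33 kB/sec
print(d_trans, U_sender, throughput)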

39

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

40

Pipelined (Packet-Window) protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver

bull Two generic forms of pipelined protocols go-Back-N selective repeat

41

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

42

Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks

ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet

ndash If timer expires retransmit all unacked packets

Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet

ndash When timer expires retransmit only unack packet

43

Selective repeat big picture

bull Sender can have up to N unacked packets in pipeline

bull Rcvr acks individual packetsbull Sender maintains timer for each unacked

packetndash When timer expires retransmit only unack packet

44

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

45

GBN sender extended FSM

state: Wait

event: rdt_send(data)
action:
  if (nextseqnum < base + N) {
    udt_send(packet[nextseqnum])
    nextseqnum++
  } else {
    refuse_data(data)   /* block */
  }

event: timeout
action: udt_send(packet[base]); udt_send(packet[base+1]); … ; udt_send(packet[nextseqnum-1])

event: udt_rcv(reply) && notcorrupt(reply)
action: base = getacknum(reply) + 1

event: udt_rcv(reply) && corrupt(reply)
action: Λ (do nothing)

initialisation: base = 1; nextseqnum = 1
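To make the FSM concrete, here is a minimal Python sketch of the Go-Back-N sender bookkeeping; the channel and timer primitives (udt_send, start_timer, stop_timer) are assumed to be supplied by the caller, and checksums are omitted.

class GBNSender:
    def __init__(self, N, udt_send, start_timer, stop_timer):
        self.N = N                    # window size
        self.base = 1                 # oldest unACKed sequence number
        self.nextseqnum = 1
        self.buffer = {}              # seq -> packet, kept for retransmission
        self.udt_send = udt_send
        self.start_timer = start_timer
        self.stop_timer = stop_timer

    def rdt_send(self, data):
        if self.nextseqnum >= self.base + self.N:
            return False              # window full: refuse / block caller
        pkt = (self.nextseqnum, data)
        self.buffer[self.nextseqnum] = pkt
        self.udt_send(pkt)
        if self.base == self.nextseqnum:
            self.start_timer()        # timer runs for the oldest unACKed packet
        self.nextseqnum += 1
        return True

    def on_ack(self, acknum):         # cumulative ACK: everything <= acknum
        self.base = acknum + 1
        for s in [s for s in self.buffer if s <= acknum]:
            del self.buffer[s]
        if self.base == self.nextseqnum:
            self.stop_timer()         # nothing outstanding
        else:
            self.start_timer()

    def on_timeout(self):             # go back N: resend all unACKed packets
        self.start_timer()
        for s in range(self.base, self.nextseqnum):
            self.udt_send(self.buffer[s])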

46

GBN receiver extended FSM

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN in action

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

sender:

data from above:
• if next available seq # is in window, send pkt

timeout(n):
• resend pkt n, restart timer

ACK(n) in [sendbase, sendbase+N]:
• mark pkt n as received
• if n is the smallest unACKed pkt, advance window base to the next unACKed seq #

receiver:

pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt

pkt n in [rcvbase-N, rcvbase-1]:
• ACK(n)

otherwise:
• ignore
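A minimal Python sketch of the receiver-side rules above; deliver and send_ack are assumed callbacks, and sequence-number wraparound is ignored for clarity.

class SRReceiver:
    def __init__(self, N, deliver, send_ack):
        self.N = N                    # window size
        self.rcvbase = 0              # smallest not-yet-delivered seq number
        self.buffer = {}              # seq -> data for out-of-order packets
        self.deliver = deliver        # hand data to the upper layer
        self.send_ack = send_ack

    def on_packet(self, seq, data):
        if self.rcvbase <= seq < self.rcvbase + self.N:
            self.send_ack(seq)                    # ACK(n), even if out of order
            self.buffer.setdefault(seq, data)
            while self.rcvbase in self.buffer:    # deliver any in-order run
                self.deliver(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
        elif self.rcvbase - self.N <= seq < self.rcvbase:
            self.send_ack(seq)                    # already delivered: re-ACK
        # otherwise: ignore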

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start / adapt: consider a high bandwidth×delay product

Now let's move from the generic to the specific…

TCP: arguably the most successful protocol in the Internet…

…and it's an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

[figure: application writes data → socket door → TCP send buffer → segments → TCP receive buffer → socket door → application reads data]

55

TCP segment structure

32 bits wide:

  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | receive window
  checksum | urg data pointer
  options (variable length)
  application data (variable length)

notes:
• sequence/acknowledgement numbers count by bytes of data (not segments!)
• URG: urgent data (generally not used); the urg data pointer locates it
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection establishment (setup, teardown commands)
• receive window: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)
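As an illustrative sketch (not from the slides), the fixed 20-byte header above can be unpacked directly with Python's struct module:

import struct

def parse_tcp_header(segment: bytes) -> dict:
    (src_port, dst_port, seq, ack,
     offset_flags, rcv_window, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    data_offset = (offset_flags >> 12) & 0xF        # header length in 32-bit words
    flags = offset_flags & 0x3F                     # URG|ACK|PSH|RST|SYN|FIN bits
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len_bytes": data_offset * 4,
        "URG": bool(flags & 0x20), "ACK": bool(flags & 0x10),
        "PSH": bool(flags & 0x08), "RST": bool(flags & 0x04),
        "SYN": bool(flags & 0x02), "FIN": bool(flags & 0x01),
        "rcv_window": rcv_window, "checksum": checksum, "urgent_ptr": urg_ptr,
        "payload": segment[data_offset * 4:],
    }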

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

simple telnet scenario (Host A ↔ Host B):

1. User types 'C':                                   A → B  Seq=42, ACK=79, data = 'C'
2. Host B ACKs receipt of 'C' and echoes back 'C':   B → A  Seq=79, ACK=43, data = 'C'
3. Host A ACKs receipt of the echoed 'C':            A → B  Seq=43, ACK=80

This has led to a world of hurt…

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1 – α)·EstimatedRTT + α·SampleRTT

• exponential weighted moving average
• influence of past samples decreases exponentially fast
• typical value: α = 0.125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimation

[figure: SampleRTT and EstimatedRTT (milliseconds, roughly 100–350 ms) versus time (seconds), for the RTT from gaia.cs.umass.edu to fantasia.eurecom.fr]

62

TCP Round Trip Time and Timeout

Setting the timeout
• EstimatedRTT plus a "safety margin"
  – large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:

DevRTT = (1 – β)·DevRTT + β·|SampleRTT – EstimatedRTT|

(typically, β = 0.25)

Then set the timeout interval:

TimeoutInterval = EstimatedRTT + 4·DevRTT
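A small Python sketch of this estimator (α = 0.125, β = 0.25), including the Karn/Partridge rule from the earlier slide; the class name and interface are illustrative assumptions.

class RTTEstimator:
    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha, self.beta = alpha, beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2

    def on_sample(self, sample_rtt, was_retransmitted=False):
        if was_retransmitted:
            return                      # Karn/Partridge: ignore ambiguous samples
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * abs(sample_rtt - self.estimated_rtt)

    @property
    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt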

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event) {

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running) start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments) start timer
    }
  }
} /* end of loop (forever) */

Comment:
• SendBase-1: last cumulatively ACKed byte

Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so new data is ACKed

66

TCP retransmission scenarios

[timeline figures, Host A ↔ Host B]

lost ACK scenario:
• A sends Seq=92, 8 bytes data; B's ACK=100 is lost (X)
• A's timer for Seq=92 expires; A retransmits Seq=92, 8 bytes data
• B re-ACKs with ACK=100; SendBase advances 92 → 100

premature timeout scenario:
• A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• B sends ACK=100 and ACK=120, but A's timer for Seq=92 expires first
• A retransmits Seq=92, 8 bytes data; B replies ACK=120
• SendBase advances 100 → 120

67

TCP retransmission scenarios (more)

cumulative ACK scenario:
• A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• B's ACK=100 is lost (X), but B's ACK=120 arrives before A's timeout
• ACK=120 implicitly ACKs 100 too (implicit ACK – e.g. not Go-Back-N)
• SendBase advances to 120; no retransmission needed

68

TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
Receiver action:   delayed ACK – wait up to 500 ms for the next segment; if no next segment, send ACK

Event at receiver: arrival of in-order segment with expected seq #; one other segment has an ACK pending
Receiver action:   immediately send a single cumulative ACK, ACKing both in-order segments

Event at receiver: arrival of out-of-order segment with higher-than-expected seq # (gap detected)
Receiver action:   immediately send a duplicate ACK, indicating the seq # of the next expected byte

Event at receiver: arrival of a segment that partially or completely fills a gap
Receiver action:   immediately send ACK, provided the segment starts at the lower end of the gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

[Figure 3.37: Host A sends several segments back-to-back; the 2nd segment is lost (X); Host B's duplicate ACKs lead Host A to resend the 2nd segment after the triple duplicate ACK, before the timeout expires]

71

Fast retransmit algorithm

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments) start timer
  }
  else {                       /* a duplicate ACK for an already-ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3)
      resend segment with sequence number y     /* fast retransmit */
  }
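A minimal Python sketch of this ACK handler, assuming a sender object that tracks send_base, next_seq_num, dup_acks, a retransmission timer and a retransmit() helper (names are illustrative):

def on_ack(sender, y):
    if y > sender.send_base:                      # new data ACKed
        sender.send_base = y
        sender.dup_acks = 0
        if sender.send_base < sender.next_seq_num:
            sender.restart_timer()                # still unACKed data in flight
        else:
            sender.stop_timer()
    else:                                         # duplicate ACK for send_base
        sender.dup_acks += 1
        if sender.dup_acks == 3:
            sender.retransmit(sender.send_base)   # fast retransmit, before timeout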

72

Silly Window Syndrome

The window advertises the amount a receiver can accept; if a transmitter has something to send – it will. This means small segments may persist – indefinitely.

Solution: wait to fill each segment, but don't wait indefinitely – Nagle's algorithm.

If we wait too long, interactive traffic is difficult.
If we don't wait, we get silly window syndrome.

Solution: use a timer; when the timer expires – send the (unfilled) segment.

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control: how it works

(suppose the TCP receiver discards out-of-order segments)

• spare room in buffer
  = RcvWindow
  = RcvBuffer – [LastByteRcvd – LastByteRead]

• Rcvr advertises the spare room by including the value of RcvWindow in segments

• Sender limits unACKed data to RcvWindow
  – guarantees the receive buffer doesn't overflow
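A tiny sketch of the arithmetic above (function names are illustrative):

def advertised_window(rcv_buffer_size, last_byte_rcvd, last_byte_read):
    # spare room = RcvBuffer - [LastByteRcvd - LastByteRead]
    return rcv_buffer_size - (last_byte_rcvd - last_byte_read)

def can_send(last_byte_sent, last_byte_acked, nbytes, rcv_window):
    # sender-side check: keep unACKed data within the advertised window
    return (last_byte_sent - last_byte_acked) + nbytes <= rcv_window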

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection:

client closes socket: clientSocket.close()

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK; closes connection, sends FIN

[figure: client sends FIN (close); server replies ACK, then sends FIN (close); client replies ACK and enters timed wait; connection closed]

79

TCP Connection Management (cont)

Step 3: client receives FIN, replies with ACK
  – enters "timed wait" – will respond with ACK to received FINs

Step 4: server receives ACK; connection closed

Note: with a small modification, this can handle simultaneous FINs.

[figure: client sends FIN (closing); server replies ACK, then sends FIN (closing); client replies ACK, enters timed wait, then closed; server closed]

80

TCP Connection Management (cont)

[figures: TCP client lifecycle and TCP server lifecycle state diagrams]

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission

• large delays when congested
• maximum achievable throughput

[figure: Host A and Host B send λ_in (original data) into a router with unlimited shared output-link buffers; λ_out emerges towards the receivers]

83

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packets

[figure: Host A and Host B send λ_in (original data) plus λ'_in (original plus retransmitted data) into a router with finite shared output-link buffers; λ_out emerges]

84

Causes/costs of congestion: scenario 2
• always: λ_in = λ_out (goodput)
• "perfect" retransmission, only when loss: λ'_in > λ_out
• retransmission of delayed (not lost) packets makes λ'_in larger (than the perfect case) for the same λ_out

"costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: the link carries multiple copies of a packet

[figure: three plots (a), (b), (c) of λ_out versus λ_in, each bounded by R/2; with retransmissions the achieved goodput falls below R/2, towards values like R/3 and R/4]

85

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λ_in and λ'_in increase?

[figure: Host A and Host B send λ_in (original data) plus λ'_in (original plus retransmitted data) through routers with finite shared output-link buffers; λ_out emerges]

86

Causes/costs of congestion: scenario 3

Another "cost" of congestion: when a packet is dropped, any upstream transmission capacity used for that packet was wasted!

[figure: λ_out collapses as the offered load grows]

Congestion Collapse example: the cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control: additive increase, multiplicative decrease

Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs:
• additive increase: increase CongWin by 1 MSS every RTT – for each received ACK, W ← W + 1/W – until loss is detected
• multiplicative decrease: cut CongWin in half after loss – W ← W/2

[figure: congestion window (8 / 16 / 24 Kbytes) versus time – the saw-tooth behaviour of probing for bandwidth]

89

(SLOW START IS NOT SHOWN)

90

TCP Congestion Control: details
• sender limits transmission: LastByteSent – LastByteAcked ≤ CongWin
• roughly,

    rate = CongWin / RTT   bytes/sec

• CongWin is a dynamic function of perceived network congestion

How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event

three mechanisms:
  – AIMD
  – slow start
  – conservative after timeout events

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion control

State: Slow Start (SS)
Event: ACK receipt for previously unACKed data
Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
Commentary: results in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unACKed data
Action: CongWin = CongWin + MSS·(MSS/CongWin)
Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: loss event detected by triple duplicate ACK
Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

State: SS or CA
Event: timeout
Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: enter slow start

State: SS or CA
Event: duplicate ACK
Action: increment duplicate ACK count for the segment being ACKed
Commentary: CongWin and Threshold not changed
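A compact Python sketch of the state/event rules above (cwnd and ssthresh in units of MSS; fast-recovery details and the duplicate-ACK counter are simplified):

class TCPCongestionControl:
    def __init__(self):
        self.cwnd = 1.0          # congestion window, in MSS
        self.ssthresh = 64.0     # slow-start threshold, in MSS

    def on_new_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0                 # slow start: +1 MSS per ACK (doubles per RTT)
        else:
            self.cwnd += 1.0 / self.cwnd     # congestion avoidance: ~+1 MSS per RTT

    def on_triple_dup_ack(self):
        self.ssthresh = self.cwnd / 2        # multiplicative decrease
        self.cwnd = self.ssthresh            # continue in congestion avoidance

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0                      # back to slow start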

99

Repeating Slow Start After Timeout

Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND: SSThresh is set to half the previous CWND, and slow start operates until the window reaches SSThresh.

[figure: congestion window versus time, contrasting a timeout (restart from 1 MSS) with a fast retransmission]

100

TCP throughput

• What's the average throughput of TCP as a function of window size and RTT?
  – ignore slow start
• Let W be the window size when loss occurs
• When the window is W, throughput is W/RTT
• Just after a loss, the window drops to W/2, and throughput to W/(2·RTT)
• Average throughput: 0.75·W/RTT

101

TCP Futures: TCP over "long, fat pipes"

• example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• requires window size W = 83,333 in-flight segments
• throughput in terms of loss rate p: throughput ≈ 1.22·MSS / (RTT·√p)
• so the required loss rate is p ≈ 2·10⁻¹⁰ – Ouch!
• new versions of TCP needed for high speed

Calculation on a simple model (cwnd in units of MSS):
• assume loss occurs whenever cwnd reaches W
  – recovery by fast retransmit
• window: W/2, W/2+1, W/2+2, … , W, then W/2, …
  – W/2 RTTs, then drop, then repeat
• average throughput: 0.75·W·(MSS/RTT)
  – one packet dropped out of (W/2)·(3W/4)
  – packet drop rate p = (8/3)·W⁻²
• throughput = (MSS/RTT)·√(3/(2p))
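A quick check of the "long, fat pipe" numbers, using the relation just derived (a sketch, not from the slides):

from math import sqrt

MSS = 1500 * 8          # segment size, bits
RTT = 0.100             # seconds
target = 10e9           # desired throughput: 10 Gbps

W = target * RTT / MSS                      # ~83,333 segments in flight
# invert  throughput = (MSS/RTT) * sqrt(3/(2p))  to get the loss rate needed:
p = 1.5 * (MSS / (RTT * target)) ** 2       # ~2e-10 -- "Ouch!"
print(round(W), p)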

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track the available bandwidth:
• oscillate around its current value
• if you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• multiplicative increase or decrease:  cwnd → cwnd·a  (or cwnd/a)
• additive increase or decrease:        cwnd → cwnd ± b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall: Buffer and Window Dynamics

A → router (C = 50 pkts/RTT) → B

• no congestion → x increases by one packet/RTT every RTT
• congestion → decrease x by a factor of 2

[figure: rate x (pkts/RTT) and backlog in the router (pkts; congested if > 20) versus time – the rate saw-tooths between roughly 0 and 60 pkts/RTT as the buffer fills and drains]

110

AIMD Sharing Dynamics

A, D → shared bottleneck → B, E; flows x1 and x2 compete

• no congestion → rate increases by one packet/RTT every RTT
• congestion → decrease rate by a factor of 2

[figure: the two rates versus time – they saw-tooth and converge: rates equalize at the fair share]

111

AIAD Sharing Dynamics

A, D → shared bottleneck → B, E; flows x1 and x2 compete

• no congestion → x increases by one packet/RTT every RTT
• congestion → decrease x by 1

[figure: the two rates versus time – they saw-tooth in parallel and never converge to a fair share]

112

Simple Model of Congestion Control

• two TCP connections, with rates x1 and x2
• congestion when the sum > 1
• efficiency: sum near 1
• fairness: the x's converge

[2-user example figure: bandwidth for user 1 (x1) against bandwidth for user 2 (x2), with the efficiency line x1 + x2 = 1 separating underload from overload]

113

Example

• total bandwidth: 1

[phase-plane figure: user 1's rate x1 against user 2's rate x2, with the fairness line x1 = x2 and the efficiency line x1 + x2 = 1]

• (0.2, 0.5): inefficient, x1 + x2 = 0.7
• (0.7, 0.5): congested, x1 + x2 = 1.2
• (0.7, 0.3): efficient, x1 + x2 = 1, but not fair
• (0.5, 0.5): efficient, x1 + x2 = 1, and fair

114

AIAD

• increase: x + aI
• decrease: x – aD
• does not converge to fairness

[phase-plane figure: starting from (x1h, x2h), a decrease moves to (x1h – aD, x2h – aD) and an increase to (x1h – aD + aI, x2h – aD + aI), so the trajectory stays parallel to the fairness line and never approaches it]

115

MIMD

• increase: x·bI
• decrease: x·bD
• does not converge to fairness

[phase-plane figure: starting from (x1h, x2h), a decrease moves to (bD·x1h, bD·x2h) and an increase to (bI·bD·x1h, bI·bD·x2h), so the trajectory stays on the same line through the origin and never approaches the fairness line]

116

AIMD

• increase: x + aI
• decrease: x·bD
• converges to fairness

[phase-plane figure: starting from (x1h, x2h), a decrease moves to (bD·x1h, bD·x2h) and an increase to (bD·x1h + aI, bD·x2h + aI), so each cycle moves the operating point towards the fairness line]

117

Why is AIMD fair?
(a pretty animation…)

Two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease reduces throughput proportionally

[figure: bandwidth for connection 1 against bandwidth for connection 2, both bounded by R, with the equal-bandwidth-share line; repeated cycles of "congestion avoidance: additive increase" and "loss: decrease window by a factor of 2" drive the operating point towards the fair share]

Fairness (more)

Fairness and UDP
• multimedia apps may not use TCP
  – they do not want their rate throttled by congestion control
• instead they use UDP:
  – pump audio/video at a constant rate, tolerate packet loss
• (ancient, yet ongoing) research area: TCP-friendly congestion control

Fairness and parallel TCP connections
• nothing prevents an app from opening parallel connections between 2 hosts
• web browsers do this
• example: a link of rate R supporting 9 connections
  – a new app asks for 1 TCP, gets rate R/10
  – a new app asks for 11 TCPs, gets R/2!
• recall: multiple browser sessions (and the potential for synchronized loss)

119

Synchronized Flows versus Many TCP Flows

Synchronized flows:
• the aggregate window has the same dynamics
• therefore buffer occupancy has the same dynamics
• the rule-of-thumb still holds

Independent, desynchronized flows:
• the central limit theorem says the aggregate becomes Gaussian
• variance (and so the buffer size needed) decreases as N increases

Some TCP issues outstanding…

[figure: probability distribution of buffer occupancy versus time; the required buffer size shrinks as flows desynchronize]

KR state machines ndash a note

BewareKurose and Ross has a confusingconfused attitude to

state-machinesIrsquove attempted to normalise the representationUPSHOT these slides have differing information to the

KR book (from which the RDT example is taken)in KR ldquoactions takenrdquo appear wide-ranging my

interpretation is more specificrelevant

Statename

Statename

Relevant event causing state transitionRelevant action taken on state transitionstate when in this ldquostaterdquo

next state uniquely determined by next

event eventactions

23

Rdt10 reliable transfer over a reliable channel

bull underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

IDLE udt_send(packet)rdt_send(data)

rdt_rcv(data)IDLEudt_rcv(packet)

sender receiver

Event

Action

24

Rdt20 channel with bit errors

bull underlying channel may flip bits in packetndash checksum to detect bit errors

bull the question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells sender that

packet received is OKndash negative acknowledgements (NAKs) receiver explicitly tells sender

that packet had errorsndash sender retransmits packet on receipt of NAK

bull new mechanisms in rdt20 (beyond rdt10)ndash error detectionndash receiver feedback control msgs (ACKNAK) receiver-gtsender

25

rdt20 FSM specification

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

Waitingfor reply

IDLE

sender

receiverrdt_send(data)

L

Note the sender holds a copy of the packet being sent until the delivery is acknowledged

26

rdt20 operation with no errors

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

27

rdt20 error scenario

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

28

rdt20 has a fatal flawWhat happens if ACKNAK corruptedbull sender doesnrsquot know what happened at receiverbull canrsquot just retransmit possible duplicate

Handling duplicates bull sender retransmits current

packet if ACKNAK garbledbull sender adds sequence number

to each packetbull receiver discards (doesnrsquot

deliver) duplicate packet

Sender sends one packet then waits for receiver response

stop and wait

29

rdt21 sender handles garbled ACKNAKs

IDLE

sequence=0udt_send(packet)

rdt_send(data)

WaitingFor reply udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

IDLEWaitingfor reply

LL

udt_rcv(packet) ampamp corrupt(packet)

30

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

udt_send(NAK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

udt_send(ACK)rdt_rcv(data)

Wait for 1 from below

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_send(ACK)rdt_rcv(data)

udt_send(ACK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

receive(packet) ampamp corrupt(packet)

udt_send(ACK)

udt_send(NAK)

31

rdt21 discussionSenderbull seq added to pktbull two seq rsquos (01) will suffice Whybull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has a0 or 1 sequence number

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq bull note receiver can not know if its last ACKNAK received OK at

sender

32

rdt22 a NAK-free protocol

bull same functionality as rdt21 using ACKs onlybull instead of NAK receiver sends ACK for last pkt received OK

ndash receiver must explicitly include seq of pkt being ACKed bull duplicate ACK at sender results in same action as NAK

retransmit current pkt

33

rdt22 sender receiver fragments

Wait for call 0 from above

sequence=0udt_send(packet)

rdt_send(data)

udt_send(packet)

rdt_rcv(reply) ampamp ( corrupt(reply) || isACK1(reply) )

udt_rcv(reply) ampamp not corrupt(reply) ampamp isACK0(reply)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet) send(ACK1)rdt_rcv(data)

udt_rcv(packet) ampamp (corrupt(packet) || has_seq1(packet))

udt_send(ACK1)receiver FSM

fragment

L

34

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of help but not

enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull retransmits if no ACK received in this time

bull if pkt (or ACK) just delayed (not lost)ndash retransmission will be

duplicate but use of seq rsquos already handles this

ndash receiver must specify seq of pkt being ACKed

bull requires countdown timer

udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )

35

rdt30 sender

sequence=0udt_send(packet)

rdt_send(data)

Wait for

ACK0

IDLEstate 1

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)

udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)

LL

udt_send(packet)timeout

udt_send(packet)timeout

udt_rcv(reply)

IDLEstate 0

Wait for

ACK1

Ludt_rcv(reply)

LL

L

36

rdt30 in action

37

rdt30 in action

38

Performance of rdt30

bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet

U sender utilization ndash fraction of time sender busy sending

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources

dsmicrosecon8bps10bits8000

9 RLdtrans

39

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

40

Pipelined (Packet-Window) protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver

bull Two generic forms of pipelined protocols go-Back-N selective repeat

41

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

42

Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks

ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet

ndash If timer expires retransmit all unacked packets

Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet

ndash When timer expires retransmit only unack packet

43

Selective repeat big picture

bull Sender can have up to N unacked packets in pipeline

bull Rcvr acks individual packetsbull Sender maintains timer for each unacked

packetndash When timer expires retransmit only unack packet

44

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L

46

GBN receiver extended FSM

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN inaction

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)

[Figure: Host A / Host B time-line — cumulative ACK scenario]
• A sends Seq=92 (8 bytes) and Seq=100 (20 bytes); ACK=100 is lost (X); before the timer expires, ACK=120 arrives, so SendBase = 120 and nothing is retransmitted.

Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACKs 100 too

68

TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver -> TCP receiver action:
• In-order segment arrives with expected seq #; all data up to expected seq # already ACKed
  -> Delayed ACK: wait up to 500 ms for the next segment; if no next segment, send ACK.
• In-order segment arrives with expected seq #; one other segment has an ACK pending
  -> Immediately send a single cumulative ACK, ACKing both in-order segments.
• Out-of-order segment arrives with higher-than-expected seq #; gap detected
  -> Immediately send a duplicate ACK, indicating seq # of next expected byte.
• Segment arrives that partially or completely fills a gap
  -> Immediately send an ACK, provided the segment starts at the lower end of the gap.
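For illustration, these rules map onto a small receive handler; a Python sketch under simplifying assumptions (a dict-based reassembly buffer, abstract send_ack/deliver/timer helpers, delayed-ACK corner cases simplified):

    def on_segment(rcv, seg):
        # rcv holds: expected_seq, ooo_buffer (dict seq -> data), ack_pending flag
        if seg.seq == rcv.expected_seq:
            rcv.deliver(seg.data)                       # in-order: deliver to application
            rcv.expected_seq += len(seg.data)
            filled_gap = False
            while rcv.expected_seq in rcv.ooo_buffer:   # this segment may close a gap
                data = rcv.ooo_buffer.pop(rcv.expected_seq)
                rcv.deliver(data)
                rcv.expected_seq += len(data)
                filled_gap = True
            if filled_gap or rcv.ack_pending:
                rcv.send_ack(rcv.expected_seq)          # immediate cumulative ACK
                rcv.ack_pending = False
            else:
                rcv.ack_pending = True                  # first lone in-order segment:
                rcv.start_delayed_ack_timer(0.5)        # delay the ACK up to 500 ms
        elif seg.seq > rcv.expected_seq:
            rcv.ooo_buffer[seg.seq] = seg.data          # gap detected: buffer it
            rcv.send_ack(rcv.expected_seq)              # duplicate ACK, next expected byte
        else:
            rcv.send_ack(rcv.expected_seq)              # old duplicate: re-ACK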

69

Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending a lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If a segment is lost, there will likely be many duplicate ACKs
• If the sender receives 3 duplicate ACKs, it supposes that the segment after the ACKed data was lost:
  – fast retransmit: resend the segment before the timer expires

70

[Figure 3.37: Host A / Host B time-line — a segment is lost (X); the resulting triple duplicate ACK makes A resend the 2nd segment before its timeout expires]

71

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {                                       /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3)
      resend segment with sequence number y    /* fast retransmit */
  }

72

Silly Window Syndrome
The receiver advertises how much it can accept; if a transmitter has something to send – it will.
This means small segments may persist – indefinitely.

Solution: wait to fill each segment, but don't wait indefinitely.

NAGLE's Algorithm
If we wait too long, interactive traffic is difficult.
If we don't wait, we get silly window syndrome.
Solution: use a timer; when the timer expires – send the (unfilled) segment.
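A rough Python sketch of this coalescing rule (illustrative only; real stacks interleave it with delayed ACKs, and the conn object, mss, and helper names here are assumptions):

    # Coalesce small writes while data is in flight, but never hold back a full segment.
    def nagle_send(conn, data):
        conn.buffer += data
        if len(conn.buffer) >= conn.mss or not conn.unacked_data():
            # a full segment, or nothing outstanding: transmit now
            conn.transmit(conn.buffer[:conn.mss])
            conn.buffer = conn.buffer[conn.mss:]
        # otherwise wait: buffered bytes go out when an ACK arrives,
        # or when the override timer from the slide's solution expires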

73

Flow Control ≠ Congestion Control

• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded

74

Flow Control – (bad old days)

In-line flow control:
• XON/XOFF (^S/^Q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)

Dedicated wires:
• RTS/CTS handshaking
• Read (or Write) Ready signals from a memory interface saying slow-down/stop…

75

TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from the buffer
• speed-matching service: matching the send rate to the receiving app's drain rate

flow control: sender won't overflow the receiver's buffer by transmitting too much, too fast

76

TCP Flow control: how it works

(Suppose the TCP receiver discards out-of-order segments)
• spare room in buffer:
  RcvWindow = RcvBuffer – [LastByteRcvd – LastByteRead]
• Rcvr advertises the spare room by including the value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees the receive buffer doesn't overflow
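As a quick sketch of the two checks (names taken from the slide; Python, illustrative only):

    # Receiver: advertise spare buffer space in every segment it sends.
    def advertised_window(rcv_buffer_size, last_byte_rcvd, last_byte_read):
        return rcv_buffer_size - (last_byte_rcvd - last_byte_read)

    # Sender: only transmit while unACKed data still fits in the advertised window.
    def can_send(last_byte_sent, last_byte_acked, rcv_window, segment_len):
        return (last_byte_sent - last_byte_acked) + segment_len <= rcv_window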

77

TCP Connection Management

Recall: TCP sender and receiver establish a "connection" before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket(hostname, port number);
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();

Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
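For illustration, the handshake happens underneath ordinary socket calls; a minimal Python equivalent of the slide's Java snippets (hostname and port 6789 are placeholders):

    import socket

    def server():
        # server: contacted by client (welcomeSocket.accept() in the Java above)
        welcome = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        welcome.bind(("", 6789))
        welcome.listen(1)
        conn, addr = welcome.accept()     # returns once SYN, SYNACK, ACK has completed
        return conn

    def client():
        # client: connection initiator (new Socket(hostname, port) in the Java above)
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect(("servername", 6789))   # connect() drives the three-way handshake
        return s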

78

TCP Connection Management (cont.)

Closing a connection:
client closes socket: clientSocket.close();

Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK; closes connection, sends FIN

[Figure: client/server time-line — client sends FIN (close); server ACKs and sends its own FIN (close); client ACKs, enters timed wait, then closed]

79

TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK
  – enters "timed wait" – will respond with ACK to received FINs
Step 4: server receives ACK; connection closed

Note: with a small modification, can handle simultaneous FINs

[Figure: client/server time-line — FIN/ACK exchange in both directions; both sides closing, client in "timed wait" before closed, server closed on the final ACK]

80

TCP Connection Management (cont.)

TCP client lifecycle / TCP server lifecycle
[Figure: TCP client and server state-transition diagrams]

81

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control!
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem!

82

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission

• large delays when congested
• maximum achievable throughput

[Figure: Host A and Host B send λin (original data) into a router with unlimited shared output link buffers; λout is the delivered rate]

83

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packets

[Figure: Host A offers λin (original data) and λ'in (original data plus retransmitted data) to a router with finite shared output link buffers; Host B receives λout]

84

Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• "perfect" retransmission, only when loss: λ'in > λout
• retransmission of a delayed (not lost) packet makes λ'in larger (than the perfect case) for the same λout

"costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: the link carries multiple copies of a packet

[Figure: three plots (a), (b), (c) of λout versus λin; throughput saturates as offered load approaches R/2, and with wasted retransmissions it levels off well below R/2 (around R/3–R/4)]

85

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λin and λ'in increase?

[Figure: four hosts offering λin (original data) plus λ'in (retransmitted data) across routers with finite shared output link buffers]

86

Causes/costs of congestion: scenario 3

Another "cost" of congestion: when a packet is dropped, any upstream transmission capacity used for that packet was wasted!

[Figure: λout collapses toward zero as offered load keeps growing]

Congestion collapse example: the cocktail party effect

87

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
• no explicit feedback from the network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate the sender should send at

88

TCP congestion control: additive increase, multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT – done per received ACK as W <- W + 1/W – until loss is detected
• multiplicative decrease: cut CongWin in half after loss (W <- W/2)

[Figure: congestion window (8, 16, 24 Kbytes) versus time — the "saw tooth" behaviour, probing for bandwidth]
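A compact sketch of just these two rules, with the window measured in MSS units as on the slide (illustrative Python, not a full TCP implementation):

    # AIMD update rules, cwnd in units of MSS.
    def on_ack(cwnd):
        return cwnd + 1.0 / cwnd      # ~ +1 MSS per RTT in congestion avoidance

    def on_loss(cwnd):
        return max(cwnd / 2.0, 1.0)   # multiplicative decrease, never below 1 MSS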

89 (Slow start is not shown in the sawtooth figure above)

90

TCP Congestion Control: details
• sender limits transmission: LastByteSent – LastByteAcked ≤ CongWin
• Roughly: rate = CongWin / RTT  bytes/sec
• CongWin is a dynamic function of perceived network congestion

How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event

three mechanisms:
  – AIMD
  – slow start
  – conservative after timeout events

91

AIMD Starts Too Slowly!

[Figure: window versus time t — linear additive increase from a small initial window]

It could take a long time to get started!
Need to start with a small CWND to avoid overloading the network.

92

TCP Slow Start
• When a connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to a respectable rate

When a connection begins, increase rate exponentially fast until the first loss event

93

TCP Slow Start (more)
• When a connection begins, increase rate exponentially until the first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B time-line — one segment, then two segments, then four segments, one RTT apart]

94

Slow Start and the TCP Sawtooth

[Figure: window versus time — exponential "slow start" up to the first loss, then the sawtooth]

Why is it called slow-start? Because TCP originally had no congestion control mechanism. The source would just start by sending a whole window's worth of data!

95

Refinement: inferring loss
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after a timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Philosophy:
3 dup ACKs indicates the network is capable of delivering some segments; a timeout indicates a "more alarming" congestion scenario

96

Refinement
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event

97

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
• When a timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.

98

TCP sender congestion control (State / Event / TCP sender action / Commentary)

• Slow Start (SS) — ACK receipt for previously unACKed data
  Action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance"
  Commentary: results in a doubling of CongWin every RTT
• Congestion Avoidance (CA) — ACK receipt for previously unACKed data
  Action: CongWin = CongWin + MSS·(MSS/CongWin)
  Commentary: additive increase, CongWin grows by 1 MSS every RTT
• SS or CA — loss event detected by triple duplicate ACK
  Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
  Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
• SS or CA — timeout
  Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
  Commentary: enter slow start
• SS or CA — duplicate ACK
  Action: increment duplicate ACK count for the segment being ACKed
  Commentary: CongWin and Threshold not changed
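The table maps almost directly onto an event handler; a simplified Python sketch under stated assumptions (CongWin and Threshold in bytes, no fast-recovery window inflation, illustrative initial values):

    class CongestionControl:
        def __init__(self, mss=1460, threshold=64000):
            self.mss = mss
            self.cong_win = mss          # start in slow start with 1 MSS
            self.threshold = threshold
            self.dup_acks = 0

        def on_new_ack(self):
            self.dup_acks = 0
            if self.cong_win < self.threshold:     # slow start: +1 MSS per ACK
                self.cong_win += self.mss
            else:                                  # congestion avoidance: ~+1 MSS per RTT
                self.cong_win += self.mss * self.mss / self.cong_win

        def on_dup_ack(self):
            self.dup_acks += 1
            if self.dup_acks == 3:                 # triple duplicate ACK
                self.threshold = max(self.cong_win / 2, self.mss)
                self.cong_win = self.threshold     # multiplicative decrease

        def on_timeout(self):
            self.threshold = max(self.cong_win / 2, self.mss)
            self.cong_win = self.mss               # back to slow start
            self.dup_acks = 0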

99

Repeating Slow Start After Timeout

[Figure: window versus time — after a timeout, CWND returns to 1 MSS; after a fast retransmission the window is only halved; SSThresh is set at half the window at the time of loss]

Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND.
Slow start is in operation until the window reaches half of the previous CWND, i.e. SSTHRESH.

100

TCP throughput

• What's the average throughput of TCP as a function of window size and RTT?
  – Ignore slow start
• Let W be the window size when loss occurs
• When the window is W, throughput is W/RTT
• Just after loss, the window drops to W/2, throughput to (W/2)/RTT
• Average throughput: 0.75·W/RTT
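A quick numerical check of that 0.75·W/RTT figure (the sawtooth window rises linearly from W/2 to W, so its mean is 3W/4); values below are illustrative:

    W, RTT, MSS = 20, 0.1, 1500                        # segments, seconds, bytes (example)
    windows = [W // 2 + i for i in range(W // 2 + 1)]  # W/2, W/2+1, ..., W over one cycle
    avg_window = sum(windows) / len(windows)           # = 15 = 0.75 * W
    print(avg_window / W)                              # 0.75
    print(avg_window * MSS / RTT)                      # average throughput in bytes/sec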

101

TCP Futures: TCP over "long, fat pipes"

• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate p:
  Throughput = (MSS/RTT)·sqrt(3/(2p))
• -> needs loss rate L = 2·10^-10. Ouch!
• New versions of TCP for high-speed

Calculation on Simple Model (cwnd in units of MSS)
• Assume loss occurs whenever cwnd reaches W
  – Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … W, W/2, …
  – W/2 RTTs, then drop, then repeat
• Average throughput: 0.75·W·(MSS/RTT)
  – One packet dropped out of (W/2)·(3W/4)
  – Packet drop rate p = (8/3)·W^-2
• -> Throughput = (MSS/RTT)·sqrt(3/(2p))
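A small sketch of that calculation for the 10 Gbps example (units and numbers as on the slide, purely illustrative):

    from math import sqrt

    MSS = 1500 * 8          # bits per segment
    RTT = 0.1               # seconds
    target = 10e9           # desired throughput, bits/sec

    W = target * RTT / MSS                 # in-flight segments needed: ~83,333
    p = 3.0 / (2.0 * W ** 2)               # tolerable loss rate: ~2e-10
    check = (MSS / RTT) * sqrt(3.0 / (2.0 * p))   # reproduces the 10 Gbps target
    print(W, p, check)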

102

HINT: KNOW THIS SLIDE

Three Congestion Control Challenges – or Why AIMD?
• Single flow adjusting to bottleneck bandwidth
  – Without any a priori knowledge
  – Could be a Gbps link, could be a modem
• Single flow adjusting to variations in bandwidth
  – When bandwidth decreases, must lower sending rate
  – When bandwidth increases, must increase sending rate
• Multiple flows sharing the bandwidth
  – Must avoid overloading the network
  – And share bandwidth "fairly" among the flows

103

104

Problem 1: Single Flow, Fixed BW

• Want to get a first-order estimate of the available bandwidth
  – Assume bandwidth is fixed
  – Ignore presence of other flows
• Want to start slow, but rapidly increase rate until packet drop occurs ("slow-start")
• Adjustment:
  – cwnd initially set to 1 (MSS)
  – cwnd++ upon receipt of ACK

105

Problems with Slow-Start
• Slow-start can result in many losses
  – Roughly the size of cwnd ~ BW·RTT
• Example:
  – At some point, cwnd is enough to fill the "pipe"
  – After another RTT, cwnd is double its previous value
  – All the excess packets are dropped!
• Need a more gentle adjustment algorithm once we have a rough estimate of bandwidth
  – Rest of the design discussion focuses on this

Problem 2: Single Flow, Varying BW

Want to track available bandwidth:
• Oscillate around its current value
• If you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease: cwnd <- cwnd × / ÷ a
• Additive increase or decrease: cwnd <- cwnd + / – b

106

Four alternatives

• AIAD: gentle increase, gentle decrease
• AIMD: gentle increase, drastic decrease
• MIAD: drastic increase, gentle decrease
  – too many losses: eliminate
• MIMD: drastic increase and decrease

107

108

Problem 3: Multiple Flows

• Want steady state to be "fair"
• Many notions of fairness, but here just require two identical flows to end up with the same bandwidth
• This eliminates MIMD and AIAD
  – As we shall see…
• AIMD is the only remaining solution!
  – Not really, but close enough…

109

Recall: Buffer and Window Dynamics

• No congestion -> x increases by one packet/RTT every RTT
• Congestion -> decrease x by a factor of 2

[Figure: single flow A -> B over a link of capacity C = 50 pkts/RTT; rate x (pkts/RTT) and router backlog (pkts, congested if > 20) plotted over time, showing the sawtooth]

110

AIMD Sharing Dynamics

• No congestion -> rate increases by one packet/RTT every RTT
• Congestion -> decrease rate by a factor of 2

[Figure: two flows (A -> B at rate x1, D -> E at rate x2) sharing a link; over time the rates equalize at the fair share]

111

AIAD Sharing Dynamics

• No congestion -> x increases by one packet/RTT every RTT
• Congestion -> decrease x by 1

[Figure: two flows (A -> B at rate x1, D -> E at rate x2) under AIAD; the gap between the rates persists — they never converge to a fair share]

112

Simple Model of Congestion Control

• Two TCP connections
  – Rates x1 and x2
• Congestion when sum > 1
• Efficiency: sum near 1
• Fairness: x's converge

[Figure: 2-user example — bandwidth for user 1 (x1) versus bandwidth for user 2 (x2); the efficiency line x1 + x2 = 1 separates underload from overload]

113

Example

• Total bandwidth 1

[Figure: x1–x2 plane with the efficiency line x1 + x2 = 1 and the fairness line x1 = x2]
• (0.2, 0.5): inefficient, x1 + x2 = 0.7
• (0.7, 0.5): congested, x1 + x2 = 1.2
• (0.7, 0.3): efficient (x1 + x2 = 1), but not fair
• (0.5, 0.5): efficient and fair

114

AIAD

• Increase: x + aI
• Decrease: x – aD
• Does not converge to fairness

[Figure: in the x1–x2 plane, starting at (x1h, x2h), AIAD moves to (x1h – aD, x2h – aD) on decrease and to (x1h – aD + aI, x2h – aD + aI) on increase — every step is parallel to the fairness line, so the unfairness never shrinks]

115

MIMD

• Increase: x·bI
• Decrease: x·bD
• Does not converge to fairness

[Figure: starting at (x1h, x2h), MIMD moves to (bD·x1h, bD·x2h) on decrease and (bI·bD·x1h, bI·bD·x2h) on increase — points stay on the same ray through the origin, so the ratio x1/x2 never changes]

116

AIMD

• Increase: x + aI
• Decrease: x·bD
• Converges to fairness

[Figure: starting at (x1h, x2h), AIMD moves to (bD·x1h, bD·x2h) on decrease and to (bD·x1h + aI, bD·x2h + aI) on increase — each decrease/increase cycle moves the operating point closer to the fairness line]
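To see the convergence claim concretely, here is a tiny simulation sketch of two flows under AIMD versus AIAD (capacity, starting rates, and step sizes are arbitrary illustrative values):

    # Two flows share a link of capacity C; each round both apply the same rule,
    # and whenever the sum exceeds C both see congestion and decrease.
    def simulate(increase, decrease, x1=1.0, x2=10.0, C=20.0, rounds=200):
        for _ in range(rounds):
            if x1 + x2 > C:
                x1, x2 = decrease(x1), decrease(x2)
            else:
                x1, x2 = increase(x1), increase(x2)
        return x1, x2

    aimd = simulate(lambda x: x + 1, lambda x: x / 2)          # rates converge: x1 ~ x2
    aiad = simulate(lambda x: x + 1, lambda x: max(x - 1, 0))  # the initial gap persists
    print("AIMD:", aimd)
    print("AIAD:", aiad)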

117

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally

[Figure: bandwidth for connection 1 versus connection 2, both bounded by R; repeated cycles of congestion avoidance (additive increase, slope 1) and loss (window decreased by a factor of 2) converge to the equal-bandwidth-share line]

118

Fairness (more)

Fairness and UDP:
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient, yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections:
• nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections;
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2!
• Recall: multiple browser sessions (and the potential for synchronized loss)

119

Synchronized Flows vs. Many TCP Flows

Synchronized flows:
• Aggregate window has the same dynamics
• Therefore buffer occupancy has the same dynamics
• Rule-of-thumb still holds

Independent, desynchronized flows:
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases

Some TCP issues outstanding…

[Figure: probability distribution of buffer occupancy over time, narrowing as flows desynchronize]

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2: Single Flow, Varying BW

Want to track the available bandwidth:
• Oscillate around its current value
• If you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease: cwnd → cwnd × a or cwnd / a
• Additive increase or decrease: cwnd → cwnd + b or cwnd - b

106

Four alternatives

• AIAD: gentle increase, gentle decrease
• AIMD: gentle increase, drastic decrease
• MIAD: drastic increase, gentle decrease
  – too many losses: eliminate
• MIMD: drastic increase and decrease

107

108

Problem 3: Multiple Flows

• Want the steady state to be "fair"
• Many notions of fairness, but here we just require two identical flows to end up with the same bandwidth
• This eliminates MIMD and AIAD
  – As we shall see…
• AIMD is the only remaining solution
  – Not really, but close enough…

109

Recall: Buffer and Window Dynamics

• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by factor 2

[Figure: a single flow A→B through a bottleneck of C = 50 pkts/RTT; the plot shows the rate x (pkts/RTT) and the backlog in the router (pkts, congested if > 20) over time, tracing the familiar sawtooth.]

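Since the plot itself does not survive in this transcript, a small Python loop can stand in for it; the capacity of 50 pkts/RTT and the congestion threshold of 20 pkts are the values quoted above, the rest is an illustrative assumption:

```python
C = 50         # bottleneck capacity, pkts/RTT (from the slide)
THRESH = 20    # router counts as congested once the backlog exceeds this

x = 1.0        # sending rate, pkts/RTT
backlog = 0.0  # packets queued in the router

for rtt in range(60):
    backlog = max(0.0, backlog + x - C)  # queue grows only when x exceeds C
    if backlog > THRESH:
        x /= 2                           # congestion: decrease x by factor 2
    else:
        x += 1                           # no congestion: +1 pkt/RTT every RTT
    print(f"RTT {rtt:2d}: rate = {x:5.1f}  backlog = {backlog:5.1f}")
```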
110

AIMD Sharing Dynamics

• No congestion → rate increases by one packet/RTT every RTT
• Congestion → decrease rate by factor 2

[Figure: two flows, A→B at rate x1 and D→E at rate x2, sharing the same bottleneck; the plot shows both rates over time and the rates equalize at the fair share.]

111

AIAD Sharing Dynamics

• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by 1

[Figure: the same two flows (rates x1 and x2) under AIAD; the plot shows the gap between the two rates persisting rather than closing.]

112

Simple Model of Congestion Control

• Two TCP connections
  – Rates x1 and x2
• Congestion when the sum > 1
• Efficiency: sum near 1
• Fairness: the x's converge

[Figure: two-user example, bandwidth for user 1 (x1) against bandwidth for user 2 (x2); the efficiency line x1 + x2 = 1 separates the underloaded region from the overloaded region.]

113

Example

• Total bandwidth 1

[Figure: the (x1, x2) plane for users 1 and 2, with the efficiency line x1 + x2 = 1 and the fairness line x1 = x2.]

• (0.2, 0.5): inefficient, x1 + x2 = 0.7
• (0.7, 0.5): congested, x1 + x2 = 1.2
• (0.7, 0.3): efficient (x1 + x2 = 1) but not fair
• (0.5, 0.5): efficient and fair

114

AIAD

• Increase: x → x + aI
• Decrease: x → x - aD
• Does not converge to fairness

[Figure: in the (x1, x2) plane, starting from (x1h, x2h), a decrease moves the point to (x1h - aD, x2h - aD) and the next increase to (x1h - aD + aI, x2h - aD + aI); the trajectory runs parallel to the fairness line, so the initial imbalance never shrinks.]

115

MIMD

• Increase: x → bI·x
• Decrease: x → bD·x
• Does not converge to fairness

[Figure: starting from (x1h, x2h), a decrease moves the point to (bD·x1h, bD·x2h) and an increase to (bI·bD·x1h, bI·bD·x2h); the trajectory stays on a line through the origin, so the ratio x1/x2, and hence the unfairness, is preserved.]

116

AIMD

• Increase: x → x + aI
• Decrease: x → bD·x
• Converges to fairness

[Figure: starting from (x1h, x2h), a multiplicative decrease moves the point to (bD·x1h, bD·x2h) and an additive increase to (bD·x1h + aI, bD·x2h + aI); each decrease/increase cycle brings the operating point closer to the fairness line.]

117

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally

[Figure: bandwidth for connection 1 vs. bandwidth for connection 2, each axis running up to the link rate R, with the "equal bandwidth share" line; repeated rounds of congestion avoidance (additive increase) and loss (decrease window by factor of 2) walk the operating point toward the equal-share line.]

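The same argument is easy to check numerically. Below is a small, illustrative Python simulation (the function name, starting rates and capacity are invented example values) of two flows that see the same congestion signal; with AIMD the rates converge, with AIAD the initial gap persists:

```python
def share(increase, decrease, x1=1.0, x2=40.0, capacity=50.0, rounds=1000):
    """Two flows adjust once per RTT; both back off when their sum exceeds capacity."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = decrease(x1), decrease(x2)
        else:
            x1, x2 = increase(x1), increase(x2)
    return round(x1, 1), round(x2, 1)

# AIMD: additive increase, multiplicative decrease -> roughly equal shares
print("AIMD:", share(lambda x: x + 1, lambda x: x / 2))

# AIAD: additive increase, additive decrease -> the initial 39-unit gap never closes
print("AIAD:", share(lambda x: x + 1, lambda x: max(x - 1, 0.0)))
```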
118

Fairness (more)

Fairness and UDP
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections
• Nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2!
• Recall: multiple browser sessions (and the potential for synchronized loss)

119

Synchronized Flows vs. Many TCP Flows

Synchronized flows:
• Aggregate window has the same dynamics
• Therefore buffer occupancy has the same dynamics
• Rule-of-thumb still holds

Independent, desynchronized flows:
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases

Some TCP issues outstanding…

[Figure: buffer occupancy over time and its probability distribution for the synchronized and desynchronized cases.]


bull if pkt (or ACK) just delayed (not lost)ndash retransmission will be

duplicate but use of seq rsquos already handles this

ndash receiver must specify seq of pkt being ACKed

bull requires countdown timer

udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )

35

rdt30 sender

sequence=0udt_send(packet)

rdt_send(data)

Wait for

ACK0

IDLEstate 1

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)

udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)

LL

udt_send(packet)timeout

udt_send(packet)timeout

udt_rcv(reply)

IDLEstate 0

Wait for

ACK1

Ludt_rcv(reply)

LL

L

36

rdt30 in action

37

rdt30 in action

38

Performance of rdt30

• rdt3.0 works, but performance stinks
• example: 1 Gbps link, 15 ms propagation delay, 8000 bit packet

d_trans = L / R = 8000 bits / 10^9 bits/sec = 8 microseconds

U_sender: utilization – fraction of time sender is busy sending
U_sender = (L/R) / (RTT + L/R) = 0.008 ms / 30.008 ms ≈ 0.00027

1KB pkt every 30 msec -> 33 kB/sec throughput over a 1 Gbps link: the network protocol limits use of physical resources!
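To make the arithmetic concrete, a small illustrative calculation of the stop-and-wait utilisation for this example link:

    # Stop-and-wait utilisation for the example above: 1 Gbps link, 15 ms propagation delay.
    R = 1e9              # link rate, bits/sec
    L = 8000             # packet size, bits
    RTT = 0.030          # 2 * 15 ms propagation delay, seconds

    d_trans = L / R                        # 8 microseconds
    U_sender = d_trans / (RTT + d_trans)   # fraction of time the sender is busy
    print(f"d_trans = {d_trans*1e6:.0f} us, U_sender = {U_sender:.5f}")
    # -> d_trans = 8 us, U_sender ~= 0.00027: about 33 kB/s of goodput on a 1 Gbps link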

39

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

40

Pipelined (Packet-Window) protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver

bull Two generic forms of pipelined protocols go-Back-N selective repeat

41

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

Increase utilization by a factor of 3!
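Continuing the same example, a short illustration of how a window of N in-flight packets scales utilisation (N = 3 as in the figure; values are illustrative):

    # Utilisation with N packets in flight (same 1 Gbps / 30 ms example as before).
    R, L, RTT = 1e9, 8000, 0.030
    for N in (1, 3, 1000):
        U = min(1.0, N * (L / R) / (RTT + L / R))
        print(f"N = {N:4d}: U_sender = {U:.5f}")
    # N = 3 triples utilisation; the window must reach roughly RTT*R/L packets to fill the pipe.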

42

Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks

ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet

ndash If timer expires retransmit all unacked packets

Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet

ndash When timer expires retransmit only unack packet

43

Selective repeat big picture

bull Sender can have up to N unacked packets in pipeline

bull Rcvr acks individual packetsbull Sender maintains timer for each unacked

packetndash When timer expires retransmit only unack packet

44

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L
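A compact Python rendering of the GBN sender logic in the FSM above; udt_send and the timer callbacks are assumed to be supplied by the caller (they are not defined in the slides), and checksumming is omitted:

    # Go-Back-N sender sketch: window of N packets, cumulative ACKs, one timer.
    class GBNSender:
        def __init__(self, N, udt_send, start_timer, stop_timer):
            self.N, self.base, self.nextseqnum = N, 1, 1
            self.sent = {}                         # copies of unACKed packets, keyed by seq#
            self.udt_send = udt_send
            self.start_timer, self.stop_timer = start_timer, stop_timer

        def rdt_send(self, data):
            if self.nextseqnum >= self.base + self.N:
                return False                       # window full: refuse/block
            pkt = (self.nextseqnum, data)          # "packet" = (seq#, payload)
            self.sent[self.nextseqnum] = pkt
            self.udt_send(pkt)
            if self.base == self.nextseqnum:       # first in-flight packet: arm the timer
                self.start_timer()
            self.nextseqnum += 1
            return True

        def ack_received(self, acknum):            # cumulative: ACKs all packets up to acknum
            if acknum < self.base:
                return                             # old/duplicate ACK: ignore
            for s in range(self.base, acknum + 1):
                self.sent.pop(s, None)
            self.base = acknum + 1
            if self.base == self.nextseqnum:
                self.stop_timer()                  # nothing left in flight
            else:
                self.start_timer()                 # restart timer for the new oldest packet

        def timeout(self):                         # go back N: resend every unACKed packet
            self.start_timer()
            for s in range(self.base, self.nextseqnum):
                self.udt_send(self.sent[s])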

46

GBN receiver extended FSM

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN in action

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver
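For reference, a small illustrative Python version of the receiver rules just listed (udt_send_ack and deliver are assumed callbacks, not names from the slides):

    # Selective Repeat receiver sketch: window [rcv_base, rcv_base + N - 1].
    class SRReceiver:
        def __init__(self, N, udt_send_ack, deliver):
            self.N, self.rcv_base = N, 1
            self.buffer = {}                      # out-of-order packets awaiting delivery
            self.udt_send_ack, self.deliver = udt_send_ack, deliver

        def pkt_received(self, seq, data):
            if self.rcv_base <= seq < self.rcv_base + self.N:
                self.udt_send_ack(seq)            # ACK the individual packet
                self.buffer[seq] = data
                while self.rcv_base in self.buffer:          # deliver any in-order run
                    self.deliver(self.buffer.pop(self.rcv_base))
                    self.rcv_base += 1
            elif self.rcv_base - self.N <= seq < self.rcv_base:
                self.udt_send_ack(seq)            # already delivered: re-ACK so the sender advances
            # otherwise: ignore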

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

[Figure: application writes data -> socket door -> TCP send buffer -> segments across the network -> TCP receive buffer -> socket door -> application reads data]

55

TCP segment structure

[Header layout, 32 bits wide]
source port | dest port
sequence number (counting by bytes of data, not segments)
acknowledgement number
head len | not used | U A P R S F | receive window (bytes rcvr willing to accept)
checksum (Internet checksum, as in UDP) | urg data pointer
options (variable length)
application data (variable length)

Flag bits: URG: urgent data (generally not used); ACK: ACK # valid; PSH: push data now (generally not used); RST, SYN, FIN: connection estab. (setup, teardown commands)
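For reference, a short Python sketch that packs the fixed 20-byte header laid out above using struct; the flag constants and the example port/sequence numbers are illustrative, and no real checksum is computed:

    import struct

    # Pack the fixed 20-byte TCP header (no options).
    def tcp_header(src_port, dst_port, seq, ack, flags, rcv_window, checksum=0, urg_ptr=0):
        data_offset = 5                              # header length in 32-bit words (no options)
        offset_flags = (data_offset << 12) | flags   # 4-bit offset, unused bits, 6 flag bits
        return struct.pack("!HHIIHHHH", src_port, dst_port, seq, ack,
                           offset_flags, rcv_window, checksum, urg_ptr)

    FIN, SYN, ACK = 0x01, 0x02, 0x10                 # flag bit positions
    hdr = tcp_header(9157, 80, seq=42, ack=79, flags=ACK, rcv_window=65535)
    print(len(hdr), hdr.hex())                       # 20-byte header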

56

TCP seq. #'s and ACKs
Seq. #'s: byte stream "number" of the first byte in the segment's data
ACKs: seq # of the next byte expected from the other side – cumulative ACK
Q: how does the receiver handle out-of-order segments? A: the TCP spec doesn't say – up to the implementor

[Figure – simple telnet scenario: user types 'C'; Host A sends Seq=42, ACK=79, data = 'C'; Host B ACKs receipt of 'C' and echoes back 'C' (Seq=79, ACK=43, data = 'C'); Host A ACKs receipt of the echoed 'C' (Seq=43, ACK=80)]

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1 – α)·EstimatedRTT + α·SampleRTT

Exponential weighted moving average: the influence of a past sample decreases exponentially fast; typical value: α = 0.125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimation
[Chart: SampleRTT and EstimatedRTT in milliseconds (roughly 100–350 ms) versus time, for gaia.cs.umass.edu to fantasia.eurecom.fr]

62

TCP Round Trip Time and Timeout

Setting the timeout
• EstimatedRTT plus a "safety margin"
  – large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:

DevRTT = (1 – β)·DevRTT + β·|SampleRTT – EstimatedRTT|
(typically, β = 0.25)

Then set the timeout interval:

TimeoutInterval = EstimatedRTT + 4·DevRTT
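A minimal Python sketch of the two EWMA updates above, using the typical α = 0.125 and β = 0.25 (initialisation details are simplified):

    # EstimatedRTT / DevRTT / TimeoutInterval, updated per (non-retransmitted) sample.
    class RTTEstimator:
        def __init__(self, alpha=0.125, beta=0.25):
            self.alpha, self.beta = alpha, beta
            self.estimated_rtt, self.dev_rtt = None, 0.0

        def update(self, sample_rtt):
            if self.estimated_rtt is None:                   # first measurement
                self.estimated_rtt = sample_rtt
            else:
                self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                                + self.beta * abs(sample_rtt - self.estimated_rtt))
                self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                      + self.alpha * sample_rtt)
            return self.estimated_rtt + 4 * self.dev_rtt     # TimeoutInterval

    est = RTTEstimator()
    for s in (0.100, 0.120, 0.300, 0.110):                   # sample RTTs in seconds
        print(f"sample {s:.3f}s -> timeout {est.update(s):.3f}s")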

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event) {

    event: data received from application above
      create TCP segment with sequence number NextSeqNum
      if (timer currently not running)
        start timer
      pass segment to IP
      NextSeqNum = NextSeqNum + length(data)

    event: timer timeout
      retransmit not-yet-acknowledged segment with smallest sequence number
      start timer

    event: ACK received, with ACK field value of y
      if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
          start timer
      }
  }
} /* end of loop (forever) */

Comment:
• SendBase – 1: last cumulatively ACKed byte
Example:
• SendBase – 1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so new data is ACKed

66

TCP retransmission scenarios

Lost ACK scenario: Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X). A's Seq=92 timer expires and A retransmits Seq=92, 8 bytes data; B replies ACK=100 again. (SendBase stays at 100 until the ACK finally arrives.)

Premature timeout: Host A sends Seq=92, 8 bytes data and then Seq=100, 20 bytes data; B replies ACK=100 and ACK=120. A's Seq=92 timer expires before ACK=100 arrives, so A retransmits Seq=92, 8 bytes data; B replies ACK=120. (SendBase: 100, then 120.)

67

TCP retransmission scenarios (more)

Cumulative ACK scenario: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; B's ACK=100 is lost (X) but ACK=120 arrives before the timeout. ACK=120 implicitly ACKs 100 too (an implicit, cumulative ACK – e.g. not Go-Back-N), so A does not retransmit; SendBase = 120.

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at receiver: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
TCP receiver action: delayed ACK – wait up to 500 ms for the next segment; if no next segment, send ACK

Event at receiver: arrival of in-order segment with expected seq #; one other segment has ACK pending
TCP receiver action: immediately send a single cumulative ACK, ACKing both in-order segments

Event at receiver: arrival of out-of-order segment with higher-than-expected seq #; gap detected
TCP receiver action: immediately send a duplicate ACK, indicating the seq # of the next expected byte

Event at receiver: arrival of a segment that partially or completely fills the gap
TCP receiver action: immediately send an ACK, provided that the segment starts at the lower end of the gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

[Figure 3.37: Host A sends a burst of segments to Host B; the 2nd segment is lost (X); the resulting duplicate ACKs let A resend the 2nd segment before its timer expires – resending a segment after triple duplicate ACK]

71

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
  if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
          start timer
  }
  else {                                 /* a duplicate ACK for an already ACKed segment */
      increment count of dup ACKs received for y
      if (count of dup ACKs received for y == 3)
          resend segment with sequence number y    /* fast retransmit */
  }

72

Silly Window SyndromeThe receiver advertises how much it can accept; if a transmitter has something to send – it will. This means small segments may persist – indefinitely.

Solution: wait to fill each segment, but don't wait indefinitely.
• If we wait too long, interactive traffic is difficult.
• If we don't wait, we get silly window syndrome.

NAGLE's Algorithm – solution: use a timer; when the timer expires – send the (unfilled) segment.
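A rough sketch of the Nagle-style decision just described; bytes_buffered, bytes_unacked and timer_expired are illustrative stand-ins for the sender's real state:

    # Nagle-style send decision (sketch): coalesce small writes while data is in flight.
    def should_send_now(bytes_buffered, mss, bytes_unacked, timer_expired):
        if bytes_buffered >= mss:        # a full segment is ready: always send
            return True
        if bytes_unacked == 0:           # nothing in flight: a small segment may go immediately
            return True
        return timer_expired             # otherwise wait for an ACK ... or the override timer

    # e.g. a 10-byte keystroke while earlier data is unACKed waits:
    print(should_send_now(10, 1460, bytes_unacked=500, timer_expired=False))   # False
    print(should_send_now(10, 1460, bytes_unacked=0,   timer_expired=False))   # True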

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

• spare room in buffer = RcvWindow = RcvBuffer – [LastByteRcvd – LastByteRead]
• Rcvr advertises spare room by including the value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow – guarantees the receive buffer doesn't overflow
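The same bookkeeping as a one-line calculation (variable names are illustrative):

    # Receiver's advertised window: free space left in the receive buffer.
    def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
        return rcv_buffer - (last_byte_rcvd - last_byte_read)

    print(rcv_window(rcv_buffer=65536, last_byte_rcvd=48000, last_byte_read=20000))  # 37536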

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

[Figure: client sends FIN (close); server replies ACK and then sends its own FIN (close); client replies ACK and enters timed wait; connection closed]

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

[Figure: client sends FIN (closing); server ACKs and sends its own FIN (closing); client ACKs, waits in timed wait, then closed; server closed on receiving the final ACK]

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

[Figure: Host A sends λin (original data) into a router with unlimited shared output link buffers; Host B receives λout]

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

[Figure: Host A offers λin (original data) and λ'in (original data plus retransmitted data) to a router with finite shared output link buffers; Host B receives λout]

84

Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• "perfect" retransmission, only when loss: λ'in > λout
• retransmission of a delayed (not lost) packet makes λ'in larger (than the perfect case) for the same λout

"costs" of congestion: more work (retransmissions) for a given "goodput"; unneeded retransmissions: the link carries multiple copies of a packet

[Graphs (a), (b), (c): λout versus λ'in; throughput saturates at or below R/2, and falls further as retransmissions increase]

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q: what happens as λin and λ'in increase?

[Figure: four senders, multihop paths, finite shared output link buffers; Host A offers λin (original data) and λ'in (original data plus retransmitted data); Host B receives λout]

86

Causescosts of congestion scenario 3

Another "cost" of congestion: when a packet is dropped, any upstream transmission capacity used for that packet was wasted.

[Graph: λout collapses as offered load grows]

Congestion Collapse example: the cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT – for each received ACK, W ← W + 1/W – until loss detected
• multiplicative decrease: cut CongWin in half after loss (W ← W/2)

[Figure: congestion window (8 / 16 / 24 Kbytes) versus time – saw-tooth behaviour, probing for bandwidth]

89

[Figure: the saw tooth over time – SLOW START IS NOT SHOWN]

90

TCP Congestion Control: details
• sender limits transmission: LastByteSent – LastByteAcked ≤ CongWin
• roughly: rate = CongWin / RTT  bytes/sec
• CongWin is dynamic, a function of perceived network congestion

How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event

three mechanisms:
– AIMD
– slow start
– conservative after timeout events

91

AIMD Starts Too Slowly

[Figure: window versus time – the linear AIMD ramp]

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B timing diagram – one segment in the first RTT, then two segments, then four segments]

94

Slow Start and the TCP Sawtooth

[Figure: window versus time – exponential "slow start" phase, then the saw tooth; each loss halves the window]

Why is it called slow-start? Because TCP originally had no congestion control mechanism: the source would just start by sending a whole window's worth of data.

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

• When a triple duplicate ACK occurs: Threshold is set to CongWin/2 and CongWin is set to Threshold

• When a timeout occurs: Threshold is set to CongWin/2 and CongWin is set to 1 MSS
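A compact sketch tying the four rules above together (CongWin and Threshold in units of MSS; the window-inflation details of fast recovery are omitted):

    # TCP congestion window evolution: slow start / congestion avoidance / loss events.
    class CongestionControl:
        def __init__(self):
            self.cwnd, self.ssthresh = 1.0, 64.0     # in MSS

        def on_new_ack(self):
            if self.cwnd < self.ssthresh:
                self.cwnd += 1.0                     # slow start: +1 MSS per ACK (doubles each RTT)
            else:
                self.cwnd += 1.0 / self.cwnd         # congestion avoidance: ~+1 MSS per RTT

        def on_triple_dup_ack(self):
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh                # multiplicative decrease

        def on_timeout(self):
            self.ssthresh = self.cwnd / 2
            self.cwnd = 1.0                          # back to slow start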

98

TCP sender congestion control (State | Event | TCP sender action | Commentary)

Slow Start (SS) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | resulting in a doubling of CongWin every RTT

Congestion Avoidance (CA) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS·(MSS/CongWin) | additive increase, resulting in an increase of CongWin by 1 MSS every RTT

SS or CA | loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

SS or CA | timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | enter slow start

SS or CA | duplicate ACK | increment duplicate ACK count for the segment being ACKed | CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

[Figure: window versus time; at the timeout / fast retransmission, SSThresh is set to half the previous CWND]

Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND: slow start operates until the window reaches half of the previous CWND, i.e. SSTHRESH.

100

TCP throughput

• What's the average throughput of TCP as a function of window size and RTT? – ignore slow start
• Let W be the window size when loss occurs
• When the window is W, throughput is W/RTT
• Just after loss, the window drops to W/2, throughput to (W/2)/RTT
• Average throughput: 0.75·W/RTT
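A quick numerical illustration of the 0.75·W/RTT estimate, with assumed values for W, RTT and MSS:

    # Average TCP throughput over one AIMD cycle (window W/2 -> W), ignoring slow start.
    W, RTT, MSS = 80, 0.1, 1460 * 8        # W in segments, RTT in seconds, MSS in bits
    avg_window = 0.75 * W                  # mean of a linear ramp from W/2 to W
    print(f"~{avg_window * MSS / RTT / 1e6:.1f} Mbit/s")   # 0.75 * W * MSS / RTT ~= 7.0 Mbit/s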

101

TCP Futures TCP over ldquolong fat pipesrdquo

• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate p: throughput ≈ 1.22·MSS / (RTT·√p)
• So the required loss rate is L ≈ 2·10^-10. Ouch!
• New versions of TCP needed for high-speed links

Calculation on a Simple Model (cwnd in units of MSS)
• Assume loss occurs whenever cwnd reaches W – recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … W, then W/2, … – W/2 RTTs, then a drop, then repeat
• Average throughput: 0.75·W·(MSS/RTT)
  – one packet dropped out of (W/2)·(3W/4) packets sent per cycle
  – packet drop rate p = (8/3)·W^-2
• Throughput = (MSS/RTT)·√(3/(2p))
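Plugging this slide's numbers into the simple model, as an illustrative check of the 83,333-segment window and the ~2·10^-10 loss rate:

    # 10 Gbps with 1500-byte segments and 100 ms RTT: required window and loss rate.
    target, mss_bits, rtt = 10e9, 1500 * 8, 0.100
    W = target * rtt / mss_bits                      # in-flight segments needed
    p = (3 / 2) * (mss_bits / (rtt * target)) ** 2   # from throughput = MSS/RTT * sqrt(3/(2p))
    print(f"W ~= {W:.0f} segments, p ~= {p:.1e}")    # ~83333 segments, ~2e-10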

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT):
• multiplicative increase or decrease: cwnd ← cwnd × a  (or cwnd ← cwnd / a)
• additive increase or decrease: cwnd ← cwnd + b  (or cwnd ← cwnd – b)

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A → B over a link of capacity C = 50 pkts/RTT

[Chart: sending rate x (pkts/RTT) and backlog in the router (pkts; congested if > 20) versus time, showing the resulting saw tooth]

110

AIMD Sharing Dynamics
Two senders, A and B, with rates x1 and x2, share the path to D and E.
No congestion: each rate increases by one packet/RTT every RTT; congestion: each rate is decreased by a factor of 2.

[Chart: x1 and x2 versus time – the rates equalize at the fair share]
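A toy simulation of the sharing dynamics described above: two AIMD flows on a link of capacity 50 pkts/RTT, starting from unequal rates (all values illustrative):

    # Two AIMD flows sharing a link: the rates converge toward the fair share.
    C = 50.0                     # link capacity, pkts/RTT
    x1, x2 = 40.0, 10.0          # unequal starting rates
    for rtt in range(200):
        if x1 + x2 > C:          # congestion: multiplicative decrease
            x1, x2 = x1 / 2, x2 / 2
        else:                    # no congestion: additive increase of 1 pkt/RTT each
            x1, x2 = x1 + 1, x2 + 1
    print(f"x1 ~= {x1:.1f}, x2 ~= {x2:.1f}")   # x1 and x2 end up (almost) equal: the fair share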

111

AIAD Sharing Dynamics
No congestion: x increases by one packet/RTT every RTT; congestion: decrease x by 1.

[Chart: x1 and x2 versus time – the difference between the two rates persists; no convergence to a fair share]

112

Simple Model of Congestion Control
• Two TCP connections, with rates x1 and x2
• Congestion when x1 + x2 > 1
• Efficiency: the sum is near 1
• Fairness: the x's converge

[Phase-plane figure, 2-user example: bandwidth for user 1 (x1) against bandwidth for user 2 (x2); the efficiency line separates underload from overload]

113

Example

[Phase plane: user 1's rate x1 against user 2's rate x2, with the fairness line x1 = x2 and the efficiency line x1 + x2 = 1]

• Total bandwidth: 1
• (0.2, 0.5): inefficient, x1 + x2 = 0.7
• (0.7, 0.5): congested, x1 + x2 = 1.2
• (0.7, 0.3): efficient (x1 + x2 = 1), but not fair
• (0.5, 0.5): efficient (x1 + x2 = 1) and fair

114

AIAD
• Increase: x ← x + aI
• Decrease: x ← x – aD
• Does not converge to fairness

[Phase plane: from (x1h, x2h), a decrease to (x1h – aD, x2h – aD) followed by an increase to (x1h – aD + aI, x2h – aD + aI) moves parallel to the fairness line, so the initial unfairness persists]

115

MIMD
• Increase: x ← bI·x
• Decrease: x ← bD·x
• Does not converge to fairness

[Phase plane: from (x1h, x2h), a decrease to (bD·x1h, bD·x2h) and an increase to (bI·bD·x1h, bI·bD·x2h) both move along the line through the origin, so the ratio x1/x2 never changes]

116

AIMD
• Increase: x ← x + aI
• Decrease: x ← bD·x
• Converges to fairness

[Phase plane: from (x1h, x2h), a multiplicative decrease to (bD·x1h, bD·x2h) moves toward the origin, and an additive increase to (bD·x1h + aI, bD·x2h + aI) moves parallel to the fairness line – each cycle brings the operating point closer to the fairness line]

bull Converges to fairness

117

Why is AIMD fair? (a pretty animation…)
Two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally

[Phase plane: bandwidth for connection 1 against bandwidth for connection 2, each up to R, with the equal-bandwidth-share line; repeated cycles of congestion avoidance (additive increase) and loss (window decreased by a factor of 2) converge toward the equal share]

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows vs. Many TCP Flows
• Synchronized flows: the aggregate window has the same dynamics, therefore buffer occupancy has the same dynamics; the buffer-sizing rule-of-thumb still holds
• Independent, desynchronized flows: the central limit theorem says the aggregate becomes Gaussian; variance (and hence the required buffer size) decreases as N increases

[Sketches: probability distribution and buffer size over time for the two cases]

Some TCP issues outstanding…

Page 12: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

12

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157DP 80

SP 9157DP 80

P4 P3

D-IPCS-IP AD-IPC

S-IP B

SP 5775DP 80

D-IPCS-IP B

13

UDP User Datagram Protocol [RFC 768]

bull ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol

bull ldquobest effortrdquo service UDP segments may bendash lostndash delivered out of order to

appbull connectionless

ndash no handshaking between UDP sender receiver

ndash each UDP segment handled independently of others

Why is there a UDPbull no connection establishment

(which can add delay)bull simple no connection state at

sender receiverbull small segment headerbull no congestion control UDP can

blast away as fast as desired

14

UDP more

bull often used for streaming multimedia appsndash loss tolerantndash rate sensitive

bull other UDP usesndash DNSndash SNMP

bull reliable transfer over UDP add reliability at application layerndash application-specific error

recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

15

UDP checksum

Senderbull treat segment contents as

sequence of 16-bit integersbull checksum addition (1rsquos

complement sum) of segment contents

bull sender puts checksum value into UDP checksum field

Receiverbull compute checksum of received

segmentbull check if computed checksum

equals checksum field valuendash NO - error detectedndash YES - no error detected But

maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

16

Internet Checksum(time travel warning ndash we covered this earlier)

bull Notendash When adding numbers a carryout from the

most significant bit needs to be added to the result

bull Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

17

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

18

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

19

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

rdt_rcv()

udt_rcv()

20

Reliable data transfer getting started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over

unreliable channel to receiver

rdt_rcv() called by rdt to deliver data to upper

rdt_rcv()

udt_rcv()

udt_rcv() called when packet arrives on rcv-side of channel

21

Reliable data transfer getting started

Wersquollbull incrementally develop sender receiver sides of

reliable data transfer protocol (rdt)bull consider only unidirectional data transfer

ndash but control info will flow on both directionsbull use finite state machines (FSM) to specify sender

receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely

determined by next event

eventactions

22

KR state machines ndash a note

BewareKurose and Ross has a confusingconfused attitude to

state-machinesIrsquove attempted to normalise the representationUPSHOT these slides have differing information to the

KR book (from which the RDT example is taken)in KR ldquoactions takenrdquo appear wide-ranging my

interpretation is more specificrelevant

Statename

Statename

Relevant event causing state transitionRelevant action taken on state transitionstate when in this ldquostaterdquo

next state uniquely determined by next

event eventactions

23

Rdt10 reliable transfer over a reliable channel

bull underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

IDLE udt_send(packet)rdt_send(data)

rdt_rcv(data)IDLEudt_rcv(packet)

sender receiver

Event

Action

24

Rdt20 channel with bit errors

bull underlying channel may flip bits in packetndash checksum to detect bit errors

bull the question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells sender that

packet received is OKndash negative acknowledgements (NAKs) receiver explicitly tells sender

that packet had errorsndash sender retransmits packet on receipt of NAK

bull new mechanisms in rdt20 (beyond rdt10)ndash error detectionndash receiver feedback control msgs (ACKNAK) receiver-gtsender

25

rdt20 FSM specification

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

Waitingfor reply

IDLE

sender

receiverrdt_send(data)

L

Note the sender holds a copy of the packet being sent until the delivery is acknowledged

26

rdt20 operation with no errors

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

27

rdt20 error scenario

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

28

rdt20 has a fatal flawWhat happens if ACKNAK corruptedbull sender doesnrsquot know what happened at receiverbull canrsquot just retransmit possible duplicate

Handling duplicates bull sender retransmits current

packet if ACKNAK garbledbull sender adds sequence number

to each packetbull receiver discards (doesnrsquot

deliver) duplicate packet

Sender sends one packet then waits for receiver response

stop and wait

29

rdt21 sender handles garbled ACKNAKs

IDLE

sequence=0udt_send(packet)

rdt_send(data)

WaitingFor reply udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

IDLEWaitingfor reply

LL

udt_rcv(packet) ampamp corrupt(packet)

30

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

udt_send(NAK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

udt_send(ACK)rdt_rcv(data)

Wait for 1 from below

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_send(ACK)rdt_rcv(data)

udt_send(ACK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

receive(packet) ampamp corrupt(packet)

udt_send(ACK)

udt_send(NAK)

31

rdt21 discussionSenderbull seq added to pktbull two seq rsquos (01) will suffice Whybull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has a0 or 1 sequence number

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq bull note receiver can not know if its last ACKNAK received OK at

sender

32

rdt22 a NAK-free protocol

bull same functionality as rdt21 using ACKs onlybull instead of NAK receiver sends ACK for last pkt received OK

ndash receiver must explicitly include seq of pkt being ACKed bull duplicate ACK at sender results in same action as NAK

retransmit current pkt

33

rdt22 sender receiver fragments

Wait for call 0 from above

sequence=0udt_send(packet)

rdt_send(data)

udt_send(packet)

rdt_rcv(reply) ampamp ( corrupt(reply) || isACK1(reply) )

udt_rcv(reply) ampamp not corrupt(reply) ampamp isACK0(reply)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet) send(ACK1)rdt_rcv(data)

udt_rcv(packet) ampamp (corrupt(packet) || has_seq1(packet))

udt_send(ACK1)receiver FSM

fragment

L

34

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of help but not

enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull retransmits if no ACK received in this time

bull if pkt (or ACK) just delayed (not lost)ndash retransmission will be

duplicate but use of seq rsquos already handles this

ndash receiver must specify seq of pkt being ACKed

bull requires countdown timer

udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )

35

rdt30 sender

sequence=0udt_send(packet)

rdt_send(data)

Wait for

ACK0

IDLEstate 1

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)

udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)

LL

udt_send(packet)timeout

udt_send(packet)timeout

udt_rcv(reply)

IDLEstate 0

Wait for

ACK1

Ludt_rcv(reply)

LL

L

36

rdt30 in action

37

rdt30 in action

38

Performance of rdt30

bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet

U sender utilization ndash fraction of time sender busy sending

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources

dsmicrosecon8bps10bits8000

9 RLdtrans

39

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

40

Pipelined (Packet-Window) protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver

bull Two generic forms of pipelined protocols go-Back-N selective repeat

41

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

42

Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks

ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet

ndash If timer expires retransmit all unacked packets

Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet

ndash When timer expires retransmit only unack packet

43

Selective repeat big picture

bull Sender can have up to N unacked packets in pipeline

bull Rcvr acks individual packetsbull Sender maintains timer for each unacked

packetndash When timer expires retransmit only unack packet

44

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L

46

GBN receiver extended FSM

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN inaction

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenarios
[Timeline diagrams:
 – lost ACK scenario: Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X); A's Seq=92 timeout fires and A retransmits Seq=92, 8 bytes data; B ACKs 100 again; SendBase = 100.
 – premature timeout: Host A sends Seq=92 (8 bytes) then Seq=100 (20 bytes); B returns ACK=100 and ACK=120, but A's Seq=92 timer expires first, so A retransmits Seq=92; B replies ACK=120; SendBase moves 100 → 120.]

67

TCP retransmission scenarios (more)
[Timeline diagram – cumulative ACK: Host A sends Seq=92 (8 bytes) and Seq=100 (20 bytes); ACK=100 is lost (X), but ACK=120 arrives before the timeout, so SendBase = 120 and nothing is retransmitted.]
Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACKs 100 too.

68

TCP ACK generation [RFC 1122, RFC 2581]

Event at Receiver → TCP Receiver action
• Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed → Delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
• Arrival of in-order segment with expected seq #; one other segment has ACK pending → Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment, higher-than-expected seq #; gap detected → Immediately send duplicate ACK, indicating seq # of next expected byte
• Arrival of segment that partially or completely fills gap → Immediately send ACK, provided that segment starts at lower end of gap
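
A hedged Python sketch of that decision table (the 500 ms delayed-ACK timer and the out-of-order buffer are simplified; state = {'expected': …, 'ack_pending': False, 'ooo': {}} is an assumed structure for illustration, not TCP's real one):

    def on_segment(state, seq, length):
        """Return the ACK action implied by the table above for one arriving segment."""
        if seq == state['expected']:
            state['expected'] += length
            filled_gap = False
            while state['expected'] in state['ooo']:          # segment filled a gap: absorb buffered data
                state['expected'] += state['ooo'].pop(state['expected'])
                filled_gap = True
            if filled_gap or state['ack_pending']:
                state['ack_pending'] = False
                return 'send ACK %d now' % state['expected']  # immediate cumulative ACK
            state['ack_pending'] = True
            return 'delay ACK up to 500 ms'
        if seq > state['expected']:
            state['ooo'][seq] = length                        # out-of-order: remember the gap
            return 'send duplicate ACK %d now' % state['expected']
        return 'send ACK %d now' % state['expected']          # old segment: re-ACK what we expect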

69

Fast Retransmit
• Time-out period often relatively long
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If a segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that the segment after the ACKed data was lost
  – fast retransmit: resend segment before timer expires

70

[Figure 3.37: Resending a segment after triple duplicate ACK – one of Host A's segments is lost (X); the resulting duplicate ACKs from Host B let A resend the 2nd segment before its timeout expires.]

71

Fast retransmit algorithm:

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
    else {                                      /* a duplicate ACK for already ACKed segment */
      increment count of dup ACKs received for y
      if (count of dup ACKs received for y == 3)
        resend segment with sequence number y   /* fast retransmit */
    }
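
The same logic as a small runnable Python function (a sketch only; resend and start_timer stand in for the sender's real actions):

    def on_ack(state, y, resend, start_timer):
        """Duplicate-ACK counting and fast retransmit, mirroring the pseudocode above."""
        if y > state['send_base']:
            state['send_base'] = y
            state['dup_acks'] = 0
            if state['unacked']:                    # segments still in flight
                start_timer()
        else:
            state['dup_acks'] = state.get('dup_acks', 0) + 1
            if state['dup_acks'] == 3:              # triple duplicate ACK
                resend(y)                           # resend segment y before the timer expires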

72

Silly Window Syndrome

The receiver's advertised window tells a sender how much it can accept.
If a transmitter has something to send – it will.
This means small segments (far below the MSS) may persist – indefinitely.

Solution:
Wait to fill each segment, but don't wait indefinitely.

NAGLE's Algorithm

If we wait too long, interactive traffic is difficult.
If we don't wait, we get silly window syndrome.

Solution: Use a timer; when the timer expires – send the (unfilled) segment.
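
A sketch of the idea (not the exact RFC 896 rules): coalesce small writes while earlier data is unacknowledged, but never hold data once a full segment is available or the timer fires. conn, with pending, mss, unacked_data and send(), is an assumed little structure for illustration.

    def nagle_try_send(conn):
        if len(conn.pending) >= conn.mss:
            conn.send(conn.pending[:conn.mss])     # full segment available: send it
            conn.pending = conn.pending[conn.mss:]
        elif conn.pending and not conn.unacked_data:
            conn.send(conn.pending)                # nothing in flight: a small segment may go now
            conn.pending = b""
        # otherwise wait: for more data, for an ACK, or for the timer

    def nagle_timer_expired(conn):
        if conn.pending:
            conn.send(conn.pending)                # don't wait indefinitely: flush the unfilled segment
            conn.pending = b""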

73

Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded

74

Flow Control – (bad old days)

In-line flow control:
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)

Dedicated wires:
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop…

75

TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app's drain rate

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

76

TCP Flow control: how it works

(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow = RcvBuffer – [LastByteRcvd – LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
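
A minimal sketch of that bookkeeping, using the slide's variable names (illustrative numbers only):

    # receiver side: advertise spare buffer space
    def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
        return rcv_buffer - (last_byte_rcvd - last_byte_read)

    # sender side: keep unACKed data within the advertised window
    def can_send(last_byte_sent, last_byte_acked, rcv_window, nbytes):
        return (last_byte_sent - last_byte_acked) + nbytes <= rcv_window

    print(rcv_window(4096, 3000, 1500))        # 2596 bytes of spare room
    print(can_send(5000, 4000, 2596, 1460))    # True: 1000 + 1460 <= 2596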

77

TCP Connection Management

Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables
  – seq #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    Socket clientSocket = new Socket(hostname, port number);
• server: contacted by client
    Socket connectionSocket = welcomeSocket.accept();

Three way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data

78

TCP Connection Management (cont)

Closing a connection:

client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Timeline: client sends FIN (close); server ACKs, then sends its own FIN (close); client ACKs, sits in timed wait, then closed.]

79

TCP Connection Management (cont)

Step 3: client receives FIN, replies with ACK
  – Enters "timed wait" – will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Timeline: client and server exchange FIN/ACK in both directions; both pass through closing, the client holds timed wait, then both sides are closed.]

80

TCP Connection Management (cont)

[State diagrams: TCP client lifecycle and TCP server lifecycle.]

81

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem!

82

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Diagram: Host A offers λin (original data) through a router with unlimited shared output link buffers; Host B receives λout.]

83

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Diagram: Host A offers λin (original data) and λ'in (original data plus retransmitted data) through a router with finite shared output link buffers; Host B receives λout.]

84

Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• "perfect" retransmission only when loss: λ'in > λout
• retransmission of delayed (not lost) packet makes λ'in larger (than perfect case) for the same λout

"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Plots (a), (b), (c): goodput λout versus offered load λin; every curve is bounded by R/2, and retransmissions pull the achievable goodput down further (the R/3 and R/4 marks in the figure).]

85

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Diagram: Host A offers λin (original data) and λ'in (original data plus retransmitted data) across routers with finite shared output link buffers towards Host B (λout).]

86

Causes/costs of congestion: scenario 3
Another "cost" of congestion: when a packet is dropped, any upstream transmission capacity used for that packet was wasted!
[Plot: λout collapses as the offered load keeps growing.]
Congestion Collapse example: Cocktail party effect

87

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at

88

TCP congestion control: additive increase, multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT – W ← W + 1/W for each received ACK – until loss detected
• multiplicative decrease: cut CongWin in half after loss (W ← W/2)
[Plot: congestion window (8, 16, 24 Kbytes) versus time – saw tooth behaviour, probing for bandwidth.]

89
SLOW START IS NOT SHOWN


90

TCP Congestion Control: details
• sender limits transmission: LastByteSent – LastByteAcked ≤ CongWin
• Roughly: rate = CongWin/RTT Bytes/sec
• CongWin is dynamic, a function of perceived network congestion

How does sender perceive congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event

three mechanisms:
  – AIMD
  – slow start
  – conservative after timeout events

91

AIMD Starts Too Slowly

[Plot: window versus time t, growing only linearly from the start.]
It could take a long time to get started!
Need to start with a small CWND to avoid overloading the network.

92

TCP Slow Start
• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to respectable rate

When connection begins, increase rate exponentially fast until first loss event
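
Checking the slide's arithmetic for that initial rate:

    mss_bytes, rtt = 500, 0.200
    print(mss_bytes * 8 / rtt)   # 20000.0 bits/s = 20 kbps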

93

TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow, but ramps up exponentially fast
[Timeline: Host A sends one segment, then two segments, then four segments in successive RTTs to Host B.]

94

Slow Start and the TCP Sawtooth

[Plot: window versus time t – exponential "slow start" up to the first loss, then the sawtooth.]
Why is it called slow-start? Because TCP originally had no congestion control mechanism. The source would just start by sending a whole window's worth of data!

95

Refinement: inferring loss
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Philosophy: 3 dup ACKs indicates network capable of delivering some segments; timeout indicates a "more alarming" congestion scenario

96

Refinement
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event

97

Summary TCP Congestion Control

• When CongWin is below Threshold, sender in slow-start phase, window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.

98

TCP sender congestion control

State: Slow Start (SS)
  Event: ACK receipt for previously unacked data
  TCP Sender Action: CongWin = CongWin + MSS; If (CongWin > Threshold) set state to "Congestion Avoidance"
  Commentary: Resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
  Event: ACK receipt for previously unacked data
  TCP Sender Action: CongWin = CongWin + MSS·(MSS/CongWin)
  Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT

State: SS or CA
  Event: Loss event detected by triple duplicate ACK
  TCP Sender Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
  Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

State: SS or CA
  Event: Timeout
  TCP Sender Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
  Commentary: Enter slow start

State: SS or CA
  Event: Duplicate ACK
  TCP Sender Action: Increment duplicate ACK count for segment being acked
  Commentary: CongWin and Threshold not changed
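
The table collapses into a small state machine; this is a hedged sketch (cwnd counted in MSS units, fast recovery's window inflation omitted), not a complete TCP Reno implementation:

    class CongestionControl:
        def __init__(self):
            self.cwnd = 1.0          # slow start begins at 1 MSS
            self.ssthresh = 64.0
            self.dup_acks = 0

        def on_new_ack(self):
            self.dup_acks = 0
            if self.cwnd < self.ssthresh:
                self.cwnd += 1.0               # slow start: +1 MSS per ACK, doubling per RTT
            else:
                self.cwnd += 1.0 / self.cwnd   # congestion avoidance: +1 MSS per RTT

        def on_dup_ack(self):
            self.dup_acks += 1
            if self.dup_acks == 3:             # triple duplicate ACK
                self.ssthresh = max(self.cwnd / 2, 1.0)
                self.cwnd = self.ssthresh      # multiplicative decrease, never below 1 MSS
                self.dup_acks = 0

        def on_timeout(self):
            self.ssthresh = max(self.cwnd / 2, 1.0)
            self.cwnd = 1.0                    # back to slow start
            self.dup_acks = 0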

99

Repeating Slow Start After Timeout

[Plot: window versus time t – at the timeout/fast retransmission, SSThresh is set to half the previous CWND; slow start runs again until the window reaches SSThresh, then growth is linear.]
Slow-start restart: Go back to CWND of 1 MSS, but take advantage of knowing the previous value of CWND.
Slow start in operation until it reaches half of previous CWND, i.e. SSTHRESH.

100

TCP throughput

• What's the average throughput of TCP as a function of window size and RTT?
  – Ignore slow start
• Let W be the window size when loss occurs
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to (W/2)/RTT
• Average throughput: 0.75 W/RTT

101

TCP Futures: TCP over "long, fat pipes"
• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L: roughly 1.22·MSS/(RTT·√L)
• ⇒ L = 2·10⁻¹⁰   Ouch!
• New versions of TCP for high-speed

Calculation on Simple Model (cwnd in units of MSS)
• Assume loss occurs whenever cwnd reaches W
  – Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … W, W/2, …
  – W/2 RTTs, then drop, then repeat
• Average throughput: 0.75·W·(MSS/RTT)
  – One packet dropped out of (W/2)·(3W/4)
  – Packet drop rate p = (8/3)·W⁻²
• Throughput = (MSS/RTT)·√(3/(2p))
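
A quick numeric check of the example above, using Throughput = (MSS/RTT)·√(3/(2p)) rearranged for the loss rate (values from the slide):

    from math import sqrt

    MSS = 1500 * 8            # bits per segment
    RTT = 0.100               # seconds
    target = 10e9             # 10 Gbps

    W = target * RTT / MSS                    # in-flight segments needed
    p = 1.5 * (MSS / (RTT * target)) ** 2     # loss rate the simple model allows

    print(round(W))                           # ~83333 segments
    print("%.1e" % p)                         # ~2.2e-10 -- ouch!
    print("%.2e" % ((MSS / RTT) * sqrt(3 / (2 * p))))   # ~1.0e10 bits/s, the target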

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges – or, Why AIMD?
• Single flow adjusting to bottleneck bandwidth
  – Without any a priori knowledge
  – Could be a Gbps link, could be a modem
• Single flow adjusting to variations in bandwidth
  – When bandwidth decreases, must lower sending rate
  – When bandwidth increases, must increase sending rate
• Multiple flows sharing the bandwidth
  – Must avoid overloading network
  – And share bandwidth "fairly" among the flows

103

104

Problem 1: Single Flow, Fixed BW
• Want to get a first-order estimate of the available bandwidth
  – Assume bandwidth is fixed
  – Ignore presence of other flows
• Want to start slow, but rapidly increase rate until packet drop occurs ("slow-start")
• Adjustment:
  – cwnd initially set to 1 (MSS)
  – cwnd++ upon receipt of ACK

105

Problems with Slow-Start
• Slow-start can result in many losses
  – Roughly the size of cwnd ~ BW·RTT
• Example:
  – At some point, cwnd is enough to fill the "pipe"
  – After another RTT, cwnd is double its previous value
  – All the excess packets are dropped!
• Need a more gentle adjustment algorithm once we have a rough estimate of bandwidth
  – Rest of design discussion focuses on this

Problem 2: Single Flow, Varying BW

Want to track available bandwidth
• Oscillate around its current value
• If you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease: cwnd ← cwnd ·/÷ a
• Additive increase or decrease: cwnd ← cwnd +/− b

106

Four alternatives

• AIAD: gentle increase, gentle decrease
• AIMD: gentle increase, drastic decrease
• MIAD: drastic increase, gentle decrease
  – too many losses: eliminate
• MIMD: drastic increase and decrease

107

108

Problem 3: Multiple Flows
• Want steady state to be "fair"
• Many notions of fairness, but here just require two identical flows to end up with the same bandwidth
• This eliminates MIMD and AIAD
  – As we shall see…
• AIMD is the only remaining solution
  – Not really, but close enough…

109

Recall Buffer and Window Dynamics

• No congestion → x increases by one packet/RTT every RTT; Congestion → decrease x by factor 2
[Plot: one flow A → B over a link of capacity C = 50 pkts/RTT; rate x (pkts/RTT) and router backlog (pkts, congested if > 20) versus time – the familiar sawtooth.]

110

AIMD Sharing Dynamics
• No congestion → rate increases by one packet/RTT every RTT; Congestion → decrease rate by factor 2
[Plot: two flows, x1 (A → B) and x2 (D → E), sharing the bottleneck – rates equalize: fair share.]
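
A tiny simulation of that picture (a sketch under the slide's assumptions: capacity 50 pkts/RTT, both flows halve whenever the combined rate exceeds capacity):

    def aimd_share(x1=40.0, x2=10.0, capacity=50.0, rtts=200):
        """Two AIMD flows on one bottleneck: +1 pkt/RTT each if uncongested, both halve on congestion."""
        for _ in range(rtts):
            if x1 + x2 > capacity:
                x1, x2 = x1 / 2, x2 / 2      # multiplicative decrease
            else:
                x1, x2 = x1 + 1, x2 + 1      # additive increase
        return x1, x2

    print(aimd_share())   # the two rates end up roughly equal -> fair share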

111

AIAD Sharing Dynamics

• No congestion → x increases by one packet/RTT every RTT; Congestion → decrease x by 1
[Plot: two flows, x1 (A → B) and x2 (D → E), under AIAD – the gap between their rates never closes.]

112

Simple Model of Congestion Control

• Two TCP connections
  – Rates x1 and x2
• Congestion when sum > 1
• Efficiency: sum near 1
• Fairness: x's converge
[Phase-plane diagram (2 user example): bandwidth for User 1 (x1) versus bandwidth for User 2 (x2); the efficiency line separates underload from overload.]

113

Example

• Total bandwidth: 1
[Phase-plane diagram: x1 versus x2, with the fairness line (x1 = x2) and the efficiency line (x1 + x2 = 1).]
• (0.2, 0.5): Inefficient, x1 + x2 = 0.7
• (0.7, 0.5): Congested, x1 + x2 = 1.2
• (0.7, 0.3): Efficient, x1 + x2 = 1. Not fair.
• (0.5, 0.5): Efficient, x1 + x2 = 1. Fair.

114

AIAD

• Increase: x ← x + aI
• Decrease: x ← x − aD
• Does not converge to fairness
[Phase-plane diagram: starting from (x1h, x2h), a decrease moves to (x1h − aD, x2h − aD) and an increase to (x1h − aD + aI, x2h − aD + aI) – the point moves parallel to the fairness line, so the initial unfairness never shrinks.]

115

MIMD

• Increase: x ← bI·x
• Decrease: x ← bD·x
• Does not converge to fairness
[Phase-plane diagram: starting from (x1h, x2h), a decrease moves to (bD·x1h, bD·x2h) and an increase to (bI·bD·x1h, bI·bD·x2h) – the point stays on the ray through the origin, so the ratio x1/x2 never changes.]

116

AIMD
• Increase: x ← x + aI
• Decrease: x ← bD·x
• Converges to fairness
[Phase-plane diagram: from (x1h, x2h), a decrease moves to (bD·x1h, bD·x2h) and an increase to (bD·x1h + aI, bD·x2h + aI) – each round pushes the point towards the fairness line.]

117

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• Additive increase gives slope of 1, as throughput increases
• Multiplicative decrease decreases throughput proportionally
[Phase-plane diagram: bandwidth for Connection 1 versus bandwidth for Connection 2, both bounded by R; congestion avoidance gives additive increase, loss decreases the window by a factor of 2 – the trajectory converges on the equal bandwidth share line.]

118

Fairness (more)

Fairness and UDP:
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) Research area: TCP friendly

Fairness and parallel TCP connections:
• nothing prevents app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2!
• Recall: Multiple browser sessions (and the potential for synchronized loss)

119

Some TCP issues outstanding…

Synchronized Flows vs. Many TCP Flows
• Synchronized flows:
  – Aggregate window has same dynamics
  – Therefore buffer occupancy has same dynamics
  – Rule-of-thumb still holds
• Independent, desynchronized flows:
  – Central limit theorem says the aggregate becomes Gaussian
  – Variance (buffer size) decreases as N increases
[Plots: buffer size versus time t, and the resulting probability distribution.]

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by factor of 2

[Figure: a single flow from A to B through a bottleneck of C = 50 pkts/RTT; plots of the rate x (pkts/RTT) and of the backlog in the router (pkts, congested if > 20) over time, showing the sawtooth.]
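
A small simulation in the spirit of this figure (assumptions: bottleneck C = 50 pkts/RTT, the router counts as congested once its backlog exceeds 20 pkts, and the sender reacts once per RTT):

C, BUF = 50.0, 20.0              # bottleneck rate (pkts/RTT) and buffer (pkts)
x, backlog = 1.0, 0.0            # sending rate and router backlog
for rtt in range(300):
    backlog = max(backlog + x - C, 0.0)    # queue builds only when x > C
    if backlog > BUF:                      # buffer overflow -> packet loss
        x = x / 2                          # congestion: halve the rate
    else:
        x = x + 1                          # no congestion: +1 pkt/RTT each RTT
    print(rtt, round(x, 1), round(backlog, 1))   # x traces a sawtooth around C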

110

AIMD Sharing Dynamics

[Figure: two flows, x1 from A to B and x2 from D to E, sharing one bottleneck; their rates (pkts/RTT) plotted over time.]

No congestion → rate increases by one packet/RTT every RTT. Congestion → decrease rate by factor of 2.

Rates equalize → fair share

111

AIAD Sharing Dynamics

[Figure: the same two flows, x1 from A to B and x2 from D to E, now with additive decrease; rates (pkts/RTT) over time.]

No congestion → x increases by one packet/RTT every RTT. Congestion → decrease x by 1.

The initial difference between the two rates persists: they never equalize.
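
A sketch contrasting the two sharing behaviours above (assumptions: bottleneck of 50 pkts/RTT, both flows detect congestion in the same RTT, starting rates of 40 and 10 pkts/RTT):

C = 50.0

def share(decrease, x1=40.0, x2=10.0, rtts=500):
    """decrease: rule applied to each flow's rate when the link is congested."""
    for _ in range(rtts):
        if x1 + x2 > C:                    # congestion: both flows see loss
            x1, x2 = decrease(x1), decrease(x2)
        else:                              # no congestion: additive increase
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2

print(share(lambda x: x / 2))              # AIMD: rates converge towards a fair share
print(share(lambda x: max(x - 1, 0)))      # AIAD: the 30 pkt/RTT gap never closes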

112

Simple Model of Congestion Control

• Two TCP connections
  – Rates x1 and x2

• Congestion when sum > 1

• Efficiency: sum near 1

• Fairness: x's converge

[Figure: 2-user example plotted as bandwidth for user 1 (x1) vs. bandwidth for user 2 (x2); the efficiency line x1 + x2 = 1 separates the underload region from the overload region.]

113

Example

[Figure: the (x1, x2) plane, both axes from 0 to 1, with the efficiency line x1 + x2 = 1 and the fairness line x1 = x2.]

• Total bandwidth 1
  – (0.2, 0.5): inefficient, x1 + x2 = 0.7
  – (0.7, 0.5): congested, x1 + x2 = 1.2
  – (0.7, 0.3): efficient (x1 + x2 = 1) but not fair
  – (0.5, 0.5): efficient (x1 + x2 = 1) and fair

114

AIAD

[Figure: trajectory in the (x1, x2) plane with the fairness and efficiency lines; starting from (x1h, x2h), a decrease moves to (x1h − aD, x2h − aD) and the next increase to (x1h − aD + aI, x2h − aD + aI), so the path stays parallel to the fairness line.]

• Increase: x → x + aI

• Decrease: x → x − aD

• Does not converge to fairness

115

MIMD

[Figure: trajectory in the (x1, x2) plane with the fairness and efficiency lines; starting from (x1h, x2h), a decrease moves to (bD·x1h, bD·x2h) and the next increase to (bI·bD·x1h, bI·bD·x2h), so the path stays on a line through the origin and the ratio x1/x2 never changes.]

• Increase: x → x·bI

• Decrease: x → x·bD

• Does not converge to fairness

116

AIMD

[Figure: trajectory in the (x1, x2) plane with the fairness and efficiency lines; starting from (x1h, x2h), a multiplicative decrease moves to (bD·x1h, bD·x2h) and the next additive increase to (bD·x1h + aI, bD·x2h + aI), so each cycle moves the point closer to the fairness line.]

• Increase: x → x + aI

• Decrease: x → x·bD

• Converges to fairness

117

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• Additive increase gives slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally

[Figure: bandwidth of connection 1 vs. bandwidth of connection 2, both bounded by R, with the equal-bandwidth-share line; the trajectory alternates congestion avoidance (additive increase) with loss (decrease window by factor of 2) and converges towards the equal-share point.]

118

Fairness (more)

Fairness and UDP
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections
• Nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2
• Recall: multiple browser sessions (and the potential for synchronized loss)
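
(Worked through: with per-connection fair sharing, the new app's share is n/(9+n) of R, so a single connection gets R/10 while eleven connections get 11R/20 ≈ R/2.)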

119

Some TCP issues outstanding…

Synchronized flows:
• Aggregate window has the same dynamics
• Therefore buffer occupancy has the same dynamics
• Rule-of-thumb still holds

Many TCP flows:
• Independent, desynchronized
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases

[Figure: buffer occupancy over time and its probability distribution, for synchronized vs. many desynchronized flows.]

Page 14: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

14

UDP more

bull often used for streaming multimedia appsndash loss tolerantndash rate sensitive

bull other UDP usesndash DNSndash SNMP

bull reliable transfer over UDP add reliability at application layerndash application-specific error

recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

15

UDP checksum

Senderbull treat segment contents as

sequence of 16-bit integersbull checksum addition (1rsquos

complement sum) of segment contents

bull sender puts checksum value into UDP checksum field

Receiverbull compute checksum of received

segmentbull check if computed checksum

equals checksum field valuendash NO - error detectedndash YES - no error detected But

maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

16

Internet Checksum(time travel warning ndash we covered this earlier)

bull Notendash When adding numbers a carryout from the

most significant bit needs to be added to the result

bull Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

17

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

18

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

19

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

rdt_rcv()

udt_rcv()

20

Reliable data transfer getting started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over

unreliable channel to receiver

rdt_rcv() called by rdt to deliver data to upper

rdt_rcv()

udt_rcv()

udt_rcv() called when packet arrives on rcv-side of channel

21

Reliable data transfer getting started

Wersquollbull incrementally develop sender receiver sides of

reliable data transfer protocol (rdt)bull consider only unidirectional data transfer

ndash but control info will flow on both directionsbull use finite state machines (FSM) to specify sender

receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely

determined by next event

eventactions

22

KR state machines ndash a note

BewareKurose and Ross has a confusingconfused attitude to

state-machinesIrsquove attempted to normalise the representationUPSHOT these slides have differing information to the

KR book (from which the RDT example is taken)in KR ldquoactions takenrdquo appear wide-ranging my

interpretation is more specificrelevant

Statename

Statename

Relevant event causing state transitionRelevant action taken on state transitionstate when in this ldquostaterdquo

next state uniquely determined by next

event eventactions

23

Rdt10 reliable transfer over a reliable channel

bull underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

IDLE udt_send(packet)rdt_send(data)

rdt_rcv(data)IDLEudt_rcv(packet)

sender receiver

Event

Action

24

Rdt20 channel with bit errors

bull underlying channel may flip bits in packetndash checksum to detect bit errors

bull the question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells sender that

packet received is OKndash negative acknowledgements (NAKs) receiver explicitly tells sender

that packet had errorsndash sender retransmits packet on receipt of NAK

bull new mechanisms in rdt20 (beyond rdt10)ndash error detectionndash receiver feedback control msgs (ACKNAK) receiver-gtsender

25

rdt20 FSM specification

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

Waitingfor reply

IDLE

sender

receiverrdt_send(data)

L

Note the sender holds a copy of the packet being sent until the delivery is acknowledged

26

rdt20 operation with no errors

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

27

rdt20 error scenario

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

28

rdt20 has a fatal flawWhat happens if ACKNAK corruptedbull sender doesnrsquot know what happened at receiverbull canrsquot just retransmit possible duplicate

Handling duplicates bull sender retransmits current

packet if ACKNAK garbledbull sender adds sequence number

to each packetbull receiver discards (doesnrsquot

deliver) duplicate packet

Sender sends one packet then waits for receiver response

stop and wait

29

rdt21 sender handles garbled ACKNAKs

IDLE

sequence=0udt_send(packet)

rdt_send(data)

WaitingFor reply udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

IDLEWaitingfor reply

LL

udt_rcv(packet) ampamp corrupt(packet)

30

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

udt_send(NAK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

udt_send(ACK)rdt_rcv(data)

Wait for 1 from below

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_send(ACK)rdt_rcv(data)

udt_send(ACK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

receive(packet) ampamp corrupt(packet)

udt_send(ACK)

udt_send(NAK)

31

rdt21 discussionSenderbull seq added to pktbull two seq rsquos (01) will suffice Whybull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has a0 or 1 sequence number

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq bull note receiver can not know if its last ACKNAK received OK at

sender

32

rdt22 a NAK-free protocol

bull same functionality as rdt21 using ACKs onlybull instead of NAK receiver sends ACK for last pkt received OK

ndash receiver must explicitly include seq of pkt being ACKed bull duplicate ACK at sender results in same action as NAK

retransmit current pkt

33

rdt22 sender receiver fragments

Wait for call 0 from above

sequence=0udt_send(packet)

rdt_send(data)

udt_send(packet)

rdt_rcv(reply) ampamp ( corrupt(reply) || isACK1(reply) )

udt_rcv(reply) ampamp not corrupt(reply) ampamp isACK0(reply)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet) send(ACK1)rdt_rcv(data)

udt_rcv(packet) ampamp (corrupt(packet) || has_seq1(packet))

udt_send(ACK1)receiver FSM

fragment

L

34

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of help but not

enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull retransmits if no ACK received in this time

bull if pkt (or ACK) just delayed (not lost)ndash retransmission will be

duplicate but use of seq rsquos already handles this

ndash receiver must specify seq of pkt being ACKed

bull requires countdown timer

udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )

35

rdt30 sender

sequence=0udt_send(packet)

rdt_send(data)

Wait for

ACK0

IDLEstate 1

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)

udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)

LL

udt_send(packet)timeout

udt_send(packet)timeout

udt_rcv(reply)

IDLEstate 0

Wait for

ACK1

Ludt_rcv(reply)

LL

L

36

rdt30 in action

37

rdt30 in action

38

Performance of rdt30

bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet

U sender utilization ndash fraction of time sender busy sending

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources

dsmicrosecon8bps10bits8000

9 RLdtrans

39

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

40

Pipelined (Packet-Window) protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver

bull Two generic forms of pipelined protocols go-Back-N selective repeat

41

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

42

Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks

ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet

ndash If timer expires retransmit all unacked packets

Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet

ndash When timer expires retransmit only unack packet

43

Selective repeat big picture

bull Sender can have up to N unacked packets in pipeline

bull Rcvr acks individual packetsbull Sender maintains timer for each unacked

packetndash When timer expires retransmit only unack packet

44

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L

46

GBN receiver extended FSM

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN inaction

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 15: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

15

UDP checksum

Senderbull treat segment contents as

sequence of 16-bit integersbull checksum addition (1rsquos

complement sum) of segment contents

bull sender puts checksum value into UDP checksum field

Receiverbull compute checksum of received

segmentbull check if computed checksum

equals checksum field valuendash NO - error detectedndash YES - no error detected But

maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

16

Internet Checksum(time travel warning ndash we covered this earlier)

bull Notendash When adding numbers a carryout from the

most significant bit needs to be added to the result

bull Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

17

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

18

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

19

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

rdt_rcv()

udt_rcv()

20

Reliable data transfer getting started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over

unreliable channel to receiver

rdt_rcv() called by rdt to deliver data to upper

rdt_rcv()

udt_rcv()

udt_rcv() called when packet arrives on rcv-side of channel

21

Reliable data transfer getting started

Wersquollbull incrementally develop sender receiver sides of

reliable data transfer protocol (rdt)bull consider only unidirectional data transfer

ndash but control info will flow on both directionsbull use finite state machines (FSM) to specify sender

receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely

determined by next event

eventactions

22

KR state machines ndash a note

BewareKurose and Ross has a confusingconfused attitude to

state-machinesIrsquove attempted to normalise the representationUPSHOT these slides have differing information to the

KR book (from which the RDT example is taken)in KR ldquoactions takenrdquo appear wide-ranging my

interpretation is more specificrelevant

Statename

Statename

Relevant event causing state transitionRelevant action taken on state transitionstate when in this ldquostaterdquo

next state uniquely determined by next

event eventactions

23

Rdt10 reliable transfer over a reliable channel

bull underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

IDLE udt_send(packet)rdt_send(data)

rdt_rcv(data)IDLEudt_rcv(packet)

sender receiver

Event

Action

24

Rdt20 channel with bit errors

bull underlying channel may flip bits in packetndash checksum to detect bit errors

bull the question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells sender that

packet received is OKndash negative acknowledgements (NAKs) receiver explicitly tells sender

that packet had errorsndash sender retransmits packet on receipt of NAK

bull new mechanisms in rdt20 (beyond rdt10)ndash error detectionndash receiver feedback control msgs (ACKNAK) receiver-gtsender

25

rdt20 FSM specification

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

Waitingfor reply

IDLE

sender

receiverrdt_send(data)

L

Note the sender holds a copy of the packet being sent until the delivery is acknowledged

26

rdt20 operation with no errors

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

27

rdt20 error scenario

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

28

rdt20 has a fatal flawWhat happens if ACKNAK corruptedbull sender doesnrsquot know what happened at receiverbull canrsquot just retransmit possible duplicate

Handling duplicates bull sender retransmits current

packet if ACKNAK garbledbull sender adds sequence number

to each packetbull receiver discards (doesnrsquot

deliver) duplicate packet

Sender sends one packet then waits for receiver response

stop and wait

29

rdt21 sender handles garbled ACKNAKs

IDLE

sequence=0udt_send(packet)

rdt_send(data)

WaitingFor reply udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

IDLEWaitingfor reply

LL

udt_rcv(packet) ampamp corrupt(packet)

30

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

udt_send(NAK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

udt_send(ACK)rdt_rcv(data)

Wait for 1 from below

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_send(ACK)rdt_rcv(data)

udt_send(ACK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

receive(packet) ampamp corrupt(packet)

udt_send(ACK)

udt_send(NAK)

31

rdt21 discussionSenderbull seq added to pktbull two seq rsquos (01) will suffice Whybull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has a0 or 1 sequence number

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq bull note receiver can not know if its last ACKNAK received OK at

sender

32

rdt22 a NAK-free protocol

bull same functionality as rdt21 using ACKs onlybull instead of NAK receiver sends ACK for last pkt received OK

ndash receiver must explicitly include seq of pkt being ACKed bull duplicate ACK at sender results in same action as NAK

retransmit current pkt

33

rdt22 sender receiver fragments

Wait for call 0 from above

sequence=0udt_send(packet)

rdt_send(data)

udt_send(packet)

rdt_rcv(reply) ampamp ( corrupt(reply) || isACK1(reply) )

udt_rcv(reply) ampamp not corrupt(reply) ampamp isACK0(reply)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet) send(ACK1)rdt_rcv(data)

udt_rcv(packet) ampamp (corrupt(packet) || has_seq1(packet))

udt_send(ACK1)receiver FSM

fragment

L

34

rdt30 channels with errors and loss

New assumption: underlying channel can also lose packets (data or ACKs)
  – checksum, seq #, ACKs, retransmissions will be of help, but not enough

Approach: sender waits "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  – retransmission will be duplicate, but use of seq #'s already handles this
  – receiver must specify seq # of pkt being ACKed
• requires countdown timer (a sender sketch in code follows below)

udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )
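To make the stop-and-wait behaviour concrete, here is a minimal rdt3.0-style sender sketch in Python. The helpers make_packet, udt_send, udt_rcv_with_timeout, corrupt and is_ack are assumptions standing in for the channel primitives and predicates used in the FSMs; this is one possible realisation, not the slide's FSM itself.

import time

# Stop-and-wait (rdt3.0-style) sender sketch.
# make_packet, udt_send, udt_rcv_with_timeout, corrupt, is_ack are assumed helpers.
def rdt30_send(data_stream, timeout=1.0):
    seq = 0                                       # alternating-bit sequence number
    for data in data_stream:
        packet = make_packet(seq, data)           # checksum + seq # added here
        acked = False
        while not acked:
            udt_send(packet)                      # (re)transmit; sender keeps the copy
            deadline = time.time() + timeout      # start countdown timer
            while True:
                remaining = deadline - time.time()
                if remaining <= 0:
                    break                         # timer expired: retransmit
                reply = udt_rcv_with_timeout(remaining)
                if reply is None:
                    break                         # timer expired: retransmit
                if corrupt(reply) or not is_ack(reply, seq):
                    continue                      # garbled or duplicate ACK: ignore, keep waiting
                acked = True                      # correct ACK for current seq #
                break
        seq = 1 - seq                             # flip 0 <-> 1 for the next packet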

35

rdt30 sender

sequence=0udt_send(packet)

rdt_send(data)

Wait for

ACK0

IDLEstate 1

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)

udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)

LL

udt_send(packet)timeout

udt_send(packet)timeout

udt_rcv(reply)

IDLEstate 0

Wait for

ACK1

Ludt_rcv(reply)

LL

L

36

rdt30 in action

37

rdt30 in action

38

Performance of rdt30

• rdt3.0 works, but performance stinks
• example: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:

    d_trans = L / R = 8000 bits / (10^9 bits/sec) = 8 microseconds

• U_sender: utilization – fraction of time sender busy sending

    U_sender = (L/R) / (RTT + L/R) = 0.008 / 30.008 ≈ 0.00027

• 1KB pkt every 30 msec -> 33 kB/sec throughput over a 1 Gbps link: the network protocol limits use of physical resources!

39

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted, t = L / R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives, send next packet, t = RTT + L / R

40

Pipelined (Packet-Window) protocols

Pipelining: sender allows multiple "in-flight", yet-to-be-acknowledged pkts
  – range of sequence numbers must be increased
  – buffering at sender and/or receiver

• Two generic forms of pipelined protocols: go-Back-N, selective repeat

41

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted, t = L / R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives, send next packet, t = RTT + L / R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

42

Pipelining Protocols

Go-back-N: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
  – Doesn't ack packet if there's a gap
• Sender has timer for oldest unacked packet
  – If timer expires, retransmit all unacked packets

Selective Repeat: big pic
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  – When timer expires, retransmit only unacked packet

43

Selective repeat big picture

bull Sender can have up to N unacked packets in pipeline

bull Rcvr acks individual packetsbull Sender maintains timer for each unacked

packetndash When timer expires retransmit only unack packet

44

Go-Back-N

Sender:
• k-bit seq # in pkt header
• "window" of up to N consecutive unack'ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n – "cumulative ACK"
  – may receive duplicate ACKs (see receiver)
• timer for the oldest in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L
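A minimal Go-Back-N sender sketch mirroring the FSM above (window check, cumulative-ACK advance, retransmit-all on timeout). make_packet, udt_send, start_timer and stop_timer are assumed helpers, not part of any particular stack.

# Go-Back-N sender sketch (window of N, cumulative ACKs).
class GBNSender:
    def __init__(self, N):
        self.N = N
        self.base = 1                                 # oldest unacked seq #
        self.nextseqnum = 1
        self.pkts = {}                                # copies of in-flight packets, keyed by seq #

    def rdt_send(self, data):
        if self.nextseqnum < self.base + self.N:      # room in window?
            pkt = make_packet(self.nextseqnum, data)
            self.pkts[self.nextseqnum] = pkt
            udt_send(pkt)
            if self.base == self.nextseqnum:
                start_timer()                         # timer covers oldest unacked pkt
            self.nextseqnum += 1
            return True
        return False                                  # window full: refuse/block the data

    def ack_received(self, n):                        # cumulative ACK(n)
        self.base = n + 1
        for s in [k for k in self.pkts if k <= n]:
            del self.pkts[s]                          # drop acked copies
        if self.base == self.nextseqnum:
            stop_timer()                              # nothing outstanding
        else:
            start_timer()                             # restart for the new oldest pkt

    def timeout(self):
        start_timer()
        for s in range(self.base, self.nextseqnum):   # go back N: resend all unacked pkts
            udt_send(self.pkts[s])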

46

GBN receiver extended FSM

ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
  – may generate duplicate ACKs
  – need only remember expectedseqnum
• out-of-order packet:
  – discard (don't buffer) -> no receiver buffering!
  – Re-ACK packet with highest in-order seq #

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN in action

48

Selective Repeat

• receiver individually acknowledges all correctly received pkts
  – buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  – sender timer for each unACKed pkt
• sender window
  – N consecutive seq #'s
  – again limits seq #s of sent, unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

sender:
• data from above: if next available seq # in window, send pkt
• timeout(n): resend pkt n, restart timer
• ACK(n) in [sendbase, sendbase+N]:
  – mark pkt n as received
  – if n is smallest unACKed pkt, advance window base to next unACKed seq #

receiver:
• pkt n in [rcvbase, rcvbase+N-1]: send ACK(n)
  – out-of-order: buffer
  – in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
• pkt n in [rcvbase-N, rcvbase-1]: ACK(n)
• otherwise: ignore
(a receiver sketch follows below)
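A minimal selective-repeat receiver sketch of the buffering and in-order delivery just described. make_ack, udt_send and deliver_data are assumed helpers.

# Selective Repeat receiver sketch: ACK individually, buffer out-of-order, deliver in order.
class SRReceiver:
    def __init__(self, N):
        self.N = N
        self.rcvbase = 1            # lowest not-yet-delivered seq #
        self.buffer = {}            # out-of-order packets awaiting delivery

    def packet_received(self, seq, data):
        if self.rcvbase <= seq < self.rcvbase + self.N:
            udt_send(make_ack(seq))             # ACK this packet individually
            self.buffer[seq] = data
            while self.rcvbase in self.buffer:  # deliver any in-order run
                deliver_data(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
        elif self.rcvbase - self.N <= seq < self.rcvbase:
            udt_send(make_ack(seq))             # re-ACK: our earlier ACK may have been lost
        # otherwise: ignore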

51

Selective repeat in action

52

Selective repeat dilemma

Example:
• seq #'s: 0, 1, 2, 3
• window size = 3
• receiver sees no difference in the two scenarios!
• incorrectly passes duplicate data as new in (a)

Q: what relationship between seq # size and window size?
A: window size ≤ (½ of seq # space)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start / adapt: consider a high Bandwidth×Delay product

Now let's move from the generic to the specific…

TCP: arguably the most successful protocol in the Internet…
…it's an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

• full duplex data:
  – bi-directional data flow in same connection
  – MSS: maximum segment size
• connection-oriented:
  – handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
  – sender will not overwhelm receiver
• point-to-point:
  – one sender, one receiver
• reliable, in-order byte stream:
  – no "message boundaries"
• pipelined:
  – TCP congestion and flow control set window size
• send & receive buffers

(figure: application writes data -> socket door -> TCP send buffer -> segments -> TCP receive buffer -> socket door -> application reads data)

55

TCP segment structure

(figure: 32-bit-wide segment layout)
• source port #, dest port #
• sequence number, acknowledgement number (counting by bytes of data, not segments)
• head len, not used, flags: URG (urgent data, generally not used), ACK (ACK # valid), PSH (push data now, generally not used), RST, SYN, FIN (connection estab. setup/teardown commands)
• Receive window (# bytes rcvr willing to accept)
• checksum (Internet checksum, as in UDP), Urg data pointer
• Options (variable length)
• application data (variable length)

56

TCP seq. #'s and ACKs

Seq. #'s:
  – byte stream "number" of first byte in segment's data

ACKs:
  – seq # of next byte expected from other side
  – cumulative ACK

Q: how receiver handles out-of-order segments?
  – A: TCP spec doesn't say, up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server, but don't send the first byte
• Recipient keeps all the subsequent data and waits…
  – Filling buffers
• Critical buffers…

• Send a legitimate request:
    GET index.html
  this gets through an intrusion-detection system;
  then send a new segment replacing bytes 4-13 with "password-file"

A dumb example: neither of these attacks would work on a modern system.

58

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss

Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  – ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
  – average several recent measurements, not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - α) · EstimatedRTT + α · SampleRTT

• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

Karn/Partridge Algorithm – ignore retransmissions in the measurements
(and increase the timeout on retransmission: this makes retransmissions decreasingly aggressive)

61

Example RTT estimation: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr

(figure: SampleRTT and EstimatedRTT, in milliseconds (roughly 100-350 ms), plotted against time in seconds)

62

TCP Round Trip Time and Timeout

Setting the timeout
• EstimatedRTT plus "safety margin"
  – large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:

    DevRTT = (1 - β) · DevRTT + β · |SampleRTT - EstimatedRTT|
    (typically β = 0.25)

Then set timeout interval:

    TimeoutInterval = EstimatedRTT + 4 · DevRTT

(an estimator sketch follows below)

63

TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses a single retransmission timer
• Retransmissions are triggered by:
  – timeout events
  – duplicate acks
• Initially consider simplified TCP sender:
  – ignore duplicate acks
  – ignore flow control, congestion control

64

TCP sender events

data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval

timeout:
• retransmit segment that caused timeout
• restart timer

Ack rcvd:
• If it acknowledges previously unacked segments
  – update what is known to be acked
  – start timer if there are outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event) {

    event: data received from application above
        create TCP segment with sequence number NextSeqNum
        if (timer currently not running)
            start timer
        pass segment to IP
        NextSeqNum = NextSeqNum + length(data)

    event: timer timeout
        retransmit not-yet-acknowledged segment with smallest sequence number
        start timer

    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
  }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so new data is acked

66

TCP retransmission scenarios

(figure: two Host A / Host B timelines)
• lost ACK scenario: A sends Seq=92, 8 bytes data; B's ACK=100 is lost (X); A's timer for Seq=92 expires, so it resends Seq=92, 8 bytes data; B replies ACK=100; SendBase = 100.
• premature timeout: A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; B replies ACK=100 and ACK=120; A's Seq=92 timer expires before ACK=100 arrives, so it resends Seq=92, 8 bytes data; B replies ACK=120; SendBase moves to 100, then 120.

67

TCP retransmission scenarios (more)

(figure: Host A / Host B timeline) A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; ACK=100 is lost (X), but ACK=120 arrives before the timeout; SendBase = 120.

Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACKs 100 too, so nothing is retransmitted.

68

TCP ACK generation [RFC 1122, RFC 2581]

Event at Receiver -> TCP Receiver action:
• Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  -> Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK.
• Arrival of in-order segment with expected seq #; one other segment has ACK pending
  -> Immediately send single cumulative ACK, ACKing both in-order segments.
• Arrival of out-of-order segment, higher-than-expected seq #; gap detected
  -> Immediately send duplicate ACK, indicating seq # of next expected byte.
• Arrival of segment that partially or completely fills gap
  -> Immediately send ACK, provided that the segment starts at the lower end of the gap.
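The same four rows written as a decision function; the parameter names are illustrative and the final fallback case is implementation-specific rather than part of the table.

# Receiver ACK-policy sketch, following the table rows above.
def ack_decision(seq, expected_seq, ack_pending, gap_exists):
    if seq == expected_seq and gap_exists:
        return "send ACK immediately (segment fills the gap from its lower end)"
    if seq == expected_seq and ack_pending:
        return "send one cumulative ACK covering both in-order segments"
    if seq == expected_seq:
        return "delayed ACK: wait up to 500 ms for another in-order segment"
    if seq > expected_seq:
        return "send duplicate ACK for the next expected byte (gap detected)"
    return "old or duplicate data: implementation-specific"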

69

Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If a segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that the segment after the ACKed data was lost:
  – fast retransmit: resend segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3)
            resend segment with sequence number y
    }

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window Syndrome

The receiver advertises how much it can accept (up to an MSS at a time). If a transmitter has something to send – it will. This means small segment sizes may persist – indefinitely.

Solution: wait to fill each segment, but don't wait indefinitely.

NAGLE's Algorithm
• If we wait too long, interactive traffic is difficult.
• If we don't wait, we get silly window syndrome.
• Solution: use a timer; when the timer expires – send the (unfilled) segment.
(a sketch of the send-side decision follows below)
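A minimal sketch of the Nagle-style send decision. The parameter names (queued_bytes, mss, unacked_data, timer_expired) are assumptions for illustration; real stacks add further conditions (e.g. a TCP_NODELAY override), so treat this as the idea only.

# Nagle-style decision: send a small segment only if nothing is in flight;
# otherwise coalesce until a full MSS, an ACK, or the timer releases it.
def should_send_now(queued_bytes, mss, unacked_data, timer_expired):
    if queued_bytes >= mss:
        return True                 # a full segment is always worth sending
    if not unacked_data:
        return True                 # nothing in flight: a small segment is fine
    return timer_expired            # otherwise wait for an ACK or the timer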

73

Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers.
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded.

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Control
• receive side of TCP connection has a receive buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
• app process may be slow at reading from buffer

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
    = RcvWindow
    = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
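The window arithmetic in code, using the slide's variable names (the function itself is illustrative):

# Advertised receive window, per the slide's bookkeeping.
# Sender-side constraint: LastByteSent - LastByteAcked <= rcv_window(...)
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    return rcv_buffer - (last_byte_rcvd - last_byte_read)   # spare room in buffer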

77

TCP Connection Management

Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    Socket clientSocket = new Socket(hostname, port number);
• server: contacted by client
    Socket connectionSocket = welcomeSocket.accept();

Three way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
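For reference, the socket calls that trigger this handshake in practice – Python equivalents of the Java calls on the slide. The host name and port are placeholders, and the two sides would run in separate processes.

import socket

# client side: connect() triggers the SYN / SYNACK / ACK exchange
clientSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
clientSocket.connect(("example.org", 12000))        # placeholder host/port

# server side: accept() returns once a client's handshake completes
welcomeSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
welcomeSocket.bind(("", 12000))
welcomeSocket.listen(1)
connectionSocket, addr = welcomeSocket.accept()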

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem!

82

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission

• large delays when congested
• maximum achievable throughput

(figure: Host A offers λin original data into unlimited shared output link buffers; Host B receives λout)

83

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet

(figure: Host A offers λin original data, plus λ'in original data plus retransmitted data, into finite shared output link buffers; Host B receives λout)

84

Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• "perfect" retransmission only when loss: λ'in > λout
• retransmission of delayed (not lost) packet makes λ'in larger (than perfect case) for same λout

"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt

(figure: three plots (a), (b), (c) of λout against λin, axes marked at R/2; with retransmissions the achievable goodput levels off below R/2, e.g. at values like R/3 or R/4)

85

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λin and λ'in increase?

(figure: Host A offers λin original data, plus λ'in original data plus retransmitted data, into finite shared output link buffers; Host B receives λout)

86

Causes/costs of congestion: scenario 3

Another "cost" of congestion: when a packet is dropped, any upstream transmission capacity used for that packet was wasted!

(figure: λout collapses as offered load grows)
Congestion Collapse example: Cocktail party effect

87

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at

88

TCP congestion control additive increase multiplicative decrease

(figure: congestion window stepping through 8 Kbytes, 16 Kbytes, 24 Kbytes over time)

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT (for each received ACK, W <- W + 1/W) until loss detected
• multiplicative decrease: cut CongWin in half after loss (W <- W/2)

(figure: congestion window size over time; saw tooth behavior, probing for bandwidth)

89

SLOW START IS NOT SHOWN!
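The two AIMD update rules in code form; a sketch only, with cwnd counted in MSS units.

# AIMD window update sketch (cwnd in MSS units).
def on_ack(cwnd):
    return cwnd + 1.0 / cwnd      # additive increase: +1 MSS per RTT, spread across the window's ACKs

def on_loss(cwnd):
    return max(cwnd / 2.0, 1.0)   # multiplicative decrease: halve the window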


90

TCP Congestion Control: details
• sender limits transmission: LastByteSent - LastByteAcked ≤ CongWin
• Roughly:

    rate = CongWin / RTT  bytes/sec

• CongWin is dynamic, a function of perceived network congestion

How does sender perceive congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event

three mechanisms:
  – AIMD
  – slow start
  – conservative after timeout events

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Start
• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to respectable rate

When connection begins, increase rate exponentially fast until first loss event

93

TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast

(figure: Host A sends one segment, then two segments, then four segments, one RTT apart)

94

Slow Start and the TCP Sawtooth

(figure: window vs. time; exponential "slow start" phase up to the first loss, then the sawtooth)

Why is it called slow-start? Because TCP originally had no congestion control mechanism. The source would just start by sending a whole window's worth of data!

95

Refinement: inferring loss
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Philosophy: 3 dup ACKs indicates network capable of delivering some segments; timeout indicates a "more alarming" congestion scenario

96

Refinement
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before the loss event

97

Summary TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase: window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase: window grows linearly.
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.
(a state-update sketch follows below)
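A compact sketch of these four rules, with cwnd and the threshold counted in MSS units; the event-handler names are illustrative, not a real stack's API.

# TCP congestion control sketch: slow start + congestion avoidance + loss reactions.
class CongestionControl:
    def __init__(self):
        self.cwnd = 1.0          # congestion window, in MSS
        self.ssthresh = 64.0     # slow-start threshold, in MSS (arbitrary initial value)

    def on_new_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0                 # slow start: +1 MSS per ACK (doubles per RTT)
        else:
            self.cwnd += 1.0 / self.cwnd     # congestion avoidance: +1 MSS per RTT

    def on_triple_dup_ack(self):
        self.ssthresh = self.cwnd / 2.0
        self.cwnd = self.ssthresh            # multiplicative decrease, stay in congestion avoidance

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2.0
        self.cwnd = 1.0                      # back to slow start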

98

TCP sender congestion control

State: Slow Start (SS); Event: ACK receipt for previously unacked data
  Action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance"
  Commentary: Resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA); Event: ACK receipt for previously unacked data
  Action: CongWin = CongWin + MSS · (MSS/CongWin)
  Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT

State: SS or CA; Event: Loss event detected by triple duplicate ACK
  Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
  Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

State: SS or CA; Event: Timeout
  Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
  Commentary: Enter slow start

State: SS or CA; Event: Duplicate ACK
  Action: Increment duplicate ACK count for segment being acked
  Commentary: CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

(figure: window vs. time; after a timeout / fast retransmission, SSThresh is set to half the CWND value at that point)

Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND. Slow start operates until the window reaches half of the previous CWND, i.e. SSTHRESH.

100

TCP throughput

• What's the average throughput of TCP as a function of window size and RTT?
  – Ignore slow start
• Let W be the window size when loss occurs.
• When the window is W, throughput is W/RTT.
• Just after loss, window drops to W/2, throughput to (W/2)/RTT.
• Average throughput: 0.75 W/RTT
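The 0.75 factor follows because the window ramps linearly between W/2 and W over each loss cycle, so its time-average is the midpoint of the ramp:

    avg throughput = (1/RTT) · (W/2 + W)/2 = (3/4) · W/RTT = 0.75 · W/RTT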

101

TCP Futures TCP over ldquolong fat pipesrdquo

• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate p:

    throughput = (MSS/RTT) · sqrt(3/(2p))

• -> L = 2·10^-10   Ouch!
• New versions of TCP for high-speed

Calculation on Simple Model (cwnd in units of MSS)
• Assume loss occurs whenever cwnd reaches W
  – Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … W, W/2, …
  – W/2 RTTs, then drop, then repeat
• Average throughput: 0.75 W (MSS/RTT)
  – One packet dropped out of (W/2)·(3W/4)
  – Packet drop rate p = (8/3) · W^-2
• Throughput = (MSS/RTT) · sqrt(3/(2p))
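A quick numeric check of the example above (plain arithmetic on the stated formula, nothing more):

# "Long fat pipe" example: 1500-byte MSS, 100 ms RTT, 10 Gbps target.
mss_bits = 1500 * 8
rtt = 0.100                                          # seconds
target = 10e9                                        # bits/sec

W = target * rtt / mss_bits                          # in-flight segments needed: ~83,333
p = (3.0 / 2.0) * (mss_bits / (rtt * target)) ** 2   # from throughput = (MSS/RTT)*sqrt(3/(2p))
print(W, p)                                          # ~83333 segments, p ~ 2e-10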

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

• Single flow, adjusting to bottleneck bandwidth
  – Without any a priori knowledge
  – Could be a Gbps link, could be a modem
• Single flow, adjusting to variations in bandwidth
  – When bandwidth decreases, must lower sending rate
  – When bandwidth increases, must increase sending rate
• Multiple flows, sharing the bandwidth
  – Must avoid overloading network
  – And share bandwidth "fairly" among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2: Single Flow, Varying BW

Want to track available bandwidth
• Oscillate around its current value
• If you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease: cwnd -> cwnd × / ÷ a
• Additive increase or decrease: cwnd -> cwnd +/- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall: Buffer and Window Dynamics

• No congestion -> x increases by one packet/RTT every RTT
• Congestion -> decrease x by factor 2

(figure: A -> router (C = 50 pkts/RTT) -> B; rate x in pkts/RTT (0-60) and backlog in the router (congested if > 20 pkts) plotted over time, showing the sawtooth)

110

AIMD Sharing Dynamics

(figure: hosts A and B send flows x1 and x2 through a shared link to D and E; both rates (0-60 pkts/RTT) plotted over time)

• No congestion -> rate increases by one packet/RTT every RTT
• Congestion -> decrease rate by factor 2

Rates equalize -> fair share

111

AIAD Sharing Dynamics

(figure: same two-flow topology; rates x1 and x2 plotted over time)

• No congestion -> x increases by one packet/RTT every RTT
• Congestion -> decrease x by 1

(the gap between the two rates never closes, so the flows do not converge to a fair share)

112

Simple Model of Congestion Control

• Two TCP connections
  – Rates x1 and x2
• Congestion when sum > 1
• Efficiency: sum near 1
• Fairness: x's converge

(figure: 2-user example, bandwidth for User 1 (x1) against bandwidth for User 2 (x2); the efficiency line separates underload from overload)

113

Example

(figure: x1-x2 plane with the efficiency line x1 + x2 = 1 and the fairness line x1 = x2)

• Total bandwidth 1
• (0.2, 0.5): inefficient, x1 + x2 = 0.7
• (0.7, 0.5): congested, x1 + x2 = 1.2
• (0.7, 0.3): efficient, x1 + x2 = 1, but not fair
• (0.5, 0.5): efficient, x1 + x2 = 1, and fair

114

AIAD

(figure: x1-x2 plane with fairness and efficiency lines; the trajectory steps from (x1h, x2h) to (x1h - aD, x2h - aD) to (x1h - aD + aI, x2h - aD + aI), always parallel to the fairness line)

• Increase: x + aI
• Decrease: x - aD
• Does not converge to fairness

115

MIMD

(figure: x1-x2 plane with fairness and efficiency lines; the trajectory steps from (x1h, x2h) to (bD·x1h, bD·x2h) to (bI·bD·x1h, bI·bD·x2h), always along the same ray through the origin)

• Increase: x · bI
• Decrease: x · bD
• Does not converge to fairness

116

AIMD

(figure: x1-x2 plane with fairness and efficiency lines; the trajectory steps from (x1h, x2h) to (bD·x1h, bD·x2h) to (bD·x1h + aI, bD·x2h + aI), spiralling in toward the fair, efficient point)

• Increase: x + aI
• Decrease: x · bD
• Converges to fairness

117

Why is AIMD fair?
(a pretty animation…)

Two competing sessions:
• Additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally

(figure: bandwidth for Connection 1 against bandwidth for Connection 2, each axis up to capacity R, with the equal-bandwidth-share line; repeated congestion-avoidance (additive increase) and loss (window decreased by factor of 2) steps converge toward the equal share)

118

Fairness (more)

Fairness and UDP
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections
• nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2!
• Recall: multiple browser sessions (and the potential for synchronized loss)

119

Synchronized Flows vs. Many TCP Flows

Synchronized flows:
• Aggregate window has same dynamics
• Therefore buffer occupancy has same dynamics
• Rule-of-thumb still holds

Many independent, desynchronized flows:
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases

Some TCP issues outstanding…

(figure: probability distribution of buffer size over time)

Page 16: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

16

Internet Checksum(time travel warning ndash we covered this earlier)

bull Notendash When adding numbers a carryout from the

most significant bit needs to be added to the result

bull Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

17

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

18

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

19

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

rdt_rcv()

udt_rcv()

20

Reliable data transfer getting started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over

unreliable channel to receiver

rdt_rcv() called by rdt to deliver data to upper

rdt_rcv()

udt_rcv()

udt_rcv() called when packet arrives on rcv-side of channel

21

Reliable data transfer getting started

Wersquollbull incrementally develop sender receiver sides of

reliable data transfer protocol (rdt)bull consider only unidirectional data transfer

ndash but control info will flow on both directionsbull use finite state machines (FSM) to specify sender

receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely

determined by next event

eventactions

22

KR state machines ndash a note

BewareKurose and Ross has a confusingconfused attitude to

state-machinesIrsquove attempted to normalise the representationUPSHOT these slides have differing information to the

KR book (from which the RDT example is taken)in KR ldquoactions takenrdquo appear wide-ranging my

interpretation is more specificrelevant

Statename

Statename

Relevant event causing state transitionRelevant action taken on state transitionstate when in this ldquostaterdquo

next state uniquely determined by next

event eventactions

23

Rdt10 reliable transfer over a reliable channel

bull underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

IDLE udt_send(packet)rdt_send(data)

rdt_rcv(data)IDLEudt_rcv(packet)

sender receiver

Event

Action

24

Rdt20 channel with bit errors

bull underlying channel may flip bits in packetndash checksum to detect bit errors

bull the question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells sender that

packet received is OKndash negative acknowledgements (NAKs) receiver explicitly tells sender

that packet had errorsndash sender retransmits packet on receipt of NAK

bull new mechanisms in rdt20 (beyond rdt10)ndash error detectionndash receiver feedback control msgs (ACKNAK) receiver-gtsender

25

rdt20 FSM specification

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

Waitingfor reply

IDLE

sender

receiverrdt_send(data)

L

Note the sender holds a copy of the packet being sent until the delivery is acknowledged

26

rdt20 operation with no errors

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

27

rdt20 error scenario

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

28

rdt20 has a fatal flawWhat happens if ACKNAK corruptedbull sender doesnrsquot know what happened at receiverbull canrsquot just retransmit possible duplicate

Handling duplicates bull sender retransmits current

packet if ACKNAK garbledbull sender adds sequence number

to each packetbull receiver discards (doesnrsquot

deliver) duplicate packet

Sender sends one packet then waits for receiver response

stop and wait

29

rdt21 sender handles garbled ACKNAKs

IDLE

sequence=0udt_send(packet)

rdt_send(data)

WaitingFor reply udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

IDLEWaitingfor reply

LL

udt_rcv(packet) ampamp corrupt(packet)

30

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

udt_send(NAK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

udt_send(ACK)rdt_rcv(data)

Wait for 1 from below

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_send(ACK)rdt_rcv(data)

udt_send(ACK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

receive(packet) ampamp corrupt(packet)

udt_send(ACK)

udt_send(NAK)

31

rdt21 discussionSenderbull seq added to pktbull two seq rsquos (01) will suffice Whybull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has a0 or 1 sequence number

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq bull note receiver can not know if its last ACKNAK received OK at

sender

32

rdt22 a NAK-free protocol

bull same functionality as rdt21 using ACKs onlybull instead of NAK receiver sends ACK for last pkt received OK

ndash receiver must explicitly include seq of pkt being ACKed bull duplicate ACK at sender results in same action as NAK

retransmit current pkt

33

rdt22 sender receiver fragments

Wait for call 0 from above

sequence=0udt_send(packet)

rdt_send(data)

udt_send(packet)

rdt_rcv(reply) ampamp ( corrupt(reply) || isACK1(reply) )

udt_rcv(reply) ampamp not corrupt(reply) ampamp isACK0(reply)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet) send(ACK1)rdt_rcv(data)

udt_rcv(packet) ampamp (corrupt(packet) || has_seq1(packet))

udt_send(ACK1)receiver FSM

fragment

L

34

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of help but not

enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull retransmits if no ACK received in this time

bull if pkt (or ACK) just delayed (not lost)ndash retransmission will be

duplicate but use of seq rsquos already handles this

ndash receiver must specify seq of pkt being ACKed

bull requires countdown timer

udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )

35

rdt30 sender

sequence=0udt_send(packet)

rdt_send(data)

Wait for

ACK0

IDLEstate 1

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)

udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)

LL

udt_send(packet)timeout

udt_send(packet)timeout

udt_rcv(reply)

IDLEstate 0

Wait for

ACK1

Ludt_rcv(reply)

LL

L

36

rdt30 in action

37

rdt30 in action

38

Performance of rdt30

bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet

U sender utilization ndash fraction of time sender busy sending

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources

dsmicrosecon8bps10bits8000

9 RLdtrans

39

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

40

Pipelined (Packet-Window) protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver

bull Two generic forms of pipelined protocols go-Back-N selective repeat

41

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

42

Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks

ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet

ndash If timer expires retransmit all unacked packets

Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet

ndash When timer expires retransmit only unack packet

43

Selective repeat big picture

bull Sender can have up to N unacked packets in pipeline

bull Rcvr acks individual packetsbull Sender maintains timer for each unacked

packetndash When timer expires retransmit only unack packet

44

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L

46

GBN receiver extended FSM

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN inaction

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont.)

[Figure: TCP client lifecycle and TCP server lifecycle state diagrams.]

81

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control!
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem!

82

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission

• large delays when congested
• maximum achievable throughput

[Figure: Host A offers λin (original data) and, with Host B, shares a router with unlimited output-link buffers; λout is the delivered throughput.]

83

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packets

[Figure: Host A offers λin (original data) and λ'in (original data plus retransmitted data) into a router with finite shared output-link buffers; Host B receives λout.]

84

Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• "perfect" retransmission, only when loss: λ'in > λout
• retransmission of a delayed (not lost) packet makes λ'in larger (than the perfect case) for the same λout

"costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: the link carries multiple copies of a packet

[Figure: three plots (a), (b), (c) of λout against λ'in, with reference levels R/2, R/3 and R/4.]

85

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λin and λ'in increase?

[Figure: Host A offers λin (original data) and λ'in (original data plus retransmitted data) across multihop paths with finite shared output-link buffers; Host B receives λout.]

86

Causes/costs of congestion: scenario 3

Another "cost" of congestion: when a packet is dropped, any upstream transmission capacity used for that packet was wasted!

[Figure: λout collapses as the offered load grows.]

Congestion collapse example: the cocktail party effect.

87

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
• no explicit feedback from the network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate the sender should send at

88

TCP congestion control: additive increase, multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT until loss detected (W ← W + 1/W for each received ACK)
• multiplicative decrease: cut CongWin in half after loss (W ← W/2)

[Figure: congestion window (8, 16, 24 Kbytes, …) over time – the saw-tooth behaviour of probing for bandwidth.]

89

(Note: slow start is not shown in the saw-tooth figure above.)
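Seen per ACK, the two AIMD rules are one line each. A minimal sketch, with the window kept in bytes and the MSS value chosen arbitrarily for illustration:

# Illustrative AIMD window updates (window in bytes).
MSS = 1460

def on_ack(cwnd):
    # additive increase: ~1 MSS per RTT, spread over the ACKs of one window
    return cwnd + MSS * MSS / cwnd

def on_loss(cwnd):
    # multiplicative decrease: halve the window, but never below 1 MSS
    return max(MSS, cwnd / 2)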

90

TCP Congestion Control: details

• sender limits transmission: LastByteSent – LastByteAcked ≤ CongWin
• roughly: rate = CongWin / RTT  bytes/sec
• CongWin is a dynamic function of perceived network congestion

How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event

three mechanisms:
  – AIMD
  – slow start
  – conservative after timeout events

91

AIMD Starts Too Slowly!

[Figure: window vs time t – linear additive increase from a small window.]

It could take a long time to get started!
Need to start with a small CWND to avoid overloading the network.

92

TCP Slow Start

• When a connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to a respectable rate

When a connection begins, increase the rate exponentially fast until the first loss event.
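To make the example concrete, a tiny loop (using the slide's MSS = 500 bytes and RTT = 200 ms, and ignoring loss entirely) shows how fast the doubling ramps the rate:

# Slow-start growth under the slide's example numbers (illustrative only).
MSS_BITS = 500 * 8          # 500-byte segments
RTT = 0.2                   # 200 ms

cwnd = 1                    # in segments
for rtt in range(6):
    rate_kbps = cwnd * MSS_BITS / RTT / 1000
    print(f"RTT {rtt}: cwnd = {cwnd:2d} MSS, rate = {rate_kbps:.0f} kbps")
    cwnd *= 2               # doubled every RTT (one extra MSS per ACK received)
# RTT 0 is the slide's initial 20 kbps; five RTTs later the rate is already 640 kbps.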

93

TCP Slow Start (more)

• When a connection begins, increase the rate exponentially until the first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: the initial rate is slow, but it ramps up exponentially fast

[Figure: Host A sends one segment, then two, then four; one RTT elapses between rounds as Host B's ACKs return.]

94

Slow Start and the TCP Sawtooth

[Figure: window vs time t – exponential "slow start" up to the first loss, then the saw-tooth.]

Why is it called slow-start? Because TCP originally had no congestion control mechanism. The source would just start by sending a whole window's worth of data!

95

Refinement: inferring loss

• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after a timeout event:
  – CongWin is instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Philosophy: 3 dup ACKs indicates the network is capable of delivering some segments; a timeout indicates a "more alarming" congestion scenario.

96

Refinement

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before the timeout.

Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event

97

Summary: TCP Congestion Control

• When CongWin is below Threshold, the sender is in the slow-start phase: the window grows exponentially.

• When CongWin is above Threshold, the sender is in the congestion-avoidance phase: the window grows linearly.

• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.

• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
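As a rough illustration, the four rules above fit in a few lines. The sketch keeps only the window arithmetic (in units of MSS) and is not a complete sender: no packets, timers, pacing or fast-recovery details.

# Window arithmetic for the four summary rules (cwnd and threshold in MSS units).
class CongestionControl:
    def __init__(self):
        self.cwnd = 1.0
        self.threshold = 64.0                 # illustrative initial threshold

    def on_new_ack(self):
        if self.cwnd < self.threshold:
            self.cwnd += 1.0                  # slow start: +1 MSS per ACK (doubles per RTT)
        else:
            self.cwnd += 1.0 / self.cwnd      # congestion avoidance: ~+1 MSS per RTT

    def on_triple_dup_ack(self):
        self.threshold = self.cwnd / 2
        self.cwnd = self.threshold            # multiplicative decrease, stay in avoidance

    def on_timeout(self):
        self.threshold = self.cwnd / 2
        self.cwnd = 1.0                       # back to slow start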

98

TCP sender congestion control

State: Slow Start (SS)
  Event: ACK receipt for previously unACKed data
  Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
  Commentary: results in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
  Event: ACK receipt for previously unACKed data
  Action: CongWin = CongWin + MSS × (MSS/CongWin)
  Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
  Event: loss event detected by triple duplicate ACK
  Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
  Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

State: SS or CA
  Event: timeout
  Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
  Commentary: enter slow start

State: SS or CA
  Event: duplicate ACK
  Action: increment duplicate ACK count for the segment being ACKed
  Commentary: CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

[Figure: window vs time t; at the timeout/fast retransmission SSThresh is set to half the window value at that point.]

Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND: slow start operates until the window reaches half the previous CWND, i.e. SSTHRESH.

100

TCP throughput

• What's the average throughput of TCP as a function of window size and RTT?
  – Ignore slow start
• Let W be the window size when loss occurs.
• When the window is W, throughput is W/RTT
• Just after loss, the window drops to W/2, throughput to (W/2)/RTT.
• Average throughput: 0.75 W/RTT
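The 0.75 factor is just the average of the saw-tooth: ignoring slow start, the window ramps linearly between W/2 and W, so

    average window = (W/2 + W) / 2 = 3W/4,   hence   average throughput = (3/4)·W/RTT = 0.75 W/RTT.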

101

TCP Futures: TCP over "long, fat pipes"

• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate p → needs L = 2·10^-10. Ouch!
• New versions of TCP for high-speed

Calculation on Simple Model (cwnd in units of MSS)
• Assume loss occurs whenever cwnd reaches W
  – Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … W, W/2, …
  – W/2 RTTs, then drop, then repeat
• Average throughput: 0.75·W·(MSS/RTT)
  – One packet dropped out of (W/2)·(3W/4)
  – Packet drop rate p = (8/3)·W^-2
• Throughput = (MSS/RTT)·sqrt(3/(2p))
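A quick numeric check of these figures, using the formula just derived and the slide's example values (purely illustrative arithmetic):

# Check the "long fat pipe" numbers: 1500-byte segments, 100 ms RTT, 10 Gbps target.
from math import sqrt

MSS = 1500 * 8          # bits
RTT = 0.1               # seconds
target = 10e9           # bits per second

W = target * RTT / MSS                  # in-flight segments needed
p = (3 / 2) / (target * RTT / MSS)**2   # from target = (MSS/RTT) * sqrt(3/(2p))

print(f"window    W ~ {W:,.0f} segments")   # ~83,333
print(f"loss rate p ~ {p:.1e}")             # ~2e-10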

102

HINT: KNOW THIS SLIDE

Three Congestion Control Challenges – or Why AIMD?

• Single flow adjusting to bottleneck bandwidth
  – Without any a priori knowledge
  – Could be a Gbps link, could be a modem
• Single flow adjusting to variations in bandwidth
  – When bandwidth decreases, must lower sending rate
  – When bandwidth increases, must increase sending rate
• Multiple flows sharing the bandwidth
  – Must avoid overloading the network
  – And share bandwidth "fairly" among the flows

103

104

Problem 1: Single Flow, Fixed BW

• Want to get a first-order estimate of the available bandwidth
  – Assume bandwidth is fixed
  – Ignore presence of other flows
• Want to start slow, but rapidly increase the rate until a packet drop occurs ("slow-start")
• Adjustment:
  – cwnd initially set to 1 (MSS)
  – cwnd++ upon receipt of ACK

105

Problems with Slow-Start

• Slow-start can result in many losses
  – Roughly the size of cwnd ~ BW·RTT
• Example:
  – At some point cwnd is enough to fill the "pipe"
  – After another RTT, cwnd is double its previous value
  – All the excess packets are dropped!
• Need a more gentle adjustment algorithm once we have a rough estimate of the bandwidth
  – Rest of the design discussion focuses on this

Problem 2: Single Flow, Varying BW

Want to track the available bandwidth:
• Oscillate around its current value
• If you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease: cwnd ← cwnd ×/÷ a
• Additive increase or decrease: cwnd ← cwnd +/− b

106

Four alternatives

• AIAD: gentle increase, gentle decrease

• AIMD: gentle increase, drastic decrease

• MIAD: drastic increase, gentle decrease
  – too many losses: eliminate

• MIMD: drastic increase and decrease

107

108

Problem 3: Multiple Flows

• Want the steady state to be "fair"

• Many notions of fairness, but here we just require two identical flows to end up with the same bandwidth

• This eliminates MIMD and AIAD
  – As we shall see…

• AIMD is the only remaining solution!
  – Not really, but close enough…

109

Recall: Buffer and Window Dynamics

• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by factor 2

[Figure: flow from A to B over a link of capacity C = 50 pkts/RTT; plot of the rate x (pkts/RTT) and the backlog in the router (pkts, congested if > 20) over time.]

110

AIMD Sharing Dynamics

[Figure: two flows, A→B at rate x1 and D→E at rate x2, sharing one link; plot of both rates over time.]

• No congestion → rate increases by one packet/RTT every RTT
• Congestion → decrease rate by factor 2

Rates equalize → fair share

111

AIAD Sharing Dynamics

[Figure: two flows, A→B at rate x1 and D→E at rate x2, sharing one link; plot of both rates over time – the gap between them never closes.]

• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by 1

112

Simple Model of Congestion Control

• Two TCP connections
  – Rates x1 and x2
• Congestion when the sum > 1
• Efficiency: sum near 1
• Fairness: x's converge

[Figure: 2-user example – bandwidth for user 1 (x1) against bandwidth for user 2 (x2); the efficiency line separates underload from overload.]

113

Example

[Phase plot: user 1's rate x1 against user 2's rate x2, with the fairness line and the efficiency line x1 + x2 = 1.]

• Total bandwidth 1
• Inefficient: x1 + x2 = 0.7 at (0.2, 0.5)
• Congested: x1 + x2 = 1.2 at (0.7, 0.5)
• Efficient, x1 + x2 = 1, but not fair at (0.7, 0.3)
• Efficient, x1 + x2 = 1, and fair at (0.5, 0.5)

114

AIAD

[Phase plot: bandwidth for user 1 (x1) against bandwidth for user 2 (x2), with fairness and efficiency lines; from (x1h, x2h) a decrease moves to (x1h − aD, x2h − aD), then an increase to (x1h − aD + aI, x2h − aD + aI).]

• Increase: x ← x + aI
• Decrease: x ← x − aD
• Does not converge to fairness

115

MIMD

[Phase plot: bandwidth for user 1 (x1) against bandwidth for user 2 (x2), with fairness and efficiency lines; from (x1h, x2h) a decrease moves to (bD·x1h, bD·x2h), then an increase to (bI·bD·x1h, bI·bD·x2h), staying on a line through the origin.]

• Increase: x ← bI·x
• Decrease: x ← bD·x
• Does not converge to fairness

116

AIMD

[Phase plot: bandwidth for user 1 (x1) against bandwidth for user 2 (x2), with fairness and efficiency lines; from (x1h, x2h) a decrease moves to (bD·x1h, bD·x2h), then an increase to (bD·x1h + aI, bD·x2h + aI), stepping closer to the fairness line each cycle.]

• Increase: x ← x + aI
• Decrease: x ← bD·x
• Converges to fairness

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally

[Phase plot: bandwidth for connection 1 against bandwidth for connection 2, each bounded by R, with the equal-bandwidth-share line; repeated cycles of congestion avoidance (additive increase) and loss (window decreased by a factor of 2) walk the operating point toward the equal-share line.]
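The convergence claim is easy to check numerically. Below is a toy simulation (all values illustrative: a link of capacity 1, synchronized loss whenever the combined rate exceeds capacity) contrasting AIMD with AIAD; it is a sketch of the dynamics, not of real TCP.

# Toy two-flow simulation: AIMD converges to a fair share, AIAD does not.
def simulate(decrease, x1=0.05, x2=0.60, capacity=1.0, a=0.01, rounds=2000):
    for _ in range(rounds):
        if x1 + x2 > capacity:                  # congestion: both flows see loss
            x1, x2 = decrease(x1), decrease(x2)
        else:                                   # no congestion: additive increase
            x1, x2 = x1 + a, x2 + a
    return x1, x2

aimd = simulate(lambda x: x / 2)                # multiplicative decrease
aiad = simulate(lambda x: max(x - 0.05, 0.0))   # additive decrease

print("AIMD:", [round(v, 2) for v in aimd])     # rates end up roughly equal
print("AIAD:", [round(v, 2) for v in aiad])     # the initial imbalance persists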

118

Fairness (more)

Fairness and UDP
• Multimedia apps may not use TCP
  – do not want the rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at a constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections
• nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2!
• Recall: multiple browser sessions (and the potential for synchronized loss)
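For reference, the arithmetic behind the R/10 versus R/2 example, assuming the link's rate is split evenly per connection:

    9 existing + 1 new connection   → the new app gets  1/10 of R = R/10
    9 existing + 11 new connections → the new app gets 11/20 of R ≈ R/2 (slightly more than half)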

119

Synchronized Flows vs Many TCP Flows

Synchronized flows:
• the aggregate window has the same dynamics
• therefore buffer occupancy has the same dynamics
• the rule-of-thumb still holds

Independent, desynchronized flows:
• the central limit theorem says the aggregate becomes Gaussian
• variance (buffer size) decreases as N increases

[Figure: buffer occupancy over time and the probability distribution of the required buffer size.]

Some TCP issues outstanding…

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning – we covered this earlier)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview: RFCs 793, 1122, 1323, 2018, 2581, …
  • TCP segment structure
  • TCP seq. #'s and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control: additive increase, multiplicative decrease
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures: TCP over "long, fat pipes"
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges – or Why AIMD?
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair? (a pretty animation…)
  • Fairness (more)
  • Some TCP issues outstanding…
Page 18: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

18

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

19

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

rdt_rcv()

udt_rcv()

20

Reliable data transfer getting started

sendside

receiveside

rdt_send(): called from above (e.g. by app.); passes data to be delivered to the receiver's upper layer

udt_send(): called by rdt to transfer a packet over the

unreliable channel to the receiver

rdt_rcv(): called by rdt to deliver data to the upper layer

udt_rcv(): called when a packet arrives on the rcv-side of the channel
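As a concrete illustration of this four-call interface, here is a minimal Python sketch (my own, not code from the course); make_packet, corrupt and the lossy channel behaviour are invented stand-ins.

import random

def make_packet(data):                 # rdt adds a toy checksum header
    return {"data": data, "checksum": sum(data.encode()) & 0xFF}

def corrupt(packet):
    return packet["checksum"] != (sum(packet["data"].encode()) & 0xFF)

def rdt_send(data):                    # called from above (e.g. by the app)
    udt_send(make_packet(data))

def udt_send(packet):                  # unreliable channel: may garble the packet
    if random.random() < 0.1:
        packet = dict(packet, checksum=packet["checksum"] ^ 0xFF)
    udt_rcv(packet)

def udt_rcv(packet):                   # called when a packet arrives below
    if not corrupt(packet):
        rdt_rcv(packet["data"])

def rdt_rcv(data):                     # deliver data to the upper layer
    print("delivered:", data)

rdt_send("hello")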

21

Reliable data transfer getting started

We'll:
• incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
• consider only unidirectional data transfer
  – but control info will flow in both directions!
• use finite state machines (FSM) to specify sender, receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely

determined by next event

eventactions

22

KR state machines ndash a note

Beware: Kurose and Ross have a confusing/confused attitude to state-machines.
I've attempted to normalise the representation.
UPSHOT: these slides have differing information to the K&R book (from which the RDT example is taken):
in K&R, "actions taken" appear wide-ranging; my interpretation is more specific/relevant.

Statename

Statename

Relevant event causing state transitionRelevant action taken on state transitionstate when in this ldquostaterdquo

next state uniquely determined by next

event eventactions

23

Rdt10 reliable transfer over a reliable channel

bull underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

IDLE udt_send(packet)rdt_send(data)

rdt_rcv(data)IDLEudt_rcv(packet)

sender receiver

Event

Action

24

Rdt20 channel with bit errors

• underlying channel may flip bits in packet
  – checksum to detect bit errors

• the question: how to recover from errors?
  – acknowledgements (ACKs): receiver explicitly tells sender that packet received is OK
  – negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors
  – sender retransmits packet on receipt of NAK

• new mechanisms in rdt2.0 (beyond rdt1.0):
  – error detection
  – receiver feedback: control msgs (ACK, NAK) receiver -> sender

25

rdt20 FSM specification

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

Waitingfor reply

IDLE

sender

receiverrdt_send(data)

L

Note the sender holds a copy of the packet being sent until the delivery is acknowledged

26

rdt20 operation with no errors

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

27

rdt20 error scenario

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

28

rdt2.0 has a fatal flaw!

What happens if ACK/NAK corrupted?
• sender doesn't know what happened at receiver!
• can't just retransmit: possible duplicate

Handling duplicates:
• sender retransmits current packet if ACK/NAK garbled
• sender adds sequence number to each packet
• receiver discards (doesn't deliver) duplicate packet

Stop and wait: sender sends one packet, then waits for receiver response

29

rdt21 sender handles garbled ACKNAKs

IDLE

sequence=0udt_send(packet)

rdt_send(data)

WaitingFor reply udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

IDLEWaitingfor reply

LL

udt_rcv(packet) ampamp corrupt(packet)

30

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

udt_send(NAK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

udt_send(ACK)rdt_rcv(data)

Wait for 1 from below

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_send(ACK)rdt_rcv(data)

udt_send(ACK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

receive(packet) ampamp corrupt(packet)

udt_send(ACK)

udt_send(NAK)

31

rdt2.1: discussion

Sender:
• seq # added to pkt
• two seq #'s (0, 1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  – state must "remember" whether "current" pkt has a 0 or 1 sequence number

Receiver:
• must check if received packet is duplicate
  – state indicates whether 0 or 1 is expected pkt seq #
• note: receiver can not know if its last ACK/NAK was received OK at sender

32

rdt22 a NAK-free protocol

• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
  – receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt

33

rdt22 sender receiver fragments

Wait for call 0 from above

sequence=0udt_send(packet)

rdt_send(data)

udt_send(packet)

rdt_rcv(reply) ampamp ( corrupt(reply) || isACK1(reply) )

udt_rcv(reply) ampamp not corrupt(reply) ampamp isACK0(reply)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet) send(ACK1)rdt_rcv(data)

udt_rcv(packet) ampamp (corrupt(packet) || has_seq1(packet))

udt_send(ACK1)receiver FSM

fragment

L

34

rdt30 channels with errors and loss

New assumption: underlying channel can also lose packets (data or ACKs)
  – checksum, seq #, ACKs, retransmissions will be of help, but not enough

Approach: sender waits a "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  – retransmission will be duplicate, but use of seq #'s already handles this
  – receiver must specify seq # of pkt being ACKed
• requires countdown timer
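A minimal sketch of such a sender (my own, not the slides' FSM): alternate a 1-bit sequence number and retransmit until a matching, uncorrupted ACK arrives; channel_send() is an assumed helper that models loss by returning None.

import random

def channel_send(packet, loss=0.3):
    # assumed unreliable channel: the ACK is lost with probability `loss`
    return None if random.random() < loss else {"ack": packet["seq"]}

def send_reliably(messages):
    seq = 0
    for data in messages:
        packet = {"seq": seq, "data": data}
        while True:                          # retransmit until ACKed
            reply = channel_send(packet)     # send, then wait for ACK / timeout
            if reply is not None and reply.get("ack") == seq:
                break                        # correct ACK: next piece of data
            # timeout (None) or wrong/garbled ACK: resend the same packet
        seq = 1 - seq                        # flip the 1-bit sequence number

send_reliably(["a", "b", "c"])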

udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )

35

rdt30 sender

sequence=0udt_send(packet)

rdt_send(data)

Wait for

ACK0

IDLEstate 1

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)

udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)

LL

udt_send(packet)timeout

udt_send(packet)timeout

udt_rcv(reply)

IDLEstate 0

Wait for

ACK1

Ludt_rcv(reply)

LL

L

36

rdt30 in action

37

rdt30 in action

38

Performance of rdt30

• rdt3.0 works, but performance stinks
• example: 1 Gbps link, 15 ms prop. delay, 8000-bit packet

d_trans = L / R = 8000 bits / 10^9 bits/sec = 8 microseconds

U_sender: utilization – fraction of time sender busy sending

• 1 KB pkt every ~30 msec -> ~33 kB/sec throughput over a 1 Gbps link
• the network protocol limits use of the physical resources!
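Filling in the arithmetic behind these numbers (standard stop-and-wait utilisation):

\[
d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bits/s}} = 8\ \mu s,
\qquad
U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} \approx 0.00027
\]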

39

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

40

Pipelined (Packet-Window) protocols

Pipelining: sender allows multiple "in-flight", yet-to-be-acknowledged pkts
  – range of sequence numbers must be increased
  – buffering at sender and/or receiver

• Two generic forms of pipelined protocols: Go-Back-N, Selective Repeat

41

Pipelining increased utilization

first packet bit transmitted, t = 0

sender                receiver

RTT

last bit transmitted, t = L / R

first packet bit arrives
last packet bit arrives, send ACK

ACK arrives, send next packet, t = RTT + L / R

last bit of 2nd packet arrives, send ACK
last bit of 3rd packet arrives, send ACK

Increase utilization by a factor of 3!
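Worked out with the same numbers as the stop-and-wait example (assuming the three packets fit within one RTT):

\[
U_{sender} = \frac{3\,(L/R)}{RTT + L/R} = \frac{0.024}{30.008} \approx 0.0008
\]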

42

Pipelining Protocols

Go-Back-N: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
  – Doesn't ack packet if there's a gap
• Sender has timer for oldest unacked packet
  – If timer expires, retransmit all unacked packets

Selective Repeat: big pic
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  – When timer expires, retransmit only unacked packet

43

Selective repeat big picture

bull Sender can have up to N unacked packets in pipeline

bull Rcvr acks individual packetsbull Sender maintains timer for each unacked

packetndash When timer expires retransmit only unack packet

44

Go-Back-N

Sender:
• k-bit seq # in pkt header
• "window" of up to N consecutive unack'ed pkts allowed
• ACK(n): ACKs all pkts up to, including, seq # n – "cumulative ACK"
  – may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
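A compact Go-Back-N sender loop, as a rough illustration of these rules (not the extended FSM on the next slide); fake_network() is an invented helper that drops packets and returns a cumulative ACK.

import random

def fake_network(packets, loss=0.2):
    # assumed helper: deliver in order, drop each pkt with prob `loss`,
    # return highest in-order seq # received (cumulative ACK), or None
    acked = None
    for seq, _ in packets:
        if random.random() < loss:
            break
        acked = seq
    return acked

def gbn_send(data, N=4):
    base, nextseq = 0, 0
    while base < len(data):
        while nextseq < base + N and nextseq < len(data):
            nextseq += 1                           # "transmit" data[base:nextseq]
        ack = fake_network([(s, data[s]) for s in range(base, nextseq)])
        if ack is None:
            nextseq = base                         # timeout: go back N, resend all
        else:
            base = ack + 1                         # cumulative ACK slides the window

gbn_send(list("abcdefgh"))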

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L

46

GBN receiver extended FSM

ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
  – may generate duplicate ACKs
  – need only remember expectedseqnum

• out-of-order packet:
  – discard (don't buffer) -> no receiver buffering!
  – re-ACK packet with highest in-order seq #

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN in action

48

Selective Repeat

• receiver individually acknowledges all correctly received pkts
  – buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  – sender timer for each unACKed pkt
• sender window
  – N consecutive seq #'s
  – again limits seq #s of sent, unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

sender:
• data from above: if next available seq # in window, send pkt
• timeout(n): resend pkt n, restart timer
• ACK(n) in [sendbase, sendbase+N]:
  – mark pkt n as received
  – if n smallest unACKed pkt, advance window base to next unACKed seq #

receiver:
• pkt n in [rcvbase, rcvbase+N-1]: send ACK(n)
  – out-of-order: buffer
  – in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
• pkt n in [rcvbase-N, rcvbase-1]: ACK(n)
• otherwise: ignore

51

Selective repeat in action

52

Selective repeat dilemma

Example:
• seq #'s: 0, 1, 2, 3
• window size = 3

• receiver sees no difference in the two scenarios!

• incorrectly passes duplicate data as new in (a)

Q: what relationship between seq # size and window size?

A: window size ≤ ½ of seq # space
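In general, for k-bit sequence numbers (a sequence space of 2^k values) the selective-repeat window must satisfy

\[
W \le \frac{2^{k}}{2} = 2^{k-1},
\qquad\text{e.g. } k = 2:\ \{0,1,2,3\},\ W \le 2,
\]

so the window of 3 used in the example above is exactly what breaks.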

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

• full duplex data:
  – bi-directional data flow in same connection
  – MSS: maximum segment size
• connection-oriented:
  – handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
  – sender will not overwhelm receiver

• point-to-point:
  – one sender, one receiver
• reliable, in-order byte stream:
  – no "message boundaries"
• pipelined:
  – TCP congestion and flow control set window size
• send & receive buffers

[diagram: application writes data -> socket door -> TCP send buffer -> segments -> TCP receive buffer -> socket door -> application reads data]

55

TCP segment structure

[segment diagram, 32 bits wide]
• source port #, dest port #
• sequence number – counting by bytes of data (not segments!)
• acknowledgement number
• head len, not used, flag bits U|A|P|R|S|F, receive window – # bytes rcvr willing to accept
• checksum (Internet checksum, as in UDP), urgent data pointer
• options (variable length)
• application data (variable length)

Flag bits:
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab. (setup, teardown commands)

56

TCP seq. #'s and ACKs

Seq. #'s:
  – byte stream "number" of first byte in segment's data

ACKs:
  – seq # of next byte expected from other side
  – cumulative ACK

Q: how does receiver handle out-of-order segments?
  – A: TCP spec doesn't say – up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out-of-order attack
• ARQ with SACK means recipient needs copies of all packets

• Evil attack one: send a long stream of TCP data to a server, but don't send the first byte
• Recipient keeps all the subsequent data and waits…
  – filling buffers
• Critical buffers…

• Or: send a legitimate request

GET index.html

which gets through an intrusion-detection system;

then send a new segment replacing bytes 4-13 with "password-file"

A dumb example: neither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss

Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  – ignore retransmissions
• SampleRTT will vary; want estimated RTT "smoother"
  – average several recent measurements, not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1 – α) · EstimatedRTT + α · SampleRTT

• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

Karn/Partridge Algorithm – ignore retransmissions in measurements

(and increase the timeout; this makes retransmissions decreasingly aggressive)

61

Example RTT estimation: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr

[graph: RTT (milliseconds) versus time (seconds), showing SampleRTT and EstimatedRTT, roughly in the 100–350 ms range]

62

TCP Round Trip Time and Timeout

Setting the timeout
• EstimatedRTT plus "safety margin"
  – large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:

DevRTT = (1 – β) · DevRTT + β · |SampleRTT – EstimatedRTT|

(typically β = 0.25)

Then set timeout interval:

TimeoutInterval = EstimatedRTT + 4 · DevRTT
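The two moving averages above are easy to state in code; this little sketch (my variable names, the slide's α = 0.125 and β = 0.25) updates the estimate and the timeout for a run of samples, skipping retransmitted segments per Karn/Partridge.

ALPHA, BETA = 0.125, 0.25

def update(estimated_rtt, dev_rtt, sample_rtt):
    dev_rtt = (1 - BETA) * dev_rtt + BETA * abs(sample_rtt - estimated_rtt)
    estimated_rtt = (1 - ALPHA) * estimated_rtt + ALPHA * sample_rtt
    timeout = estimated_rtt + 4 * dev_rtt
    return estimated_rtt, dev_rtt, timeout

est, dev = 100.0, 10.0                      # initial values, in milliseconds
for sample in (110, 95, 180, 105):          # SampleRTTs from non-retransmitted segments
    est, dev, rto = update(est, dev, sample)
    print(round(est, 1), round(dev, 1), round(rto, 1))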

63

TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer

• Retransmissions are triggered by:
  – timeout events
  – duplicate acks

• Initially consider simplified TCP sender:
  – ignore duplicate acks
  – ignore flow control, congestion control

64

TCP sender events

data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval

timeout:
• retransmit segment that caused timeout
• restart timer

Ack rcvd:
• If it acknowledges previously unacked segments:
  – update what is known to be acked
  – start timer if there are outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum; SendBase = InitialSeqNum

loop (forever) { switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running) start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments) start timer
    }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so new data is acked

66

TCP retransmission scenarios

[timeline (a), lost ACK scenario: Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost; the Seq=92 timeout fires and Host A resends Seq=92, 8 bytes data; Host B re-ACKs with ACK=100; SendBase moves from 92 to 100]

[timeline (b), premature timeout: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; ACK=100 and ACK=120 are delayed; the Seq=92 timeout fires and Seq=92 is resent; the cumulative ACK=120 arrives; SendBase moves from 100 to 120]

67

TCP retransmission scenarios (more)

[timeline, cumulative-ACK scenario: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; ACK=100 is lost, but ACK=120 arrives before the timeout, so nothing is retransmitted; SendBase = 120]

Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACKs 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver -> TCP Receiver action

• Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  -> Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK.

• Arrival of in-order segment with expected seq #; one other segment has ACK pending
  -> Immediately send single cumulative ACK, ACKing both in-order segments.

• Arrival of out-of-order segment, higher-than-expected seq #; gap detected
  -> Immediately send duplicate ACK, indicating seq # of next expected byte.

• Arrival of segment that partially or completely fills gap
  -> Immediately send ACK, provided that segment starts at lower end of gap.

69

Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If a segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that the segment after the ACKed data was lost:
  – fast retransmit: resend segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments) start timer
  }
  else {                              /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3)
      resend segment with sequence number y      /* fast retransmit */
  }

Fast retransmit algorithm

72

Silly Window Syndrome

MSS advertises the amount a receiver can accept.

If a transmitter has something to send – it will.

This means small MSS values may persist – indefinitely.

Solution:

Wait to fill each segment, but don't wait indefinitely.

NAGLE's Algorithm

If we wait too long, interactive traffic is difficult.

If we don't wait, we get silly window syndrome.

Solution: use a timer; when the timer expires – send the (unfilled) segment.

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Control
• receive side of TCP connection has a receive buffer

• speed-matching service: matching the send rate to the receiving app's drain rate

• app process may be slow at reading from buffer

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

• spare room in buffer = RcvWindow = RcvBuffer – [LastByteRcvd – LastByteRead]

• Rcvr advertises spare room by including value of RcvWindow in segments

• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
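In code form (variable names from the slide; the example byte counts are invented):

RcvBuffer = 65536                                       # bytes of receive buffer
LastByteRcvd, LastByteRead = 42000, 30000
RcvWindow = RcvBuffer - (LastByteRcvd - LastByteRead)   # spare room advertised
print(RcvWindow)                                        # 53536 bytes may be unACKed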

77

TCP Connection Management

Recall: TCP sender, receiver establish "connection" before exchanging data segments

• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)

• client: connection initiator
  Socket clientSocket = new Socket("hostname", port number);

• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();

Three way handshake:

Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data

Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem!

82

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission

• large delays when congested
• maximum achievable throughput

[diagram: Host A sends λin (original data) into a router with unlimited shared output link buffers; Host B receives λout]

83

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet

[diagram: Host A sends λin (original data) and λ'in (original data plus retransmitted data) into a router with finite shared output link buffers; Host B receives λout]

84

Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• "perfect" retransmission only when loss: λ'in > λout
• retransmission of delayed (not lost) packet makes λ'in larger (than perfect case) for same λout

"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt

[graphs (a), (b), (c): λout versus λin, each saturating below the ideal R/2]

85

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λin and λ'in increase?

[diagram: Host A sends λin (original data) and λ'in (original data plus retransmitted data) through routers with finite shared output link buffers; Host B receives λout]

86

Causescosts of congestion scenario 3

Another "cost" of congestion: when a packet is dropped, any upstream transmission capacity used for that packet was wasted!

[graph: λout collapses as offered load grows]

Congestion Collapse example: the Cocktail party effect

87

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at

88

TCP congestion control additive increase multiplicative decrease

[graph: congestion window (8, 16, 24 Kbytes) versus time]

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs

• additive increase: increase CongWin by 1 MSS every RTT – for each received ACK, W <- W + 1/W – until loss detected

• multiplicative decrease: cut CongWin in half after loss (W <- W/2)
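Per-event form of the same rule (a sketch; cwnd measured in MSS, as on the slide):

def on_ack(cwnd):
    return cwnd + 1.0 / cwnd        # additive increase: ~ +1 MSS per RTT

def on_loss(cwnd):
    return max(1.0, cwnd / 2.0)     # multiplicative decrease: halve the window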

[graph: congestion window size versus time – saw tooth behavior, probing for bandwidth]

89

SLOW START IS NOT SHOWN

90

TCP Congestion Control: details
• sender limits transmission: LastByteSent – LastByteAcked ≤ CongWin
• Roughly: rate = CongWin / RTT bytes/sec
• CongWin is dynamic, a function of perceived network congestion

How does sender perceive congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event

three mechanisms:
  – AIMD
  – slow start
  – conservative after timeout events

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Start
• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to a respectable rate

When connection begins, increase rate exponentially fast until first loss event

93

TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow, but ramps up exponentially fast
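So after n RTTs of slow start the window is 2^n MSS, and reaching a window of W MSS takes about log2 W round trips, e.g.:

\[
\text{cwnd}(n) = 2^{n}\ \text{MSS},
\qquad
W = 64\ \text{MSS} \Rightarrow n = \log_2 64 = 6\ \text{RTTs}
\]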

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement: inferring loss
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after a timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Philosophy: 3 dup ACKs indicates network capable of delivering some segments; timeout indicates a "more alarming" congestion scenario

96

Refinement
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event

97

Summary TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.

• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.

• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.

• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.

98

TCP sender congestion control

State: Slow Start (SS); Event: ACK receipt for previously unacked data
  Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
  Commentary: resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA); Event: ACK receipt for previously unacked data
  Action: CongWin = CongWin + MSS · (MSS/CongWin)
  Commentary: additive increase, resulting in increase of CongWin by 1 MSS every RTT

State: SS or CA; Event: loss event detected by triple duplicate ACK
  Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
  Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

State: SS or CA; Event: timeout
  Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
  Commentary: enter slow start

State: SS or CA; Event: duplicate ACK
  Action: increment duplicate ACK count for segment being acked
  Commentary: CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND.

Slow start operates until CWND reaches half of the previous CWND, i.e. SSTHRESH.

[graph annotations: Timeout / Fast Retransmission; SSThresh set to here]

100

TCP throughput

• What's the average throughput of TCP as a function of window size and RTT?
  – Ignore slow start
• Let W be the window size when loss occurs.
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to (W/2)/RTT
• Average throughput: 0.75 · W/RTT
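That 0.75 comes from averaging the sawtooth between W/2 and W:

\[
\overline{T} \;=\; \frac{1}{RTT}\cdot\frac{\tfrac{W}{2}+W}{2} \;=\; \frac{3W}{4\,RTT} \;=\; 0.75\,\frac{W}{RTT}
\]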

101

TCP Futures TCP over ldquolong fat pipesrdquo

• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L: throughput = 1.22 · MSS / (RTT · sqrt(L))
• -> L = 2·10^-10. Ouch!
• New versions of TCP needed for high-speed

Calculation on Simple Model (cwnd in units of MSS)
• Assume loss occurs whenever cwnd reaches W
  – Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … W, W/2, …
  – W/2 RTTs, then drop, then repeat
• Average throughput: 0.75·W·(MSS/RTT)
  – One packet dropped out of (W/2)·(3W/4)
  – Packet drop rate p = (8/3)·W^-2
• Throughput = (MSS/RTT) · sqrt(3/(2p))
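Plugging the example's numbers into these relations (a back-of-envelope check, not an extra result):

\[
W = \frac{10\ \text{Gb/s} \times 0.1\ \text{s}}{1500 \times 8\ \text{bits}} \approx 83{,}333\ \text{segments},
\qquad
10\ \text{Gb/s} = \frac{1.22\cdot MSS}{RTT\,\sqrt{L}} \;\Rightarrow\; L \approx 2\times 10^{-10}
\]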

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

• Single flow adjusting to bottleneck bandwidth
  – Without any a priori knowledge
  – Could be a Gbps link, could be a modem

• Single flow adjusting to variations in bandwidth
  – When bandwidth decreases, must lower sending rate
  – When bandwidth increases, must increase sending rate

• Multiple flows sharing the bandwidth
  – Must avoid overloading network
  – And share bandwidth "fairly" among the flows

103

104

Problem 1 Single Flow Fixed BW

• Want to get a first-order estimate of the available bandwidth
  – Assume bandwidth is fixed
  – Ignore presence of other flows

• Want to start slow, but rapidly increase rate until packet drop occurs ("slow-start")

• Adjustment:
  – cwnd initially set to 1 (MSS)
  – cwnd++ upon receipt of ACK

105

Problems with Slow-Start
• Slow-start can result in many losses
  – Roughly the size of cwnd ~ BW·RTT
• Example:
  – At some point, cwnd is enough to fill the "pipe"
  – After another RTT, cwnd is double its previous value
  – All the excess packets are dropped!
• Need a more gentle adjustment algorithm once we have a rough estimate of bandwidth
  – Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidth:
• Oscillate around its current value
• If you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease:
  cwnd <- cwnd ·/÷ a
• Additive increase or decrease:
  cwnd <- cwnd +/- b

Four alternatives

• AIAD: gentle increase, gentle decrease

• AIMD: gentle increase, drastic decrease

• MIAD: drastic increase, gentle decrease
  – too many losses: eliminate

• MIMD: drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

• No congestion: x increases by one packet/RTT every RTT
• Congestion: decrease x by factor 2

[diagram: A — router (C = 50 pkts/RTT) — B; graph of rate x (pkts/RTT) and backlog in router (pkts, congested if > 20) versus time]

110

AIMD Sharing Dynamics

[diagram: hosts A and D send flows x1 and x2 through a shared link to B and E]

No congestion: rate increases by one packet/RTT every RTT; congestion: decrease rate by factor 2

[graph: x1 and x2 versus time – rates equalize at the fair share]

111

AIAD Sharing Dynamics

[diagram: hosts A and D send flows x1 and x2 through a shared link to B and E]

No congestion: x increases by one packet/RTT every RTT; congestion: decrease x by 1

[graph: x1 and x2 versus time – the gap between the two rates persists; no convergence to a fair share]

112

Simple Model of Congestion Control

• Two TCP connections
  – Rates x1 and x2
• Congestion when sum > 1
• Efficiency: sum near 1
• Fairness: x's converge

[phase plot, 2-user example: bandwidth for user 1 (x1) versus bandwidth for user 2 (x2); the efficiency line x1 + x2 = 1 separates underload from overload]

113

Example

[phase plot: user 1 (x1) versus user 2 (x2), with the fairness line x1 = x2 and the efficiency line x1 + x2 = 1]

• Total bandwidth: 1

Inefficient: x1 + x2 = 0.7, e.g. (0.2, 0.5)

Congested: x1 + x2 = 1.2, e.g. (0.7, 0.5)

Efficient but not fair: x1 + x2 = 1, e.g. (0.7, 0.3)

Efficient and fair: x1 + x2 = 1, e.g. (0.5, 0.5)

114

AIAD

[phase plot: bandwidth for user 1 (x1) versus bandwidth for user 2 (x2), with fairness line and efficiency line; trajectory moves from (x1h, x2h) to (x1h - aD, x2h - aD) to (x1h - aD + aI, x2h - aD + aI)]

• Increase: x <- x + aI
• Decrease: x <- x - aD
• Does not converge to fairness

115

MIMD

[phase plot: bandwidth for user 1 (x1) versus bandwidth for user 2 (x2), with fairness line and efficiency line; trajectory moves from (x1h, x2h) to (bD·x1h, bD·x2h) to (bI·bD·x1h, bI·bD·x2h)]

• Increase: x <- x · bI
• Decrease: x <- x · bD
• Does not converge to fairness

116

AIMD

[phase plot: bandwidth for user 1 (x1) versus bandwidth for user 2 (x2), with fairness line and efficiency line; trajectory moves from (x1h, x2h) to (bD·x1h, bD·x2h) to (bD·x1h + aI, bD·x2h + aI)]

• Increase: x <- x + aI
• Decrease: x <- x · bD
• Converges to fairness
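A tiny two-flow simulation makes the convergence visible (purely illustrative; the capacity and starting rates are made up):

C = 50.0                                  # shared link capacity (pkts/RTT)
x1, x2 = 40.0, 5.0                        # very unequal starting rates
for _ in range(200):
    if x1 + x2 > C:                       # congestion: multiplicative decrease
        x1, x2 = x1 / 2, x2 / 2
    else:                                 # no congestion: additive increase
        x1, x2 = x1 + 1, x2 + 1
print(round(x1, 1), round(x2, 1))         # the two rates end up nearly equal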

117

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• Additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally

[phase plot: bandwidth for connection 1 versus bandwidth for connection 2, both axes up to R, with the equal-bandwidth-share line; repeated cycles of congestion avoidance (additive increase) and loss (decrease window by factor of 2) move the operating point towards the fair share]

118

Fairness (more)

Fairness and UDP:
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections:
• nothing prevents app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2!
• Recall: Multiple browser sessions (and the potential for synchronized loss)

119

Synchronized Flows vs. Many TCP Flows

Synchronized flows:
• Aggregate window has same dynamics
• Therefore buffer occupancy has same dynamics
• Rule-of-thumb still holds

Many independent, desynchronized flows:
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases

[graphs: buffer size versus time, and the resulting probability distribution]

Some TCP issues outstanding…

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 19: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

19

Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

rdt_rcv()

udt_rcv()

20

Reliable data transfer getting started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over

unreliable channel to receiver

rdt_rcv() called by rdt to deliver data to upper

rdt_rcv()

udt_rcv()

udt_rcv() called when packet arrives on rcv-side of channel

21

Reliable data transfer getting started

Wersquollbull incrementally develop sender receiver sides of

reliable data transfer protocol (rdt)bull consider only unidirectional data transfer

ndash but control info will flow on both directionsbull use finite state machines (FSM) to specify sender

receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely

determined by next event

eventactions

22

KR state machines ndash a note

BewareKurose and Ross has a confusingconfused attitude to

state-machinesIrsquove attempted to normalise the representationUPSHOT these slides have differing information to the

KR book (from which the RDT example is taken)in KR ldquoactions takenrdquo appear wide-ranging my

interpretation is more specificrelevant

Statename

Statename

Relevant event causing state transitionRelevant action taken on state transitionstate when in this ldquostaterdquo

next state uniquely determined by next

event eventactions

23

Rdt10 reliable transfer over a reliable channel

bull underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

IDLE udt_send(packet)rdt_send(data)

rdt_rcv(data)IDLEudt_rcv(packet)

sender receiver

Event

Action

24

Rdt20 channel with bit errors

bull underlying channel may flip bits in packetndash checksum to detect bit errors

bull the question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells sender that

packet received is OKndash negative acknowledgements (NAKs) receiver explicitly tells sender

that packet had errorsndash sender retransmits packet on receipt of NAK

bull new mechanisms in rdt20 (beyond rdt10)ndash error detectionndash receiver feedback control msgs (ACKNAK) receiver-gtsender

25

rdt20 FSM specification

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

Waitingfor reply

IDLE

sender

receiverrdt_send(data)

L

Note the sender holds a copy of the packet being sent until the delivery is acknowledged

26

rdt20 operation with no errors

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

27

rdt20 error scenario

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

28

rdt20 has a fatal flawWhat happens if ACKNAK corruptedbull sender doesnrsquot know what happened at receiverbull canrsquot just retransmit possible duplicate

Handling duplicates bull sender retransmits current

packet if ACKNAK garbledbull sender adds sequence number

to each packetbull receiver discards (doesnrsquot

deliver) duplicate packet

Sender sends one packet then waits for receiver response

stop and wait

29

rdt21 sender handles garbled ACKNAKs

IDLE

sequence=0udt_send(packet)

rdt_send(data)

WaitingFor reply udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

IDLEWaitingfor reply

LL

udt_rcv(packet) ampamp corrupt(packet)

30

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

udt_send(NAK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

udt_send(ACK)rdt_rcv(data)

Wait for 1 from below

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_send(ACK)rdt_rcv(data)

udt_send(ACK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

receive(packet) ampamp corrupt(packet)

udt_send(ACK)

udt_send(NAK)

31

rdt21 discussionSenderbull seq added to pktbull two seq rsquos (01) will suffice Whybull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has a0 or 1 sequence number

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq bull note receiver can not know if its last ACKNAK received OK at

sender

32

rdt22 a NAK-free protocol

bull same functionality as rdt21 using ACKs onlybull instead of NAK receiver sends ACK for last pkt received OK

ndash receiver must explicitly include seq of pkt being ACKed bull duplicate ACK at sender results in same action as NAK

retransmit current pkt

33

rdt22 sender receiver fragments

Wait for call 0 from above

sequence=0udt_send(packet)

rdt_send(data)

udt_send(packet)

rdt_rcv(reply) ampamp ( corrupt(reply) || isACK1(reply) )

udt_rcv(reply) ampamp not corrupt(reply) ampamp isACK0(reply)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet) send(ACK1)rdt_rcv(data)

udt_rcv(packet) ampamp (corrupt(packet) || has_seq1(packet))

udt_send(ACK1)receiver FSM

fragment

L

34

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of help but not

enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull retransmits if no ACK received in this time

bull if pkt (or ACK) just delayed (not lost)ndash retransmission will be

duplicate but use of seq rsquos already handles this

ndash receiver must specify seq of pkt being ACKed

bull requires countdown timer

udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )

35

rdt30 sender

sequence=0udt_send(packet)

rdt_send(data)

Wait for

ACK0

IDLEstate 1

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)

udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)

LL

udt_send(packet)timeout

udt_send(packet)timeout

udt_rcv(reply)

IDLEstate 0

Wait for

ACK1

Ludt_rcv(reply)

LL

L

36

rdt30 in action

37

rdt30 in action

38

Performance of rdt30

bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet

U sender utilization ndash fraction of time sender busy sending

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources

dsmicrosecon8bps10bits8000

9 RLdtrans

39

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

40

Pipelined (Packet-Window) protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver

bull Two generic forms of pipelined protocols go-Back-N selective repeat

41

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

42

Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks

ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet

ndash If timer expires retransmit all unacked packets

Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet

ndash When timer expires retransmit only unack packet

43

Selective repeat big picture

bull Sender can have up to N unacked packets in pipeline

bull Rcvr acks individual packetsbull Sender maintains timer for each unacked

packetndash When timer expires retransmit only unack packet

44

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L

46

GBN receiver extended FSM

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN inaction

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender events
data rcvd from app:
- create segment with seq #
- seq # is byte-stream number of first data byte in segment
- start timer if not already running (think of timer as for oldest unacked segment)
- expiration interval: TimeOutInterval

timeout:
- retransmit segment that caused timeout
- restart timer

Ack rcvd:
- if it acknowledges previously unacked segments
  - update what is known to be acked
  - start timer if there are outstanding segments

65

TCP sender (simplified)

    NextSeqNum = InitialSeqNum
    SendBase = InitialSeqNum

    loop (forever) {
      switch(event)

      event: data received from application above
         create TCP segment with sequence number NextSeqNum
         if (timer currently not running)
             start timer
         pass segment to IP
         NextSeqNum = NextSeqNum + length(data)

      event: timer timeout
         retransmit not-yet-acknowledged segment with
             smallest sequence number
         start timer

      event: ACK received, with ACK field value of y
         if (y > SendBase) {
             SendBase = y
             if (there are currently not-yet-acknowledged segments)
                 start timer
         }
    } /* end of loop forever */

Comment:
- SendBase - 1: last cumulatively ack'ed byte
Example:
- SendBase - 1 = 71; y = 73, so the rcvr wants 73+ ; y > SendBase, so new data is acked

66

TCP retransmission scenarios

Lost ACK scenario (Host A -> Host B):
- A sends Seq=92, 8 bytes data; B's ACK=100 is lost (X)
- A's Seq=92 timeout expires; A retransmits Seq=92, 8 bytes data
- B re-sends ACK=100; SendBase = 100

Premature timeout scenario (Host A -> Host B):
- A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data (SendBase = 100)
- B sends ACK=100 and ACK=120 (SendBase = 120)
- A's Seq=92 timer expires before ACK=100 arrives: premature timeout
- A retransmits Seq=92, 8 bytes data; B replies ACK=120 again (SendBase = 120)

67

TCP retransmission scenarios (more)

Cumulative ACK scenario (Host A -> Host B):
- A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
- B's ACK=100 is lost (X), but ACK=120 arrives before A's timeout
- ACK=120 implicitly ACKs 100 too (implicit ACK, e.g. not Go-Back-N), so nothing is retransmitted; SendBase = 120

68

TCP ACK generation [RFC 1122, RFC 2581]

Event at Receiver -> TCP Receiver action:
- Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  -> Delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
- Arrival of in-order segment with expected seq #; one other segment has ACK pending
  -> Immediately send single cumulative ACK, ACKing both in-order segments
- Arrival of out-of-order segment with higher-than-expected seq #; gap detected
  -> Immediately send duplicate ACK, indicating seq # of next expected byte
- Arrival of segment that partially or completely fills gap
  -> Immediately send ACK, provided that segment starts at lower end of gap
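The table can be read as a small decision procedure. A rough sketch, assuming invented helper names on a receiver object (not TCP's real internals):

    def on_segment_arrival(rcv, seg):
        if seg.seq == rcv.expected_seq and not rcv.has_gap():
            # in-order, nothing missing
            if rcv.ack_pending:
                rcv.send_cumulative_ack()            # one ACK covers both in-order segments
            else:
                rcv.start_delayed_ack_timer(0.500)   # wait up to 500 ms for the next segment
        elif seg.seq > rcv.expected_seq:
            # out-of-order, gap detected
            rcv.send_duplicate_ack(rcv.expected_seq) # dup ACK names the next expected byte
        elif seg.seq == rcv.expected_seq:
            # segment starts at the lower end of a gap: fills it, ACK immediately
            rcv.send_ack()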

69

Fast Retransmit
- Time-out period often relatively long:
  - long delay before resending lost packet
- Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If a segment is lost, there will likely be many duplicate ACKs
- If sender receives 3 duplicate ACKs, it supposes that the segment after the ACKed data was lost:
  - fast retransmit: resend segment before timer expires

70

[Figure 3.37: Resending a segment after triple duplicate ACK — Host A's 2nd segment to Host B is lost (X); on receiving three duplicate ACKs, A resends the 2nd segment before its timeout expires]

71

Fast retransmit algorithm

    event: ACK received, with ACK field value of y
       if (y > SendBase) {
           SendBase = y
           if (there are currently not-yet-acknowledged segments)
               start timer
       }
       else {                      /* a duplicate ACK for already ACKed segment */
           increment count of dup ACKs received for y
           if (count of dup ACKs received for y == 3) {
               resend segment with sequence number y     /* fast retransmit */
           }
       }

72

Silly Window Syndrome
The receiver advertises the amount it can accept; if a transmitter has something to send – it will.
This means small segments may persist – indefinitely.

Solution: wait to fill each segment, but don't wait indefinitely (NAGLE's Algorithm).
- If we wait too long, interactive traffic is difficult.
- If we don't wait, we get silly window syndrome.
Solution: use a timer; when the timer expires – send the (unfilled) segment.
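A rough sketch of the Nagle-style rule described above (RFC 896 flavour; the names send_segment, unacked_bytes and MSS here are illustrative, not from the slides):

    def on_app_write(conn, data):
        conn.buffer += data
        try_send(conn)

    def try_send(conn):
        while conn.buffer:
            if len(conn.buffer) >= conn.MSS:
                send_segment(conn, conn.buffer[:conn.MSS])   # full segment: send now
                conn.buffer = conn.buffer[conn.MSS:]
            elif conn.unacked_bytes == 0:
                send_segment(conn, conn.buffer)              # nothing in flight: send what we have
                conn.buffer = b""
            else:
                break   # small segment with data in flight: coalesce until an ACK (or timer)

    def on_ack(conn):
        try_send(conn)   # an ACK releases any coalesced small segment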

73

Flow Control ≠ Congestion Control
- Flow control involves preventing senders from overrunning the capacity of the receivers.
- Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded.

74

Flow Control – (bad old days)
In-line flow control:
- XON/XOFF (^s/^q)
- data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires:
- RTS/CTS handshaking
- Read (or Write) Ready signals from a memory interface saying slow-down/stop…

75

TCP Flow Control
- receive side of TCP connection has a receive buffer
- app process may be slow at reading from buffer
- flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
- speed-matching service: matching the send rate to the receiving app's drain rate

76

TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
- spare room in buffer:
    RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
- Rcvr advertises spare room by including the value of RcvWindow in segments
- Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
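A sketch of the bookkeeping on each side (variable names follow the slide; the two helper functions are illustrative, not a real TCP implementation):

    # receiver side: advertise the spare buffer space
    def advertised_window(rcv):
        return rcv.RcvBuffer - (rcv.LastByteRcvd - rcv.LastByteRead)

    # sender side: never have more than RcvWindow bytes unACKed
    def can_send(snd, nbytes):
        unacked = snd.LastByteSent - snd.LastByteAcked
        return unacked + nbytes <= snd.RcvWindow   # RcvWindow from the latest received segment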

77

TCP Connection Management
Recall: TCP sender, receiver establish a "connection" before exchanging data segments
- initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
- client: connection initiator
    Socket clientSocket = new Socket("hostname", port number);
- server: contacted by client
    Socket connectionSocket = welcomeSocket.accept();

Three way handshake:
Step 1: client host sends TCP SYN segment to server
- specifies initial seq #
- no data
Step 2: server host receives SYN, replies with SYNACK segment
- server allocates buffers
- specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
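For comparison, a rough Python equivalent of the Java socket calls above ("hostname" and port 6789 are placeholders); connect() on the client side is what triggers the SYN / SYNACK / ACK exchange:

    import socket

    # client: connection initiator
    clientSocket = socket.create_connection(("hostname", 6789))

    # server: contacted by client
    welcomeSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    welcomeSocket.bind(("", 6789))
    welcomeSocket.listen()
    connectionSocket, addr = welcomeSocket.accept()   # new socket for this connection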

78

TCP Connection Management (cont.)

Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK; closes connection, sends FIN

[Diagram: client sends FIN (close); server ACKs, then sends its own FIN (close); client ACKs, waits in "timed wait", then closed]

79

TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK
- enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK; connection closed!
Note: with small modification, can handle simultaneous FINs

[Diagram: client FIN -> server ACK; server FIN -> client ACK; both sides pass through "closing"; client waits in "timed wait" before closed, server closes on the final ACK]

80

TCP Connection Management (cont.)
[State diagrams: TCP client lifecycle and TCP server lifecycle]

81

Principles of Congestion Control
Congestion:
- informally: "too many sources sending too much data too fast for network to handle"
- different from flow control!
- manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
- a top-10 problem!

82

Causes/costs of congestion: scenario 1
- two senders, two receivers
- one router, infinite buffers
- no retransmission
- large delays when congested
- maximum achievable throughput

[Diagram: Host A sends λin (original data) into a router with unlimited shared output link buffers; Host B receives λout]

83

Causes/costs of congestion: scenario 2
- one router, finite buffers
- sender retransmission of lost packet

[Diagram: Host A offers λin (original data) plus λ'in (original data plus retransmitted data) to a router with finite shared output link buffers; Host B receives λout]

84

Causes/costs of congestion: scenario 2
- always: λin = λout (goodput)
- "perfect" retransmission only when loss: λ'in > λout
- retransmission of delayed (not lost) packet makes λ'in larger (than the perfect case) for the same λout

"costs" of congestion:
- more work (retransmissions) for a given "goodput"
- unneeded retransmissions: link carries multiple copies of a pkt

[Plots (a), (b), (c): λout versus λin, each saturating at R/2 in the ideal case but falling towards R/3 and R/4 as retransmissions and duplicates consume capacity]

85

Causes/costs of congestion: scenario 3
- four senders
- multihop paths
- timeout/retransmit

Q: what happens as λin and λ'in increase?

[Diagram: Host A offers λin (original data) plus λ'in (original plus retransmitted data) across multiple routers with finite shared output link buffers; Host B receives λout]

86

Causes/costs of congestion: scenario 3
Another "cost" of congestion:
- when a packet is dropped, any "upstream" transmission capacity used for that packet was wasted!

[Plot: λout collapses as offered load grows]
Congestion Collapse example: the cocktail party effect

87

Approaches towards congestion control
Two broad approaches towards congestion control:

End-end congestion control:
- no explicit feedback from network
- congestion inferred from end-system observed loss, delay
- approach taken by TCP

Network-assisted congestion control:
- routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at

88

TCP congestion control: additive increase, multiplicative decrease
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
- additive increase: increase CongWin by 1 MSS every RTT, i.e. for each received ACK W <- W + 1/W, until loss detected
- multiplicative decrease: cut CongWin in half after loss (W <- W/2)

[Plot: congestion window (8, 16, 24 Kbytes) versus time — "saw tooth" behavior, probing for bandwidth. SLOW START IS NOT SHOWN.]

89

90

TCP Congestion Control: details
- sender limits transmission: LastByteSent - LastByteAcked ≤ CongWin
- roughly,
    rate = CongWin / RTT   Bytes/sec
- CongWin is dynamic, a function of perceived network congestion

How does sender perceive congestion?
- loss event = timeout or 3 duplicate acks
- TCP sender reduces rate (CongWin) after a loss event

three mechanisms:
- AIMD
- slow start
- conservative after timeout events
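For instance (numbers invented for illustration): with CongWin = 10 MSS = 15,000 bytes and RTT = 100 ms, rate ≈ 15,000 bytes / 0.1 s = 150 kB/s ≈ 1.2 Mbps.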

91

AIMD Starts Too Slowly!
[Plot: window versus time t — pure AIMD ramps up only linearly from a small window]
- It could take a long time to get started!
- Need to start with a small CWND to avoid overloading the network.

92

TCP Slow Start
- When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
- available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to a respectable rate

When connection begins, increase rate exponentially fast until first loss event

93

TCP Slow Start (more)
- When connection begins, increase rate exponentially until first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin for every ACK received
- Summary: initial rate is slow but ramps up exponentially fast

[Diagram: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart]

94

Slow Start and the TCP Sawtooth
[Plot: window versus time t — exponential "slow start" up to the first loss, then the AIMD sawtooth]
Why is it called slow-start? Because TCP originally had no congestion control mechanism. The source would just start by sending a whole window's worth of data!

95

Refinement: inferring loss
- After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
- But after a timeout event:
  - CongWin instead set to 1 MSS
  - window then grows exponentially
  - to a threshold, then grows linearly

Philosophy: 3 dup ACKs indicates the network is capable of delivering some segments; a timeout indicates a "more alarming" congestion scenario

96

Refinement
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
- Variable Threshold
- At a loss event, Threshold is set to 1/2 of CongWin just before the loss event

97

Summary: TCP Congestion Control
- When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.
- When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
- When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
- When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
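A compact sketch of these four rules (cwnd and ssthresh in units of MSS; class and method names are illustrative, and fast-recovery details are omitted):

    class TcpCongestionControl:
        def __init__(self, initial_ssthresh=64):
            self.cwnd = 1.0                       # start in slow start with 1 MSS
            self.ssthresh = float(initial_ssthresh)

        def on_ack(self):                         # ACK for previously unacked data
            if self.cwnd < self.ssthresh:
                self.cwnd += 1.0                  # slow start: +1 MSS per ACK (doubles per RTT)
            else:
                self.cwnd += 1.0 / self.cwnd      # congestion avoidance: +1 MSS per RTT

        def on_triple_dup_ack(self):              # loss inferred from 3 duplicate ACKs
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh             # multiplicative decrease, stay in CA

        def on_timeout(self):                     # loss inferred from a timeout
            self.ssthresh = self.cwnd / 2
            self.cwnd = 1.0                       # back to slow start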

98

TCP sender congestion control

State: Slow Start (SS); Event: ACK receipt for previously unacked data
  Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
  Commentary: resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA); Event: ACK receipt for previously unacked data
  Action: CongWin = CongWin + MSS * (MSS/CongWin)
  Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA; Event: loss event detected by triple duplicate ACK
  Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
  Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

State: SS or CA; Event: timeout
  Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
  Commentary: enter slow start

State: SS or CA; Event: duplicate ACK
  Action: increment duplicate ACK count for segment being acked
  Commentary: CongWin and Threshold not changed

99

Repeating Slow Start After Timeout
[Plot: window versus time t — after a timeout, CWND restarts at 1 MSS with SSThresh set to half the CWND at the loss; a fast retransmission instead just halves the window]
Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND.
Slow start operates until the window reaches half of the previous CWND, i.e. SSTHRESH.

100

TCP throughput
- What's the average throughput of TCP as a function of window size and RTT?
  - Ignore slow start
- Let W be the window size when loss occurs.
- When the window is W, throughput is W/RTT
- Just after loss, window drops to W/2, throughput to W/2RTT.
- Average throughput: 0.75 W/RTT
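The 0.75 factor follows directly from the sawtooth: between losses the window ramps linearly from W/2 up to W, so its average is (W/2 + W)/2 = 3W/4, giving an average throughput of 0.75 · W/RTT segments per second (or 0.75 · W · MSS/RTT in bytes per second).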

101

TCP Futures: TCP over "long, fat pipes"
- Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
- Requires window size W = 83,333 in-flight segments
- Throughput in terms of loss rate p: Throughput = (MSS/RTT) · sqrt(3/(2p))
- Requires loss rate L = 2·10^-10 — Ouch!
- New versions of TCP for high-speed

Calculation on Simple Model (cwnd in units of MSS)
- Assume loss occurs whenever cwnd reaches W
  - Recovery by fast retransmit
- Window: W/2, W/2+1, W/2+2, … W, W/2, …
  - W/2 RTTs, then drop, then repeat
- Average throughput: 0.75 W (MSS/RTT)
  - One packet dropped out of (W/2)·(3W/4)
  - Packet drop rate p = (8/3) · W^-2
- Throughput = (MSS/RTT) · sqrt(3/(2p))
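A quick numeric check of the figures above (an illustrative sketch; it just inverts the throughput formula from the calculation):

    mss_bits = 1500 * 8
    rtt = 0.100                                   # seconds
    target = 10e9                                 # 10 Gbps

    w = target * rtt / mss_bits                   # in-flight segments needed: ~83,333
    p = 1.5 * (mss_bits / (rtt * target)) ** 2    # invert Throughput = (MSS/RTT)*sqrt(3/(2p))
    print(f"W = {w:,.0f} segments, tolerable loss rate p = {p:.1e}")   # ~2.2e-10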

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges – or – Why AIMD?
- Single flow adjusting to bottleneck bandwidth
  - Without any a priori knowledge
  - Could be a Gbps link; could be a modem
- Single flow adjusting to variations in bandwidth
  - When bandwidth decreases, must lower sending rate
  - When bandwidth increases, must increase sending rate
- Multiple flows sharing the bandwidth
  - Must avoid overloading the network
  - And share bandwidth "fairly" among the flows

103

104

Problem 1: Single Flow, Fixed BW
- Want to get a first-order estimate of the available bandwidth
  - Assume bandwidth is fixed
  - Ignore presence of other flows
- Want to start slow, but rapidly increase rate until packet drop occurs ("slow-start")
- Adjustment:
  - cwnd initially set to 1 (MSS)
  - cwnd++ upon receipt of ACK

105

Problems with Slow-Start
- Slow-start can result in many losses
  - Roughly the size of cwnd ~ BW*RTT
- Example:
  - At some point, cwnd is enough to fill the "pipe"
  - After another RTT, cwnd is double its previous value
  - All the excess packets are dropped!
- Need a more gentle adjustment algorithm once we have a rough estimate of bandwidth
  - Rest of the design discussion focuses on this

Problem 2: Single Flow, Varying BW
Want to track available bandwidth
- Oscillate around its current value
- If you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
- Multiplicative increase or decrease: cwnd <- cwnd × a (or ÷ a)
- Additive increase or decrease: cwnd <- cwnd ± b

106

Four alternatives
- AIAD: gentle increase, gentle decrease
- AIMD: gentle increase, drastic decrease
- MIAD: drastic increase, gentle decrease
  - too many losses: eliminate
- MIMD: drastic increase and decrease

107

108

Problem 3: Multiple Flows
- Want the steady state to be "fair"
- Many notions of fairness, but here just require two identical flows to end up with the same bandwidth
- This eliminates MIMD and AIAD
  - As we shall see…
- AIMD is the only remaining solution!
  - Not really, but close enough…

109

Recall: Buffer and Window Dynamics
- No congestion -> x increases by one packet/RTT every RTT
- Congestion -> decrease x by factor 2

[Plot: a single sender A to B over a link of capacity C = 50 pkts/RTT; rate x (pkts/RTT) and router backlog (pkts, congested if > 20) over ~500 RTTs, showing the sawtooth oscillation around the capacity]

110

AIMD Sharing Dynamics
[Plot: two flows, x1 (A -> B) and x2 (D -> E), sharing one bottleneck over ~500 RTTs]
- No congestion -> rate increases by one packet/RTT every RTT
- Congestion -> decrease rate by factor 2
- Rates equalize -> fair share
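A toy simulation of the dynamics pictured above (illustrative only, assuming both flows see congestion at the same instant): two AIMD flows on a bottleneck of capacity C drift towards an equal share even from very unequal starting rates.

    C = 50.0                       # bottleneck capacity, pkts/RTT
    x1, x2 = 40.0, 5.0             # deliberately unequal starting rates

    for rtt in range(200):
        if x1 + x2 > C:            # congestion: both flows halve (multiplicative decrease)
            x1, x2 = x1 / 2, x2 / 2
        else:                      # no congestion: both add 1 pkt/RTT (additive increase)
            x1, x2 = x1 + 1, x2 + 1

    print(round(x1, 1), round(x2, 1))   # the two rates are now (almost) equal: fair share

Each halving shrinks the gap between the two rates while the additive phase leaves it unchanged, which is exactly why the trajectory converges on the fairness line.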

111

AIAD Sharing Dynamics
[Plot: two flows, x1 (A -> B) and x2 (D -> E), sharing one bottleneck over ~500 RTTs]
- No congestion -> x increases by one packet/RTT every RTT
- Congestion -> decrease x by 1
- The initial rate difference persists: the flows do not converge to a fair share

112

Simple Model of Congestion Control
- Two TCP connections
  - Rates x1 and x2
- Congestion when sum > 1
- Efficiency: sum near 1
- Fairness: x's converge

[Phase plot: bandwidth for User 1 (x1) versus bandwidth for User 2 (x2); the efficiency line x1 + x2 = 1 separates underload from overload — 2-user example]

113

Example
- Total bandwidth 1
[Phase plot: User 1 (x1) versus User 2 (x2), with the fairness line x1 = x2 and the efficiency line x1 + x2 = 1]
- (0.2, 0.5): inefficient, x1 + x2 = 0.7
- (0.7, 0.5): congested, x1 + x2 = 1.2
- (0.7, 0.3): efficient, x1 + x2 = 1, but not fair
- (0.5, 0.5): efficient, x1 + x2 = 1, and fair

114

AIAD
[Phase plot: from (x1h, x2h), decrease to (x1h - aD, x2h - aD), then increase to (x1h - aD + aI, x2h - aD + aI): the trajectory moves parallel to the fairness line]
- Increase: x + aI
- Decrease: x - aD
- Does not converge to fairness

115

MIMD
[Phase plot: from (x1h, x2h), decrease to (bD·x1h, bD·x2h), then increase to (bI·bD·x1h, bI·bD·x2h): the trajectory stays on a ray through the origin]
- Increase: x · bI
- Decrease: x · bD
- Does not converge to fairness

116

AIMD
[Phase plot: from (x1h, x2h), decrease to (bD·x1h, bD·x2h), then increase to (bD·x1h + aI, bD·x2h + aI): each cycle moves the operating point closer to the fairness line]
- Increase: x + aI
- Decrease: x · bD
- Converges to fairness

117

Why is AIMD fair? (a pretty animation…)
Two competing sessions:
- Additive increase gives a slope of 1 as throughput increases
- Multiplicative decrease decreases throughput proportionally

[Phase plot: bandwidth for Connection 1 versus Connection 2, capacity R on each axis, with the equal-bandwidth-share line; repeated cycles of "congestion avoidance: additive increase" and "loss: decrease window by factor of 2" spiral in towards the equal share]

118

Fairness (more)
Fairness and UDP
- Multimedia apps may not use TCP
  - do not want rate throttled by congestion control
- Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
- (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections
- nothing prevents an app from opening parallel connections between 2 hosts
- Web browsers do this
- Example: link of rate R supporting 9 connections;
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 11 TCPs, gets R/2!
- Recall: multiple browser sessions (and the potential for synchronized loss)

119

Synchronized Flows vs. Many TCP Flows
Synchronized flows:
- Aggregate window has the same dynamics
- Therefore buffer occupancy has the same dynamics
- Rule-of-thumb still holds

Independent, desynchronized flows:
- Central limit theorem says the aggregate becomes Gaussian
- Variance (buffer size) decreases as N increases

[Sketches: buffer size over time t, and the probability distribution of the aggregate]

Some TCP issues outstanding…

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

• Increase: x → x + aI
• Decrease: x → x − aD
• Does not converge to fairness

[Figure: phase plane (bandwidth for user 1, x1, vs bandwidth for user 2, x2) with the fairness and efficiency lines. Starting from (x1h, x2h), a decrease moves the point to (x1h − aD, x2h − aD) and the next increase to (x1h − aD + aI, x2h − aD + aI): the trajectory always moves parallel to the fairness line, so the gap between x1 and x2 never changes.]

115

MIMD

• Increase: x → bI × x
• Decrease: x → bD × x (with bD < 1)
• Does not converge to fairness

[Figure: phase plane (x1 vs x2) with the fairness and efficiency lines. From (x1h, x2h), a decrease scales the point to (bD·x1h, bD·x2h) and an increase to (bI·bD·x1h, bI·bD·x2h): the trajectory stays on the line through the origin, so the ratio x1/x2 never changes.]

116

AIMD

• Increase: x → x + aI
• Decrease: x → bD × x
• Converges to fairness

[Figure: phase plane (x1 vs x2) with the fairness and efficiency lines. From (x1h, x2h), a multiplicative decrease moves the point to (bD·x1h, bD·x2h) and the next additive increase to (bD·x1h + aI, bD·x2h + aI): each decrease shrinks the gap between x1 and x2 proportionally, while each increase leaves it unchanged, so the trajectory steps in toward the fair point on the efficiency line.]
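That convergence is easy to verify numerically in the two-user model. A sketch (illustrative constants aI and bD; losses assumed synchronized whenever x1 + x2 > 1):

```python
def aimd_phase(x1=0.7, x2=0.1, a_i=0.01, b_d=0.5, steps=2000):
    for _ in range(steps):
        if x1 + x2 > 1:                    # overloaded: multiplicative decrease
            x1, x2 = b_d * x1, b_d * x2
        else:                              # underloaded: additive increase
            x1, x2 = x1 + a_i, x2 + a_i
    return round(x1, 3), round(x2, 3)

print(aimd_phase())   # prints two (nearly) equal rates: the trajectory has
                      # converged onto the fairness line x1 = x2
```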

117

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease reduces throughput proportionally

[Figure: bandwidth for connection 1 vs bandwidth for connection 2, both axes running up to R, with the "equal bandwidth share" line. The trajectory alternates between congestion avoidance (additive increase along a 45° direction) and loss (decrease window by a factor of 2), stepping ever closer to the equal-share line.]

118

Fairness (more)

Fairness and UDP
• Multimedia apps may not use TCP
  – do not want their rate throttled by congestion control
• Instead use UDP
  – pump audio/video at a constant rate, tolerate packet loss
• (Ancient, yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections
• nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets ~R/2
• Recall: multiple browser sessions (and the potential for synchronized loss)
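The arithmetic behind that example, assuming the idealization the slide uses — the link's rate R is split evenly among all TCP connections currently sharing it:

```python
R = 1.0              # link rate (normalized)
existing = 9         # TCP connections already sharing the link

def app_share(parallel_conns):
    total = existing + parallel_conns
    return parallel_conns * (R / total)   # even per-connection split

print(app_share(1))    # 0.1  -> a single connection gets R/10
print(app_share(11))   # 0.55 -> eleven parallel connections grab over half the link
```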

119

Some TCP issues outstanding…

Synchronized flows:
• The aggregate window has the same dynamics as a single flow
• Therefore buffer occupancy has the same dynamics
• The buffer-sizing rule-of-thumb (one bandwidth–delay product) still holds

Many independent, desynchronized flows:
• The central limit theorem says the aggregate becomes Gaussian
• Variance (and hence the buffer size needed) decreases as N increases

[Figure: probability distribution of buffer occupancy over time for the two cases; the desynchronized aggregate is far more tightly concentrated around its mean.]
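To make the buffer-sizing consequence concrete, here is a small sketch. It assumes the classic rule-of-thumb of one bandwidth–delay product of buffering, and the commonly cited C × RTT / √N refinement for a large number of desynchronized flows; the link numbers are made up for illustration and are not from the slides:

```python
import math

def buffer_pkts(rtt_s, capacity_bps, n_flows, pkt_bytes=1500, desynchronized=True):
    """Rule-of-thumb router buffer size, in packets."""
    bits = rtt_s * capacity_bps                 # bandwidth-delay product in bits
    if desynchronized and n_flows > 1:
        bits /= math.sqrt(n_flows)              # many independent flows need far less
    return math.ceil(bits / (8 * pkt_bytes))

# Illustrative: 10 Gb/s link, 100 ms RTT
print(buffer_pkts(0.1, 10e9, 1, desynchronized=False))   # ~83,334 pkts (full BDP)
print(buffer_pkts(0.1, 10e9, 10_000))                     # ~834 pkts with 10,000 flows
```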

  • Slide 1
  • Topic 5 – Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexing/demultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning – we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines – a note
  • Rdt1.0: reliable transfer over a reliable channel
  • Rdt2.0: channel with bit errors
  • rdt2.0: FSM specification
  • rdt2.0: operation with no errors
  • rdt2.0: error scenario
  • rdt2.0 has a fatal flaw
  • rdt2.1: sender handles garbled ACK/NAKs
  • rdt2.1: receiver handles garbled ACK/NAKs
  • rdt2.1: discussion
  • rdt2.2: a NAK-free protocol
  • rdt2.2: sender, receiver fragments
  • rdt3.0: channels with errors and loss
  • rdt3.0 sender
  • rdt3.0 in action
  • rdt3.0 in action (2)
  • Performance of rdt3.0
  • rdt3.0: stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview: RFCs 793, 1122, 1323, 2018, 2581, …
  • TCP segment structure
  • TCP seq. #'s and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ≠ Congestion Control
  • Flow Control – (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causes/costs of congestion: scenario 1
  • Causes/costs of congestion: scenario 2
  • Causes/costs of congestion: scenario 2 (2)
  • Causes/costs of congestion: scenario 3
  • Causes/costs of congestion: scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures: TCP over "long, fat pipes"
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges – or Why AIMD
  • Problem 1: Single Flow, Fixed BW
  • Problems with Slow-Start
  • Problem 2: Single Flow, Varying BW
  • Four alternatives
  • Problem 3: Multiple Flows
  • Recall: Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair? (a pretty animation…)
  • Fairness (more)
  • Some TCP issues outstanding…
Page 22: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

22

KR state machines ndash a note

BewareKurose and Ross has a confusingconfused attitude to

state-machinesIrsquove attempted to normalise the representationUPSHOT these slides have differing information to the

KR book (from which the RDT example is taken)in KR ldquoactions takenrdquo appear wide-ranging my

interpretation is more specificrelevant

Statename

Statename

Relevant event causing state transitionRelevant action taken on state transitionstate when in this ldquostaterdquo

next state uniquely determined by next

event eventactions

23

Rdt10 reliable transfer over a reliable channel

bull underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

IDLE udt_send(packet)rdt_send(data)

rdt_rcv(data)IDLEudt_rcv(packet)

sender receiver

Event

Action

24

Rdt20 channel with bit errors

bull underlying channel may flip bits in packetndash checksum to detect bit errors

bull the question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells sender that

packet received is OKndash negative acknowledgements (NAKs) receiver explicitly tells sender

that packet had errorsndash sender retransmits packet on receipt of NAK

bull new mechanisms in rdt20 (beyond rdt10)ndash error detectionndash receiver feedback control msgs (ACKNAK) receiver-gtsender

25

rdt20 FSM specification

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

Waitingfor reply

IDLE

sender

receiverrdt_send(data)

L

Note the sender holds a copy of the packet being sent until the delivery is acknowledged

26

rdt20 operation with no errors

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

27

rdt20 error scenario

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

28

rdt20 has a fatal flawWhat happens if ACKNAK corruptedbull sender doesnrsquot know what happened at receiverbull canrsquot just retransmit possible duplicate

Handling duplicates bull sender retransmits current

packet if ACKNAK garbledbull sender adds sequence number

to each packetbull receiver discards (doesnrsquot

deliver) duplicate packet

Sender sends one packet then waits for receiver response

stop and wait

29

rdt21 sender handles garbled ACKNAKs

IDLE

sequence=0udt_send(packet)

rdt_send(data)

WaitingFor reply udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

IDLEWaitingfor reply

LL

udt_rcv(packet) ampamp corrupt(packet)

30

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

udt_send(NAK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

udt_send(ACK)rdt_rcv(data)

Wait for 1 from below

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_send(ACK)rdt_rcv(data)

udt_send(ACK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

receive(packet) ampamp corrupt(packet)

udt_send(ACK)

udt_send(NAK)

31

rdt21 discussionSenderbull seq added to pktbull two seq rsquos (01) will suffice Whybull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has a0 or 1 sequence number

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq bull note receiver can not know if its last ACKNAK received OK at

sender

32

rdt22 a NAK-free protocol

bull same functionality as rdt21 using ACKs onlybull instead of NAK receiver sends ACK for last pkt received OK

ndash receiver must explicitly include seq of pkt being ACKed bull duplicate ACK at sender results in same action as NAK

retransmit current pkt

33

rdt22 sender receiver fragments

Wait for call 0 from above

sequence=0udt_send(packet)

rdt_send(data)

udt_send(packet)

rdt_rcv(reply) ampamp ( corrupt(reply) || isACK1(reply) )

udt_rcv(reply) ampamp not corrupt(reply) ampamp isACK0(reply)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet) send(ACK1)rdt_rcv(data)

udt_rcv(packet) ampamp (corrupt(packet) || has_seq1(packet))

udt_send(ACK1)receiver FSM

fragment

L

34

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of help but not

enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull retransmits if no ACK received in this time

bull if pkt (or ACK) just delayed (not lost)ndash retransmission will be

duplicate but use of seq rsquos already handles this

ndash receiver must specify seq of pkt being ACKed

bull requires countdown timer

udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )

35

rdt30 sender

sequence=0udt_send(packet)

rdt_send(data)

Wait for

ACK0

IDLEstate 1

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)

udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)

LL

udt_send(packet)timeout

udt_send(packet)timeout

udt_rcv(reply)

IDLEstate 0

Wait for

ACK1

Ludt_rcv(reply)

LL

L

36

rdt30 in action

37

rdt30 in action

38

Performance of rdt30

bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet

U sender utilization ndash fraction of time sender busy sending

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources

dsmicrosecon8bps10bits8000

9 RLdtrans

39

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

40

Pipelined (Packet-Window) protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver

bull Two generic forms of pipelined protocols go-Back-N selective repeat

41

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

42

Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks

ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet

ndash If timer expires retransmit all unacked packets

Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet

ndash When timer expires retransmit only unack packet

43

Selective repeat big picture

bull Sender can have up to N unacked packets in pipeline

bull Rcvr acks individual packetsbull Sender maintains timer for each unacked

packetndash When timer expires retransmit only unack packet

44

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L

46

GBN receiver extended FSM

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN inaction

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

[Header diagram, 32 bits wide]
source port # | dest port #
sequence number — counting by bytes of data (not segments)
acknowledgement number
head len | not used | flags U A P R S F | receive window — # bytes rcvr willing to accept
checksum — Internet checksum (as in UDP) | urgent data pointer
options (variable length)
application data (variable length)

Flags: URG: urgent data (generally not used); ACK: ACK # valid; PSH: push data now (generally not used); RST, SYN, FIN: connection estab (setup, teardown commands)

56

TCP seq #'s and ACKs
Seq #'s:
– byte stream "number" of first byte in segment's data
ACKs:
– seq # of next byte expected from other side
– cumulative ACK
Q: how does the receiver handle out-of-order segments?
– A: TCP spec doesn't say – up to implementor

[Simple telnet scenario: user types 'C'; Host A sends Seq=42, ACK=79, data='C'; Host B ACKs receipt and echoes it back with Seq=79, ACK=43, data='C'; Host A then ACKs receipt of the echoed 'C' with Seq=43, ACK=80]

This has led to a world of hurt…

57

TCP out of order attack
• ARQ with SACK means the recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server, but don't send the first byte
• Recipient keeps all the subsequent data and waits…
– filling buffers
• Critical buffers…

• Send a legitimate request:
GET index.html
(this gets through an intrusion-detection system)
then send a new segment replacing bytes 4–13 with "password-file"

A dumb example – neither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

• longer than RTT
– but RTT varies
• too short: premature timeout
– unnecessary retransmissions
• too long: slow reaction to segment loss

Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
– ignore retransmissions
• SampleRTT will vary; want estimated RTT "smoother"
– average several recent measurements, not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1 – α)·EstimatedRTT + α·SampleRTT

• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

Karn/Partridge Algorithm – ignore retransmitted segments in the measurements

(and increase the timeout on retransmission; this makes retransmissions decreasingly aggressive)

61

Example RTT estimation
RTT: gaia.cs.umass.edu to fantasia.eurecom.fr

[Graph: SampleRTT and EstimatedRTT (milliseconds, roughly 100–350 ms) plotted against time (seconds)]

62

TCP Round Trip Time and Timeout

Setting the timeout
• EstimatedRTT plus "safety margin"
– large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:

DevRTT = (1 – β)·DevRTT + β·|SampleRTT – EstimatedRTT|
(typically β = 0.25)

Then set the timeout interval:

TimeoutInterval = EstimatedRTT + 4·DevRTT
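A compact Python sketch of this estimator (EWMA of the mean plus deviation, as above). The α = 0.125 and β = 0.25 values follow the slide; the initialisation of DevRTT from the first sample is an assumption of the example, not something the slide specifies.

# RTT estimation and timeout computation (illustrative sketch).
class RttEstimator:
    def __init__(self, alpha=0.125, beta=0.25):
        self.alpha, self.beta = alpha, beta
        self.estimated_rtt = None     # seconds
        self.dev_rtt = 0.0

    def on_sample(self, sample_rtt):
        """Feed one SampleRTT (never from a retransmitted segment - Karn/Partridge)."""
        if self.estimated_rtt is None:
            self.estimated_rtt, self.dev_rtt = sample_rtt, sample_rtt / 2
        else:
            self.dev_rtt = (1 - self.beta) * self.dev_rtt + \
                           self.beta * abs(sample_rtt - self.estimated_rtt)
            self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + \
                                 self.alpha * sample_rtt
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt

# e.g. est = RttEstimator(); est.on_sample(0.100); est.on_sample(0.120)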

63

TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer

• Retransmissions are triggered by:
– timeout events
– duplicate acks

• Initially, consider simplified TCP sender:
– ignore duplicate acks
– ignore flow control, congestion control

64

TCP sender events
data rcvd from app:
• create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval

timeout:
• retransmit segment that caused timeout
• restart timer

Ack rcvd:
• if it acknowledges previously unacked segments:
– update what is known to be acked
– start timer if there are outstanding segments

65

TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event) {

    event: data received from application above
      create TCP segment with sequence number NextSeqNum
      if (timer currently not running)
        start timer
      pass segment to IP
      NextSeqNum = NextSeqNum + length(data)

    event: timer timeout
      retransmit not-yet-acknowledged segment with smallest sequence number
      start timer

    event: ACK received, with ACK field value of y
      if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
          start timer
      }
  }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so new data is acked
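For concreteness, a minimal Python rendering of this simplified sender: single timer, cumulative ACKs, and on timeout only the oldest unACKed segment is resent (unlike Go-Back-N). send_to_ip and the timer callables are placeholders for the example, not a real API.

# Simplified TCP sender sketch (illustrative only).
class SimpleTcpSender:
    def __init__(self, initial_seq, send_to_ip, start_timer, stop_timer):
        self.next_seq_num = initial_seq
        self.send_base = initial_seq
        self.unacked = {}                          # seq # -> segment
        self.send_to_ip, self.start_timer, self.stop_timer = send_to_ip, start_timer, stop_timer

    def on_app_data(self, data):
        seg = (self.next_seq_num, data)
        self.unacked[self.next_seq_num] = seg
        if self.send_base == self.next_seq_num:    # no older data => timer not running
            self.start_timer()
        self.send_to_ip(seg)
        self.next_seq_num += len(data)

    def on_timeout(self):
        oldest = min(self.unacked)                 # smallest not-yet-acked seq #
        self.send_to_ip(self.unacked[oldest])
        self.start_timer()

    def on_ack(self, y):
        if y > self.send_base:                     # new data cumulatively acked
            self.send_base = y
            self.unacked = {s: seg for s, seg in self.unacked.items() if s >= y}
            if self.unacked:
                self.start_timer()
            else:
                self.stop_timer()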

66

TCP retransmission scenarios

[Lost ACK scenario: Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost; A's Seq=92 timer expires, so A retransmits Seq=92, 8 bytes data; B ACKs 100 again; SendBase moves from 92 to 100]

[Premature timeout scenario: Host A sends Seq=92 (8 bytes) and Seq=100 (20 bytes); B's ACK=100 and ACK=120 are in flight, but A's Seq=92 timer expires first, so A needlessly retransmits Seq=92; B replies ACK=120; SendBase advances 100 -> 120]

67

TCP retransmission scenarios (more)

[Cumulative ACK scenario: Host A sends Seq=92 (8 bytes) and Seq=100 (20 bytes); ACK=100 is lost, but ACK=120 arrives before the timeout, so nothing is retransmitted; SendBase = 120]

Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACKs 100 too

68

TCP ACK generation [RFC 1122, RFC 2581]

Event at Receiver -> TCP Receiver action

Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
-> Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK.

Arrival of in-order segment with expected seq #; one other segment has ACK pending
-> Immediately send single cumulative ACK, ACKing both in-order segments.

Arrival of out-of-order segment with higher-than-expected seq #; gap detected
-> Immediately send duplicate ACK, indicating seq # of next expected byte.

Arrival of segment that partially or completely fills gap
-> Immediately send ACK, provided that segment starts at lower end of gap.
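The four rules are easier to see as code. Below is a small illustrative Python sketch of the receiver's ACK decision; the 500 ms delayed-ACK timer is reduced to a flag, and all names are invented for the example rather than taken from any implementation.

# Receiver-side ACK generation following the RFC 1122 / RFC 2581 rules above (sketch).
def on_segment(rcv, seg_seq, seg_len):
    """rcv has: expected_seq, ack_pending (delayed-ACK flag), ooo (dict seq# -> length)."""
    if seg_seq == rcv.expected_seq:                 # in-order segment
        had_gap = bool(rcv.ooo)
        rcv.expected_seq += seg_len
        while rcv.expected_seq in rcv.ooo:          # merge any buffered segments
            rcv.expected_seq += rcv.ooo.pop(rcv.expected_seq)
        if had_gap:                                 # segment filled (part of) a gap
            return ("ACK", rcv.expected_seq)        # ACK immediately
        if rcv.ack_pending:                         # another in-order segment was pending
            rcv.ack_pending = False
            return ("ACK", rcv.expected_seq)        # one cumulative ACK covers both
        rcv.ack_pending = True                      # otherwise: delayed ACK (<= 500 ms)
        return ("DELAY", rcv.expected_seq)
    elif seg_seq > rcv.expected_seq:                # out-of-order: gap detected
        rcv.ooo[seg_seq] = seg_len
        return ("DUP-ACK", rcv.expected_seq)        # duplicate ACK: next expected byte
    else:
        return ("ACK", rcv.expected_seq)            # old/duplicate segment: re-ACK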

69

Fast Retransmit
• Time-out period often relatively long:
– long delay before resending lost packet
• Detect lost segments via duplicate ACKs:
– sender often sends many segments back-to-back
– if a segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that the segment after the ACKed data was lost:
– fast retransmit: resend segment before timer expires

70

[Figure 3.37: Host A sends several segments back-to-back; the 2nd is lost (X); Host B's duplicate ACKs lead A to resend the 2nd segment before its timeout expires]

71

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {                                      /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3)
      resend segment with sequence number y   /* fast retransmit */
  }

72

Silly Window Syndrome
The advertised window tells a sender how much the receiver can currently accept.
If a transmitter has something to send – it will.
This means small segments may persist – indefinitely.

Solution: wait to fill each segment, but don't wait indefinitely: NAGLE's Algorithm.

If we wait too long, interactive traffic is difficult;
if we don't wait, we get silly window syndrome.

Solution: use a timer; when the timer expires – send the (unfilled) segment.

73

Flow Control ≠ Congestion Control

• Flow control involves preventing senders from overrunning the capacity of the receivers

• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-line flow control:
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)

Dedicated wires:
• RTS/CTS handshaking
• Read (or Write) Ready signals from a memory interface saying slow-down/stop…

75

TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app's drain rate

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

• spare room in buffer:
RcvWindow = RcvBuffer – [LastByteRcvd – LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
– guarantees receive buffer doesn't overflow
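A one-function sketch of the advertisement, under the slide's assumption that out-of-order segments are discarded; variable names simply mirror the slide.

# Receive-window advertisement and the matching sender rule (illustrative sketch).
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver can advertise to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def may_send(last_byte_sent, last_byte_acked, advertised_window, nbytes):
    """Sender keeps (LastByteSent - LastByteAcked) within the advertised RcvWindow."""
    return (last_byte_sent - last_byte_acked) + nbytes <= advertised_window

# e.g. rcv_window(4096, 2048, 1024) == 3072 bytes of spare room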

77

TCP Connection Management

Recall: TCP sender, receiver establish a "connection" before exchanging data segments
• initialize TCP variables:
– seq #s
– buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket("hostname", port number);
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();

Three way handshake:
Step 1: client host sends TCP SYN segment to server
– specifies initial seq #
– no data
Step 2: server host receives SYN, replies with SYNACK segment
– server allocates buffers
– specifies server initial seq #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
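The same handshake as seen from an application, this time as a minimal Python socket sketch rather than the Java on the slide: the kernel performs the SYN / SYNACK / ACK exchange inside connect() and accept(). "localhost" and port 5009 are arbitrary example values.

import socket

# Server: passive open - accept() returns once the three-way handshake completes.
def server():
    welcome = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    welcome.bind(("localhost", 5009))
    welcome.listen(1)
    conn, addr = welcome.accept()        # SYN received, SYNACK sent, ACK received
    data = conn.recv(1024)
    conn.sendall(data.upper())           # trivial echo-style reply
    conn.close()
    welcome.close()

# Client: active open - connect() sends SYN and returns after the handshake.
def client():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(("localhost", 5009))       # SYN sent, SYNACK received, ACK sent
    s.sendall(b"hello")
    print(s.recv(1024))
    s.close()                            # triggers the FIN / ACK teardown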

78

TCP Connection Management (cont)

Closing a connection:
client closes socket: clientSocket.close();

Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Timeline: client sends FIN (close); server ACKs, then sends its own FIN (close); client ACKs and enters timed wait before moving to closed]

79

TCP Connection Management (cont)

Step 3: client receives FIN, replies with ACK
– enters "timed wait" – will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.

Note: with a small modification, can handle simultaneous FINs.

[Timeline: both sides exchange FIN/ACK while closing; client holds timed wait before closed, server moves to closed on the final ACK]

80

TCP Connection Management (cont)

TCP client lifecycle

TCP server lifecycle

81

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control!
• manifestations:
– lost packets (buffer overflow at routers)
– long delays (queueing in router buffers)
• a top-10 problem!

82

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission

• large delays when congested
• maximum achievable throughput

[Diagram: Host A offers λin (original data) into a router with unlimited shared output link buffers; λout emerges towards Host B]

83

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet

[Diagram: Host A offers λin (original data) plus retransmitted data, λ'in, into a router with finite shared output link buffers; λout emerges towards Host B]

84

Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• "perfect" retransmission only when loss: λ'in > λout
• retransmission of delayed (not lost) packets makes λ'in larger (than the perfect case) for the same λout

"costs" of congestion:
• more work (retrans) for a given "goodput"
• unneeded retransmissions: link carries multiple copies of a pkt

[Graphs (a), (b), (c): λout versus λ'in, with both axes running up to R/2; only in the ideal case (a) does goodput reach R/2, while retransmissions of lost and delayed packets in (b) and (c) cap it lower]

85

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λin and λ'in increase?

[Diagram: Host A's λin (original data) and λ'in (original plus retransmitted data) traverse routers with finite shared output link buffers on the way to Host B; λout is the delivered rate]

86

Causes/costs of congestion: scenario 3

Another "cost" of congestion: when a packet is dropped, any upstream transmission capacity used for that packet was wasted

[Graph: λout collapses towards zero as the offered load keeps growing]

Congestion Collapse example: the Cocktail party effect

87

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
– single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
– explicit rate sender should send at

88

TCP congestion control additive increase multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs:
• additive increase: increase CongWin by 1 MSS every RTT – i.e. W ← W + 1/W for each received ACK – until loss is detected
• multiplicative decrease: cut CongWin in half after loss (W ← W/2)

[Graph: congestion window (8, 16, 24 Kbytes, …) versus time shows the saw-tooth behaviour of probing for bandwidth]
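A tiny illustrative Python sketch of the AIMD rules just stated (window measured in MSS; no slow start, matching the note on the next slide).

# AIMD congestion-window update (sketch; cwnd in units of MSS, no slow start).
def on_ack(cwnd):
    """Additive increase: +1/W per ACK, so roughly +1 MSS per RTT."""
    return cwnd + 1.0 / cwnd

def on_loss(cwnd):
    """Multiplicative decrease: halve the window after loss."""
    return max(1.0, cwnd / 2.0)

# e.g. from cwnd = 10 MSS, one window's worth of ACKs raises cwnd to ~11 MSS;
# a loss then drops it back to ~5.5 MSS - the familiar saw tooth.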

89

(SLOW START IS NOT SHOWN)

90

TCP Congestion Control: details
• sender limits transmission: LastByteSent – LastByteAcked ≤ CongWin
• roughly: rate = CongWin / RTT  Bytes/sec
• CongWin is a dynamic function of perceived network congestion

How does sender perceive congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event

three mechanisms:
– AIMD
– slow start
– conservative after timeout events

91

AIMD Starts Too Slowly

[Graph: window versus time t, growing only additively]

It could take a long time to get started!
Need to start with a small CWND to avoid overloading the network.

92

TCP Slow Start
• When connection begins, CongWin = 1 MSS
– Example: MSS = 500 bytes & RTT = 200 msec
– initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
– desirable to quickly ramp up to a respectable rate

When connection begins, increase rate exponentially fast until first loss event

93

TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
– double CongWin every RTT
– done by incrementing CongWin for every ACK received
• Summary: initial rate is slow, but ramps up exponentially fast

[Diagram: Host A sends one segment, waits one RTT for the ACK, then sends two segments, then four segments, …]

94

Slow Start and the TCP Sawtooth

[Graph: window versus time t; exponential "slow start" ramps up until a loss, then the AIMD saw tooth takes over]

Why is it called slow-start? Because TCP originally had no congestion control mechanism. The source would just start by sending a whole window's worth of data!

95

Refinement: inferring loss
• After 3 dup ACKs:
– CongWin is cut in half
– window then grows linearly
• But after a timeout event:
– CongWin instead set to 1 MSS
– window then grows exponentially
– to a threshold, then grows linearly

Philosophy: 3 dup ACKs indicates the network is capable of delivering some segments; timeout indicates a "more alarming" congestion scenario

96

Refinement
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event

97

Summary TCP Congestion Control

• When CongWin is below Threshold, sender in slow-start phase, window grows exponentially

• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly

• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold

• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS

98

TCP sender congestion control

State | Event | TCP Sender Action | Commentary

Slow Start (SS) | ACK receipt for previously unacked data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Resulting in a doubling of CongWin every RTT

Congestion Avoidance (CA) | ACK receipt for previously unacked data | CongWin = CongWin + MSS·(MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT

SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start

SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
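The whole table collapses into a small state machine. The sketch below is a minimal Python rendering of exactly these rules, with the window in units of MSS; fast recovery's temporary window inflation is omitted, as it is in the table. The initial ssthresh of 64 MSS is an arbitrary example value.

# TCP congestion-control state machine from the table above (illustrative sketch).
class TcpCongestionControl:
    def __init__(self):
        self.cwnd = 1.0          # CongWin, in MSS
        self.ssthresh = 64.0     # Threshold, in MSS (example initial value)
        self.state = "SS"        # "SS" = slow start, "CA" = congestion avoidance

    def on_new_ack(self):                    # ACK for previously unacked data
        if self.state == "SS":
            self.cwnd += 1.0                 # doubles cwnd every RTT
            if self.cwnd > self.ssthresh:
                self.state = "CA"
        else:                                # congestion avoidance
            self.cwnd += 1.0 / self.cwnd     # +1 MSS per RTT

    def on_triple_dup_ack(self):             # loss detected by 3 duplicate ACKs
        self.ssthresh = max(self.cwnd / 2.0, 1.0)
        self.cwnd = self.ssthresh
        self.state = "CA"

    def on_timeout(self):                    # "more alarming" loss
        self.ssthresh = max(self.cwnd / 2.0, 1.0)
        self.cwnd = 1.0
        self.state = "SS"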

99

Repeating Slow Start After Timeout

[Graph: window versus time t; after a timeout the window restarts at 1 MSS, whereas after a fast retransmission it restarts at SSThresh]

Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND.
Slow start stays in operation until the window reaches half of the previous CWND, i.e. SSTHRESH.

100

TCP throughput

• What's the average throughput of TCP as a function of window size and RTT?
– Ignore slow start
• Let W be the window size when loss occurs
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average throughput: 0.75·W/RTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L: Throughput ≈ 1.22·MSS / (RTT·√L)
• Needs L = 2·10^-10 – Ouch!
• New versions of TCP for high-speed

Calculation on Simple Model (cwnd in units of MSS)
• Assume loss occurs whenever cwnd reaches W
– recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … W, W/2, …
– W/2 RTTs, then drop, then repeat
• Average throughput: 0.75·W·(MSS/RTT)
– one packet dropped out of (W/2)·(3W/4)
– packet drop rate: p = (8/3)·W^-2
• Throughput = (MSS/RTT)·sqrt(3/(2p))
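The numbers on this slide can be checked in a few lines. A small Python sketch, using the values from the slide and the formulas just derived (the constant 1.22 ≈ sqrt(3/2)):

from math import sqrt

MSS = 1500 * 8          # segment size in bits
RTT = 0.100             # seconds
target = 10e9           # 10 Gbps

# Window needed to keep the pipe full: W = target * RTT / MSS
W = target * RTT / MSS                       # ~83,333 in-flight segments

# Invert Throughput = (MSS/RTT) * sqrt(3/(2p)) to get the tolerable loss rate:
p = 1.5 * (MSS / (RTT * target)) ** 2        # ~2e-10, as on the slide

# Sanity check: plug p back in and recover ~10 Gbps
throughput = (MSS / RTT) * sqrt(3.0 / (2.0 * p))
print(round(W), p, throughput / 1e9, "Gbps")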

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges – or, Why AIMD?

• Single flow adjusting to bottleneck bandwidth
– without any a priori knowledge
– could be a Gbps link, could be a modem

• Single flow adjusting to variations in bandwidth
– when bandwidth decreases, must lower sending rate
– when bandwidth increases, must increase sending rate

• Multiple flows sharing the bandwidth
– must avoid overloading the network
– and share bandwidth "fairly" among the flows

103

104

Problem 1 Single Flow Fixed BW

• Want to get a first-order estimate of the available bandwidth
– assume bandwidth is fixed
– ignore presence of other flows

• Want to start slow, but rapidly increase rate until a packet drop occurs ("slow-start")

• Adjustment:
– cwnd initially set to 1 (MSS)
– cwnd++ upon receipt of ACK

105

Problems with Slow-Start
• Slow-start can result in many losses
– roughly the size of cwnd ~ BW·RTT
• Example:
– at some point, cwnd is enough to fill the "pipe"
– after another RTT, cwnd is double its previous value
– all the excess packets are dropped!
• Need a more gentle adjustment algorithm once we have a rough estimate of the bandwidth
– rest of the design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track the available bandwidth:
• oscillate around its current value
• if you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease:
cwnd ← cwnd ×/÷ a
• Additive increase or decrease:
cwnd ← cwnd ± b

Four alternatives

• AIAD: gentle increase, gentle decrease

• AIMD: gentle increase, drastic decrease

• MIAD: drastic increase, gentle decrease
– too many losses: eliminate

• MIMD: drastic increase and decrease

107

108

Problem 3 Multiple Flows

• Want steady state to be "fair"

• Many notions of fairness, but here just require two identical flows to end up with the same bandwidth

• This eliminates MIMD and AIAD
– as we shall see…

• AIMD is the only remaining solution
– not really, but close enough…

109

Recall: Buffer and Window Dynamics

• No congestion -> x increases by one packet/RTT every RTT
• Congestion -> decrease x by factor 2

[Simulation: a single flow A -> B over a C = 50 pkts/RTT link; the rate x (pkts/RTT) saw-tooths over time while the backlog in the router (pkts; congested if > 20) rises and falls with it]

110

AIMD Sharing Dynamics

• No congestion -> rate increases by one packet/RTT every RTT
• Congestion -> decrease rate by factor 2

[Simulation: two flows, x1 (A -> B) and x2 (D -> E), share the link; their saw-tooth rates converge – rates equalize at the fair share]

111

AIAD Sharing Dynamics

• No congestion -> x increases by one packet/RTT every RTT
• Congestion -> decrease x by 1

[Simulation: the same two flows, x1 and x2, under AIAD; the gap between their rates never closes – no convergence to a fair share]

112

Simple Model of Congestion Control

• Two TCP connections
– rates x1 and x2
• Congestion when sum > 1
• Efficiency: sum near 1
• Fairness: x's converge

[2-user example: bandwidth for user 2 (x2) plotted against bandwidth for user 1 (x1); the efficiency line x1 + x2 = 1 separates underload from overload]

113

Example

[Plot: user 2's rate x2 against user 1's rate x1, with the fairness line x1 = x2 and the efficiency line x1 + x2 = 1]

• Total bandwidth 1
• (0.2, 0.5): inefficient, x1+x2 = 0.7
• (0.7, 0.5): congested, x1+x2 = 1.2
• (0.7, 0.3): efficient, x1+x2 = 1, but not fair
• (0.5, 0.5): efficient, x1+x2 = 1, and fair

114

AIAD

• Increase: x + aI
• Decrease: x – aD
• Does not converge to fairness

[Phase plot of (x1, x2) with the fairness and efficiency lines: starting from (x1h, x2h), a decrease moves to (x1h – aD, x2h – aD) and an increase to (x1h – aD + aI, x2h – aD + aI); the trajectory stays parallel to the fairness line, so the gap between the users never shrinks]

115

MIMD

• Increase: x·bI
• Decrease: x·bD
• Does not converge to fairness

[Phase plot of (x1, x2) with the fairness and efficiency lines: starting from (x1h, x2h), a decrease moves to (bD·x1h, bD·x2h) and an increase to (bI·bD·x1h, bI·bD·x2h); the trajectory moves along a ray through the origin, so the ratio between the users never changes]

116

AIMD

• Increase: x + aI
• Decrease: x·bD
• Converges to fairness

[Phase plot of (x1, x2) with the fairness and efficiency lines: starting from (x1h, x2h), a decrease moves to (bD·x1h, bD·x2h) and an increase to (bD·x1h + aI, bD·x2h + aI); each multiplicative decrease shrinks the difference between the users while each additive increase preserves it, so repeated cycles pull the trajectory towards the fairness line]
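A short simulation makes the convergence claim concrete. This illustrative Python sketch runs two flows against a shared capacity of 1 and applies the AIMD rules above; the starting rates and the aI, bD parameters are arbitrary example values.

# Two AIMD flows sharing one link: rates converge towards the fair share (sketch).
def aimd_two_flows(x1=0.7, x2=0.1, aI=0.01, bD=0.5, capacity=1.0, rounds=2000):
    for _ in range(rounds):
        if x1 + x2 > capacity:        # congestion: multiplicative decrease
            x1 *= bD
            x2 *= bD
        else:                         # no congestion: additive increase
            x1 += aI
            x2 += aI
    return x1, x2

print(aimd_two_flows())               # the two rates end up (nearly) equal - the fair share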

117

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally

[Plot: bandwidth for connection 2 against bandwidth for connection 1, both axes up to R; congestion-avoidance additive increase moves along slope 1, each loss halves the window, and the trajectory spirals in towards the equal-bandwidth-share line]

118

Fairness (more)
Fairness and UDP:
• Multimedia apps may not use TCP
– do not want rate throttled by congestion control
• Instead use UDP:
– pump audio/video at a constant rate, tolerate packet loss
• (Ancient, yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections:
• nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
– new app asks for 1 TCP, gets rate R/10
– new app asks for 11 TCPs, gets R/2!
• Recall: multiple browser sessions (and the potential for synchronized loss)

119

Synchronized Flows vs. Many TCP Flows
Synchronized flows:
• aggregate window has same dynamics
• therefore buffer occupancy has same dynamics
• rule-of-thumb still holds

Many independent, desynchronized flows:
• central limit theorem says the aggregate becomes Gaussian
• variance (buffer size) decreases as N increases

[Sketch: buffer size over time t, and the resulting probability distribution]

Some TCP issues outstanding…

44

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L

46

GBN receiver extended FSM

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN inaction

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2: Single Flow, Varying BW

Want to track available bandwidth
• Oscillate around its current value
• If you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease: cwnd → cwnd · a or cwnd / a
• Additive increase or decrease: cwnd → cwnd + b or cwnd − b

106

Four alternatives

• AIAD: gentle increase, gentle decrease

• AIMD: gentle increase, drastic decrease

• MIAD: drastic increase, gentle decrease
– too many losses: eliminate

• MIMD: drastic increase and decrease

107

108

Problem 3: Multiple Flows

• Want steady state to be "fair"
• Many notions of fairness, but here just require two identical flows to end up with the same bandwidth
• This eliminates MIMD and AIAD
– As we shall see…
• AIMD is the only remaining solution
– Not really, but close enough…

109

Recall: Buffer and Window Dynamics

• No congestion: x increases by one packet/RTT every RTT
• Congestion: decrease x by factor 2

[Figure: a single flow A → B through a router of capacity C = 50 pkts/RTT. Plot of the rate x (pkts/RTT) and the backlog in the router (pkts; congested if > 20) over time, showing the resulting sawtooth.]

110

AIMD Sharing Dynamics

No congestion: rate increases by one packet/RTT every RTT; congestion: decrease rate by factor 2.

[Figure: two flows, A → B at rate x1 and D → E at rate x2, sharing one bottleneck. Over time the two sawtooth rates equalize: fair share.]
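A small simulation in the spirit of that figure (a sketch; the starting rates, the C = 50 pkts/RTT capacity and the simple "congested when x1 + x2 > C" test are assumptions):

def aimd_share(x1=40.0, x2=10.0, capacity=50.0, rtts=300):
    # Two AIMD flows: +1 pkt/RTT each when uncongested, halve both on congestion.
    for _ in range(rtts):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2     # multiplicative decrease
        else:
            x1, x2 = x1 + 1, x2 + 1     # additive increase
    return x1, x2

print(aimd_share())   # after a while x1 and x2 are nearly equal: the difference decays to 0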

111

AIAD Sharing Dynamics

No congestion: x increases by one packet/RTT every RTT; congestion: decrease x by 1.

[Figure: the same two flows (x1 and x2) under AIAD; the two rates keep their initial difference rather than converging to a fair share.]

112

Simple Model of Congestion Control

• Two TCP connections
– Rates x1 and x2
• Congestion when sum > 1
• Efficiency: sum near 1
• Fairness: x's converge

[Figure: 2-user example. Axes: bandwidth for user 1 (x1) vs. bandwidth for user 2 (x2); the efficiency line x1 + x2 = 1 separates underload (below) from overload (above).]

113

Example

[Figure: axes user 1 (x1) vs. user 2 (x2), with the fairness line x1 = x2 and the efficiency line x1 + x2 = 1.]

• Total bandwidth 1
– (0.2, 0.5): inefficient, x1 + x2 = 0.7
– (0.7, 0.5): congested, x1 + x2 = 1.2
– (0.7, 0.3): efficient, x1 + x2 = 1, but not fair
– (0.5, 0.5): efficient, x1 + x2 = 1, and fair

114

AIAD

[Figure: phase plane of bandwidth for user 1 (x1) vs. user 2 (x2) with fairness and efficiency lines. From a point (x1h, x2h), a decrease step moves to (x1h − aD, x2h − aD) and the next increase step to (x1h − aD + aI, x2h − aD + aI): the trajectory stays parallel to the fairness line.]

• Increase: x → x + aI
• Decrease: x → x − aD
• Does not converge to fairness

115

MIMD

[Figure: same phase plane with fairness and efficiency lines. From (x1h, x2h), a decrease step moves to (bD·x1h, bD·x2h) and the next increase step to (bI·bD·x1h, bI·bD·x2h): the trajectory stays on the same ray through the origin.]

• Increase: x → bI·x
• Decrease: x → bD·x
• Does not converge to fairness

116

AIMD

[Figure: same phase plane with fairness and efficiency lines. From (x1h, x2h), a multiplicative decrease moves to (bD·x1h, bD·x2h) and the following additive increase to (bD·x1h + aI, bD·x2h + aI): each decrease/increase cycle moves the operating point closer to the fairness line.]

• Increase: x → x + aI
• Decrease: x → bD·x
• Converges to fairness
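One way to state why AIMD alone converges: write the gap between the two rates as Δ = x1 − x2. Then

\Delta \;\xrightarrow{\;x \,\mapsto\, x + a_I\;}\; \Delta,
\qquad
\Delta \;\xrightarrow{\;x \,\mapsto\, b_D x\;}\; b_D\,\Delta \quad (0 < b_D < 1)

so each multiplicative decrease shrinks the gap while additive increases leave it unchanged; under AIAD the gap never changes, and under MIMD both rates (and hence their ratio) are only rescaled together, matching the three pictures above.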

117

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• Additive increase gives slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally

[Figure: bandwidth for connection 1 vs. bandwidth for connection 2, both bounded by the link rate R, with the equal-bandwidth-share line. The trajectory alternates congestion avoidance (additive increase) with loss (window decreased by a factor of 2), moving ever closer to the equal share.]

118

Fairness (more)

Fairness and UDP
• Multimedia apps may not use TCP
– do not want rate throttled by congestion control
• Instead use UDP
– pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections
• Nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
– new app asks for 1 TCP, gets rate R/10
– new app asks for 11 TCPs, gets R/2
• Recall: multiple browser sessions (and the potential for synchronized loss)
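For the example above: with 9 existing connections, an application opening k parallel connections gets roughly

\frac{k}{9+k}\,R \qquad (k = 1 \Rightarrow R/10, \qquad k = 11 \Rightarrow 11R/20 \approx R/2)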

119

Synchronized Flows vs. Many TCP Flows

Synchronized flows:
• Aggregate window has same dynamics
• Therefore buffer occupancy has same dynamics
• Rule-of-thumb still holds

Independent, desynchronized flows:
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases

Some TCP issues outstanding…

[Figure: buffer occupancy over time for synchronized flows (sawtooth) vs. the probability distribution of buffer size for many desynchronized flows.]

Page 24: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

24

Rdt20 channel with bit errors

bull underlying channel may flip bits in packetndash checksum to detect bit errors

bull the question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells sender that

packet received is OKndash negative acknowledgements (NAKs) receiver explicitly tells sender

that packet had errorsndash sender retransmits packet on receipt of NAK

bull new mechanisms in rdt20 (beyond rdt10)ndash error detectionndash receiver feedback control msgs (ACKNAK) receiver-gtsender

25

rdt20 FSM specification

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

Waitingfor reply

IDLE

sender

receiverrdt_send(data)

L

Note the sender holds a copy of the packet being sent until the delivery is acknowledged

26

rdt20 operation with no errors

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

27

rdt20 error scenario

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

28

rdt20 has a fatal flawWhat happens if ACKNAK corruptedbull sender doesnrsquot know what happened at receiverbull canrsquot just retransmit possible duplicate

Handling duplicates bull sender retransmits current

packet if ACKNAK garbledbull sender adds sequence number

to each packetbull receiver discards (doesnrsquot

deliver) duplicate packet

Sender sends one packet then waits for receiver response

stop and wait

29

rdt21 sender handles garbled ACKNAKs

IDLE

sequence=0udt_send(packet)

rdt_send(data)

WaitingFor reply udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

IDLEWaitingfor reply

LL

udt_rcv(packet) ampamp corrupt(packet)

30

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

udt_send(NAK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

udt_send(ACK)rdt_rcv(data)

Wait for 1 from below

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_send(ACK)rdt_rcv(data)

udt_send(ACK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

receive(packet) ampamp corrupt(packet)

udt_send(ACK)

udt_send(NAK)

31

rdt21 discussionSenderbull seq added to pktbull two seq rsquos (01) will suffice Whybull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has a0 or 1 sequence number

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq bull note receiver can not know if its last ACKNAK received OK at

sender

32

rdt22 a NAK-free protocol

bull same functionality as rdt21 using ACKs onlybull instead of NAK receiver sends ACK for last pkt received OK

ndash receiver must explicitly include seq of pkt being ACKed bull duplicate ACK at sender results in same action as NAK

retransmit current pkt

33

rdt22 sender receiver fragments

Wait for call 0 from above

sequence=0udt_send(packet)

rdt_send(data)

udt_send(packet)

rdt_rcv(reply) ampamp ( corrupt(reply) || isACK1(reply) )

udt_rcv(reply) ampamp not corrupt(reply) ampamp isACK0(reply)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet) send(ACK1)rdt_rcv(data)

udt_rcv(packet) ampamp (corrupt(packet) || has_seq1(packet))

udt_send(ACK1)receiver FSM

fragment

L

34

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of help but not

enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull retransmits if no ACK received in this time

bull if pkt (or ACK) just delayed (not lost)ndash retransmission will be

duplicate but use of seq rsquos already handles this

ndash receiver must specify seq of pkt being ACKed

bull requires countdown timer

udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )

35

rdt30 sender

sequence=0udt_send(packet)

rdt_send(data)

Wait for

ACK0

IDLEstate 1

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)

udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)

LL

udt_send(packet)timeout

udt_send(packet)timeout

udt_rcv(reply)

IDLEstate 0

Wait for

ACK1

Ludt_rcv(reply)

LL

L

36

rdt30 in action

37

rdt30 in action

38

Performance of rdt30

bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet

U sender utilization ndash fraction of time sender busy sending

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources

dsmicrosecon8bps10bits8000

9 RLdtrans

39

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

40

Pipelined (Packet-Window) protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver

bull Two generic forms of pipelined protocols go-Back-N selective repeat

41

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

42

Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks

ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet

ndash If timer expires retransmit all unacked packets

Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet

ndash When timer expires retransmit only unack packet

43

Selective repeat big picture

bull Sender can have up to N unacked packets in pipeline

bull Rcvr acks individual packetsbull Sender maintains timer for each unacked

packetndash When timer expires retransmit only unack packet

44

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L

46

GBN receiver extended FSM

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN inaction

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 25: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

25

rdt20 FSM specification

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

Waitingfor reply

IDLE

sender

receiverrdt_send(data)

L

Note the sender holds a copy of the packet being sent until the delivery is acknowledged

26

rdt20 operation with no errors

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

27

rdt20 error scenario

L

IDLE Waitingfor reply

IDLE

udt_send(packet)

rdt_rcv(data)udt_send(ACK)

udt_rcv(packet) ampamp notcorrupt(packet)

udt_rcv(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp isNAK(reply)

udt_send(NAK)

udt_rcv(packet) ampamp corrupt(packet)

rdt_send(data)

28

rdt20 has a fatal flawWhat happens if ACKNAK corruptedbull sender doesnrsquot know what happened at receiverbull canrsquot just retransmit possible duplicate

Handling duplicates bull sender retransmits current

packet if ACKNAK garbledbull sender adds sequence number

to each packetbull receiver discards (doesnrsquot

deliver) duplicate packet

Sender sends one packet then waits for receiver response

stop and wait

29

rdt21 sender handles garbled ACKNAKs

IDLE

sequence=0udt_send(packet)

rdt_send(data)

WaitingFor reply udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

IDLEWaitingfor reply

LL

udt_rcv(packet) ampamp corrupt(packet)

30

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

udt_send(NAK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

udt_send(ACK)rdt_rcv(data)

Wait for 1 from below

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_send(ACK)rdt_rcv(data)

udt_send(ACK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

receive(packet) ampamp corrupt(packet)

udt_send(ACK)

udt_send(NAK)

31

rdt21 discussionSenderbull seq added to pktbull two seq rsquos (01) will suffice Whybull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has a0 or 1 sequence number

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq bull note receiver can not know if its last ACKNAK received OK at

sender

32

rdt22 a NAK-free protocol

bull same functionality as rdt21 using ACKs onlybull instead of NAK receiver sends ACK for last pkt received OK

ndash receiver must explicitly include seq of pkt being ACKed bull duplicate ACK at sender results in same action as NAK

retransmit current pkt

33

rdt22 sender receiver fragments

Wait for call 0 from above

sequence=0udt_send(packet)

rdt_send(data)

udt_send(packet)

rdt_rcv(reply) ampamp ( corrupt(reply) || isACK1(reply) )

udt_rcv(reply) ampamp not corrupt(reply) ampamp isACK0(reply)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet) send(ACK1)rdt_rcv(data)

udt_rcv(packet) ampamp (corrupt(packet) || has_seq1(packet))

udt_send(ACK1)receiver FSM

fragment

L

34

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of help but not

enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull retransmits if no ACK received in this time

bull if pkt (or ACK) just delayed (not lost)ndash retransmission will be

duplicate but use of seq rsquos already handles this

ndash receiver must specify seq of pkt being ACKed

bull requires countdown timer

udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )

35

rdt30 sender

sequence=0udt_send(packet)

rdt_send(data)

Wait for

ACK0

IDLEstate 1

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)

udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)

LL

udt_send(packet)timeout

udt_send(packet)timeout

udt_rcv(reply)

IDLEstate 0

Wait for

ACK1

Ludt_rcv(reply)

LL

L

36

rdt30 in action

37

rdt30 in action

38

Performance of rdt30

bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet

U sender utilization ndash fraction of time sender busy sending

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources

dsmicrosecon8bps10bits8000

9 RLdtrans

39

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

40

Pipelined (Packet-Window) protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver

bull Two generic forms of pipelined protocols go-Back-N selective repeat

41

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

42

Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks

ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet

ndash If timer expires retransmit all unacked packets

Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet

ndash When timer expires retransmit only unack packet

43

Selective repeat big picture

bull Sender can have up to N unacked packets in pipeline

bull Rcvr acks individual packetsbull Sender maintains timer for each unacked

packetndash When timer expires retransmit only unack packet

44

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L

46

GBN receiver extended FSM

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN inaction

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq. #'s and ACKs
Seq. #'s:
– byte stream "number" of first byte in segment's data
ACKs:
– seq # of next byte expected from other side
– cumulative ACK
Q: how receiver handles out-of-order segments
– A: TCP spec doesn't say - up to implementor

[Figure: simple telnet scenario - user types 'C'; Host A sends Seq=42, ACK=79, data = 'C'; Host B ACKs receipt of 'C' and echoes it back (Seq=79, ACK=43, data = 'C'); Host A ACKs receipt of the echoed 'C' (Seq=43, ACK=80).]

This has led to a world of hurt…

57

TCP out-of-order attack
• ARQ with SACK means the recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server, but don't send the first byte
• Recipient keeps all the subsequent data and waits…
– filling buffers
– critical buffers…

• Send a legitimate request:
  GET index.html
  …this gets through an intrusion-detection system…
  then send a new segment replacing bytes 4-13 with "password-file"

A dumb example: neither of these attacks would work on a modern system.

58

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
• longer than RTT
– but RTT varies
• too short: premature timeout
– unnecessary retransmissions
• too long: slow reaction to segment loss

Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
– ignore retransmissions
• SampleRTT will vary; want estimated RTT "smoother"
– average several recent measurements, not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - α)·EstimatedRTT + α·SampleRTT

Exponential weighted moving average: influence of past sample decreases exponentially fast; typical value: α = 0.125
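As a worked illustration of the EWMA with α = 0.125 (a sketch, not kernel code):

    ALPHA = 0.125   # typical weighting of the newest sample

    def update_estimated_rtt(estimated_rtt, sample_rtt):
        # Exponentially weighted moving average of the RTT.
        return (1 - ALPHA) * estimated_rtt + ALPHA * sample_rtt

    # e.g. current estimate 100 ms, new sample 140 ms -> new estimate 105 ms
    print(update_estimated_rtt(0.100, 0.140))   # 0.105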

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

Karn/Partridge Algorithm – ignore retransmissions in measurements

(and increase the timeout; this makes retransmissions decreasingly aggressive)

61

Example RTT estimation
RTT: gaia.cs.umass.edu to fantasia.eurecom.fr
[Figure: SampleRTT and EstimatedRTT (milliseconds, 100-350) versus time (seconds, 1-106); the estimate tracks a smoothed version of the noisy samples.]

62

TCP Round Trip Time and Timeout

Setting the timeout
• EstimatedRTT plus a "safety margin"
– large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:

DevRTT = (1 - β)·DevRTT + β·|SampleRTT - EstimatedRTT|
(typically β = 0.25)

• then set the timeout interval:

TimeoutInterval = EstimatedRTT + 4·DevRTT
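Putting both EWMAs together (α = 0.125, β = 0.25), a sketch of one timeout update; DevRTT is updated using the old EstimatedRTT, as is conventional:

    ALPHA, BETA = 0.125, 0.25

    def update_timeout(estimated_rtt, dev_rtt, sample_rtt):
        # Jacobson-style timeout: smoothed RTT plus four times the smoothed deviation.
        dev_rtt = (1 - BETA) * dev_rtt + BETA * abs(sample_rtt - estimated_rtt)
        estimated_rtt = (1 - ALPHA) * estimated_rtt + ALPHA * sample_rtt
        timeout_interval = estimated_rtt + 4 * dev_rtt
        return estimated_rtt, dev_rtt, timeout_interval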

63

TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• pipelined segments
• cumulative ACKs
• TCP uses a single retransmission timer

• retransmissions are triggered by:
– timeout events
– duplicate ACKs

• initially, consider simplified TCP sender:
– ignore duplicate ACKs
– ignore flow control, congestion control

64

TCP sender events
data rcvd from app:
• create segment with seq. #
• seq. # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unACKed segment)
• expiration interval: TimeOutInterval

timeout:
• retransmit segment that caused timeout
• restart timer

ACK rcvd:
• if it acknowledges previously unACKed segments
– update what is known to be ACKed
– start timer if there are outstanding segments

65

TCP sender (simplified)

  NextSeqNum = InitialSeqNum
  SendBase = InitialSeqNum

  loop (forever) {
    switch(event) {

    event: data received from application above
      create TCP segment with sequence number NextSeqNum
      if (timer currently not running) start timer
      pass segment to IP
      NextSeqNum = NextSeqNum + length(data)

    event: timer timeout
      retransmit not-yet-acknowledged segment with smallest sequence number
      start timer

    event: ACK received, with ACK field value of y
      if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments) start timer
      }

    }
  } /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ACKed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+ ; y > SendBase, so new data is ACKed

66

TCP retransmission scenarios
[Figure: lost ACK scenario - Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost; Host A's Seq=92 timer expires and it retransmits Seq=92, 8 bytes data; Host B ACKs again (ACK=100); SendBase advances from 92 to 100.]
[Figure: premature timeout - Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; the Seq=92 timer expires before ACK=100 arrives, so Seq=92 is retransmitted; the later cumulative ACK=120 covers everything; SendBase moves to 100 and then 120.]

67

TCP retransmission scenarios (more)
[Figure: cumulative ACK scenario - Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; ACK=100 is lost, but ACK=120 arrives before the timeout; ACK=120 implicitly ACKs 100 too (implicit ACK, e.g. not Go-Back-N), so nothing is retransmitted; SendBase = 120.]

68

TCP ACK generation [RFC 1122, RFC 2581]

Event at Receiver -> TCP Receiver action
• Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed -> Delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
• Arrival of in-order segment with expected seq #; one other segment has ACK pending -> Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment with higher-than-expected seq #; gap detected -> Immediately send duplicate ACK, indicating seq # of next expected byte
• Arrival of segment that partially or completely fills gap -> Immediately send ACK, provided that segment starts at lower end of gap

69

Fast Retransmit
• Time-out period often relatively long:
– long delay before resending lost packet
• Detect lost segments via duplicate ACKs:
– sender often sends many segments back-to-back
– if a segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that the segment after the ACKed data was lost:
– fast retransmit: resend the segment before the timer expires

70

[Figure 3.37: resending a segment after triple duplicate ACK - Host A sends several segments back-to-back; the 2nd segment is lost (X); duplicate ACKs from Host B arrive and Host A resends the 2nd segment before its timeout expires.]

71

Fast retransmit algorithm:

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments) start timer
    }
    else {                                    /* a duplicate ACK for already ACKed segment */
      increment count of dup ACKs received for y
      if (count of dup ACKs received for y == 3)
        resend segment with sequence number y /* fast retransmit */
    }

72

Silly Window Syndrome
The advertised window tells a sender how much the receiver can accept, and if a transmitter has something to send - it will. This means small segments may persist - indefinitely.

Solution: wait to fill each segment (up to the MSS), but don't wait indefinitely.
If we wait too long, interactive traffic is difficult; if we don't wait, we get silly window syndrome.

NAGLE's Algorithm
Solution: use a timer; when the timer expires - send the (unfilled) segment.
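A sketch of the decision implied above, not the exact RFC 896 logic: hold small writes back while earlier data is unACKed, and let a timer bound the wait so interactive traffic still gets through. The function and parameter names are illustrative only.

    def nagle_should_send(queued_bytes, mss, unacked_bytes, timer_expired):
        # Send now if we can fill a segment, if nothing is in flight,
        # or if we have already waited long enough.
        if queued_bytes >= mss:      # a full segment is ready
            return True
        if unacked_bytes == 0:       # nothing outstanding: no point holding data back
            return True
        return timer_expired         # otherwise wait, but never indefinitely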

73

Flow Control ≠ Congestion Control

• Flow control involves preventing senders from overrunning the capacity of the receivers.

• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded.

74

Flow Control – (bad old days)

In-line flow control:
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)

Dedicated wires:
• RTS/CTS handshaking
• Read (or Write) Ready signals from a memory interface saying slow-down/stop…

75

TCP Flow Control
• receive side of TCP connection has a receive buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
• app process may be slow at reading from buffer

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

76

TCP Flow control: how it works

(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
– guarantees receive buffer doesn't overflow
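The spare-room formula as a one-line calculation, using the slide's variable names (a sketch with made-up example numbers):

    def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
        # Spare room in the receive buffer that can be advertised to the sender.
        return rcv_buffer - (last_byte_rcvd - last_byte_read)

    # e.g. 64 KB buffer, 20 KB received but only 12 KB read by the app -> advertise 56 KB
    print(rcv_window(64 * 1024, 20 * 1024, 12 * 1024))   # 57344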

77

TCP Connection Management

Recall: TCP sender, receiver establish a "connection" before exchanging data segments
• initialize TCP variables:
– seq. #s
– buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket("hostname", "port number");
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();

Three way handshake:
Step 1: client host sends TCP SYN segment to server
– specifies initial seq #
– no data
Step 2: server host receives SYN, replies with SYNACK segment
– server allocates buffers
– specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data

78

TCP Connection Management (cont)

Closing a connection:
client closes socket: clientSocket.close()
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK; closes connection, sends FIN
[Figure: client sends FIN (close); server ACKs and sends its own FIN (close); client ACKs and enters timed wait before the connection is closed.]

79

TCP Connection Management (cont)

Step 3: client receives FIN, replies with ACK
– enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK; connection closed

Note: with a small modification, can handle simultaneous FINs
[Figure: client and server exchange FIN/ACK in both directions (closing, closing); the client sits in timed wait, then both ends are closed.]

80

TCP Connection Management (cont)

[Figure: TCP client lifecycle and TCP server lifecycle state diagrams.]

81

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
– lost packets (buffer overflow at routers)
– long delays (queueing in router buffers)
• a top-10 problem!

82

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission

• large delays when congested
• maximum achievable throughput
[Figure: Hosts A and B send λin (original data) into a router with unlimited shared output link buffers; λout is the delivered throughput.]

83

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A offers λin (original data) plus λ'in (original data plus retransmitted data) into finite shared output link buffers; Host B receives λout.]

84

Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• "perfect" retransmission, only when loss: λ'in > λout
• retransmission of delayed (not lost) packet makes λ'in larger (than the perfect case) for the same λout

"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots of λout versus λin, panels (a), (b), (c), each saturating at or below R/2; with retransmissions the delivered goodput falls toward R/3 and R/4.]

85

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λin and λ'in increase?
[Figure: hosts offer λin (original data) plus λ'in (original plus retransmitted data) into finite shared output link buffers along multihop paths.]

86

Causes/costs of congestion: scenario 3

Another "cost" of congestion: when a packet is dropped, any upstream transmission capacity used for that packet was wasted!
[Figure: λout collapses toward zero as offered load grows.]
Congestion collapse example: the cocktail party effect.

87

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
– single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
– explicit rate sender should send at

88

TCP congestion control: additive increase, multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT (for each received ACK: W <- W + 1/W) until loss detected
• multiplicative decrease: cut CongWin in half after loss (W <- W/2)
[Figure: congestion window (8, 16, 24 Kbytes) versus time: saw tooth behavior, probing for bandwidth. SLOW START IS NOT SHOWN.]

89
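The two rules as per-event updates, with CongWin kept in units of MSS (a sketch; real stacks work in bytes):

    def on_ack(cong_win):
        # Additive increase: + 1/W per ACK, i.e. roughly +1 MSS per RTT.
        return cong_win + 1.0 / cong_win

    def on_loss(cong_win):
        # Multiplicative decrease: halve the window after a loss.
        return max(cong_win / 2.0, 1.0)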

90

TCP Congestion Control: details
• sender limits transmission: LastByteSent - LastByteAcked ≤ CongWin
• roughly: rate ≈ CongWin / RTT  bytes/sec
• CongWin is a dynamic function of perceived network congestion

How does sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after loss event

three mechanisms:
– AIMD
– slow start
– conservative after timeout events

91

AIMD Starts Too Slowly

[Figure: window versus time under pure AIMD - a long linear ramp before reaching the available bandwidth.]

It could take a long time to get started!
Need to start with a small CWND to avoid overloading the network.

92

TCP Slow Start
• When connection begins, CongWin = 1 MSS
– example: MSS = 500 bytes & RTT = 200 msec
– initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
– desirable to quickly ramp up to a respectable rate

When connection begins, increase rate exponentially fast until first loss event

93

TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
– double CongWin every RTT
– done by incrementing CongWin for every ACK received
• Summary: initial rate is slow, but ramps up exponentially fast
[Figure: Host A sends one segment, then two, then four, doubling each RTT as ACKs return from Host B.]

94

Slow Start and the TCP Sawtooth

[Figure: window versus time - exponential "slow start" until the first loss, then the familiar sawtooth.]

Why is it called slow-start? Because TCP originally had no congestion control mechanism. The source would just start by sending a whole window's worth of data!

95

Refinement: inferring loss
• After 3 dup ACKs:
– CongWin is cut in half
– window then grows linearly
• But after timeout event:
– CongWin instead set to 1 MSS
– window then grows exponentially
– to a threshold, then grows linearly

Philosophy: 3 dup ACKs indicates network capable of delivering some segments; timeout indicates a "more alarming" congestion scenario

96

Refinement
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before the loss event

97

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.

• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.

• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.

• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.
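These four bullets collapse into a small state machine. A toy sketch (window and Threshold in units of MSS; the initial Threshold of 64 is an arbitrary illustrative value):

    class TcpCongestionControl:
        # Toy model of the slide's rules: slow start below Threshold, AIMD above it.
        def __init__(self):
            self.cong_win = 1.0        # in MSS
            self.threshold = 64.0      # initial ssthresh (arbitrary illustrative value)

        def on_new_ack(self):
            if self.cong_win < self.threshold:
                self.cong_win += 1.0                   # slow start: doubles every RTT
            else:
                self.cong_win += 1.0 / self.cong_win   # congestion avoidance: +1 MSS per RTT

        def on_triple_dup_ack(self):
            self.threshold = self.cong_win / 2.0
            self.cong_win = self.threshold             # multiplicative decrease (fast recovery)

        def on_timeout(self):
            self.threshold = self.cong_win / 2.0
            self.cong_win = 1.0                        # back to slow start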

98

TCP sender congestion control

State: Slow Start (SS) | Event: ACK receipt for previously unacked data | Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Commentary: resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA) | Event: ACK receipt for previously unacked data | Action: CongWin = CongWin + MSS·(MSS/CongWin) | Commentary: additive increase, resulting in increase of CongWin by 1 MSS every RTT

State: SS or CA | Event: loss event detected by triple duplicate ACK | Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

State: SS or CA | Event: timeout | Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Commentary: enter slow start

State: SS or CA | Event: duplicate ACK | Action: increment duplicate ACK count for segment being acked | Commentary: CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND.
Slow start operates until it reaches half of the previous CWND, i.e. SSTHRESH.
[Figure: window versus time - at the timeout / fast retransmission, SSThresh is set to half the window at that point; after the restart, slow start runs up to SSThresh, then growth is linear.]

100

TCP throughput

• What's the average throughput of TCP, as a function of window size and RTT?
– ignore slow start
• Let W be the window size when loss occurs.
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average throughput: 0.75 · W/RTT

101

TCP Futures: TCP over "long, fat pipes"
• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate p:
  Throughput = (MSS/RTT) · sqrt(3/(2p))
• -> L = 2·10⁻¹⁰  Ouch!
• New versions of TCP for high-speed

Calculation on Simple Model (cwnd in units of MSS)
• Assume loss occurs whenever cwnd reaches W
– recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … W, W/2, …
– W/2 RTTs, then drop, then repeat
• Average throughput: 0.75·W·(MSS/RTT)
– one packet dropped out of (W/2)·(3W/4)
– packet drop rate p = (8/3)·W⁻²
• Throughput = (MSS/RTT) · sqrt(3/(2p))
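A quick numerical check of this slide, plugging 1500-byte segments, 100 ms RTT and a 10 Gbps target into the simple model; here p is obtained by inverting the throughput formula, so the constants are the slide's approximations:

    from math import sqrt

    MSS, RTT, TARGET = 1500 * 8, 0.100, 10e9            # bits, seconds, bits/sec

    W = TARGET * RTT / MSS                               # in-flight segments needed
    p = 1.5 * (MSS / (RTT * TARGET)) ** 2                # invert Throughput = (MSS/RTT)*sqrt(3/(2p))
    throughput = (MSS / RTT) * sqrt(3.0 / (2.0 * p))     # sanity check: back out the rate

    print(round(W))           # ~83333 segments
    print(p)                  # ~2.2e-10, the slide's L ≈ 2·10⁻¹⁰
    print(throughput / 1e9)   # ~10 Gbps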

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges – or Why AIMD?
• Single flow adjusting to bottleneck bandwidth
– without any a priori knowledge
– could be a Gbps link, could be a modem
• Single flow adjusting to variations in bandwidth
– when bandwidth decreases, must lower sending rate
– when bandwidth increases, must increase sending rate
• Multiple flows sharing the bandwidth
– must avoid overloading network
– and share bandwidth "fairly" among the flows

103

104

Problem 1: Single Flow, Fixed BW
• Want to get a first-order estimate of the available bandwidth
– assume bandwidth is fixed
– ignore presence of other flows
• Want to start slow, but rapidly increase rate until packet drop occurs ("slow-start")
• Adjustment:
– cwnd initially set to 1 (MSS)
– cwnd++ upon receipt of ACK

105

Problems with Slow-Start
• Slow-start can result in many losses
– roughly the size of cwnd ~ BW·RTT
• Example:
– at some point, cwnd is enough to fill the "pipe"
– after another RTT, cwnd is double its previous value
– all the excess packets are dropped!
• Need a more gentle adjustment algorithm once we have a rough estimate of the bandwidth
– rest of design discussion focuses on this

Problem 2: Single Flow, Varying BW

Want to track available bandwidth:
• oscillate around its current value
• if you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• multiplicative increase or decrease: cwnd <- cwnd ·/÷ a
• additive increase or decrease: cwnd <- cwnd +/- b

106

Four alternatives

• AIAD: gentle increase, gentle decrease

• AIMD: gentle increase, drastic decrease

• MIAD: drastic increase, gentle decrease
– too many losses: eliminate

• MIMD: drastic increase and decrease

107

108

Problem 3: Multiple Flows

• Want steady state to be "fair"

• Many notions of fairness, but here just require two identical flows to end up with the same bandwidth

• This eliminates MIMD and AIAD
– as we shall see…

• AIMD is the only remaining solution!
– not really, but close enough…

109

Recall: Buffer and Window Dynamics
• No congestion -> x increases by one packet/RTT every RTT
• Congestion -> decrease x by factor 2
[Figure: source A to B over a link of C = 50 pkts/RTT; rate x (pkts/RTT) and router backlog (pkts; congested if > 20) versus time, showing the sawtooth.]

110

AIMD Sharing Dynamics
• No congestion: rate increases by one packet/RTT every RTT; congestion: decrease rate by factor 2
[Figure: two flows x1 and x2 share the A-B / D-E path; their sawtooth rates converge - rates equalize at the fair share.]

111

AIAD Sharing Dynamics
• No congestion: x increases by one packet/RTT every RTT; congestion: decrease x by 1
[Figure: two flows x1 and x2 on the same path; with additive decrease the gap between their rates never closes - they do not converge to a fair share.]

112

Simple Model of Congestion Control
• Two TCP connections: rates x1 and x2
• Congestion when sum > 1
• Efficiency: sum near 1
• Fairness: x's converge
[Figure: 2-user example - bandwidth for user 1 (x1) versus bandwidth for user 2 (x2); the efficiency line separates underload from overload.]

113

Example
• Total bandwidth 1
[Figure: points in the (x1, x2) plane against the fairness and efficiency lines -
  (0.2, 0.5): inefficient, x1+x2 = 0.7
  (0.7, 0.5): congested, x1+x2 = 1.2
  (0.7, 0.3): efficient (x1+x2 = 1), not fair
  (0.5, 0.5): efficient (x1+x2 = 1), fair]

114

AIAD
• Increase: x + aI
• Decrease: x - aD
• Does not converge to fairness
[Figure: in the (x1, x2) plane, starting from (x1h, x2h), a decrease to (x1h - aD, x2h - aD) followed by an increase to (x1h - aD + aI, x2h - aD + aI) moves parallel to the fairness line, so the gap between the users never closes.]

115

MIMD
• Increase: x · bI
• Decrease: x · bD
• Does not converge to fairness
[Figure: starting from (x1h, x2h), a decrease to (bD·x1h, bD·x2h) and an increase to (bI·bD·x1h, bI·bD·x2h) stay on the line through the origin, so the ratio x1/x2 never changes.]

116

AIMD
• Increase: x + aI
• Decrease: x · bD
• Converges to fairness
[Figure: starting from (x1h, x2h), a multiplicative decrease to (bD·x1h, bD·x2h) followed by additive increases to (bD·x1h + aI, bD·x2h + aI) moves the point closer to the fairness line on every cycle.]

117

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• Additive increase gives a slope of 1, as throughput increases
• Multiplicative decrease decreases throughput proportionally
[Figure: bandwidth for connection 1 versus connection 2, both bounded by R; congestion avoidance (additive increase) moves the operating point along slope 1, loss (decrease window by factor of 2) moves it proportionally toward the origin, so the trajectory converges to the equal bandwidth share line.]
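The animation's argument can be reproduced with a tiny simulation, under the stated assumptions (both flows see congestion at the same instants, additive increase of 1 per RTT, halving on loss; the capacity of 50 and starting rates are arbitrary illustrative values):

    def aimd_share(x1, x2, capacity=50.0, rounds=200):
        # Two AIMD flows sharing one bottleneck: the rate gap halves on every
        # congestion event, so the rates converge toward the fair share.
        for _ in range(rounds):
            if x1 + x2 > capacity:          # congestion: multiplicative decrease for both
                x1, x2 = x1 / 2.0, x2 / 2.0
            else:                           # no congestion: additive increase for both
                x1, x2 = x1 + 1.0, x2 + 1.0
        return x1, x2

    print(aimd_share(40.0, 5.0))   # after many rounds the two rates are nearly equal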

118

Fairness (more)
Fairness and UDP
• Multimedia apps may not use TCP
– do not want rate throttled by congestion control
• Instead use UDP:
– pump audio/video at constant rate, tolerate packet loss
• (Ancient, yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
– new app asks for 1 TCP, gets rate R/10
– new app asks for 11 TCPs, gets R/2!
• Recall: multiple browser sessions (and the potential for synchronized loss)

119

Synchronized Flows vs. Many TCP Flows
Synchronized flows:
• aggregate window has same dynamics
• therefore buffer occupancy has same dynamics
• rule-of-thumb still holds

Independent, desynchronized flows:
• central limit theorem says the aggregate becomes Gaussian
• variance (buffer size) decreases as N increases

Some TCP issues outstanding…
[Figure: probability distribution of buffer occupancy versus time; the required buffer size shrinks as flows desynchronize.]

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN inaction

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

• Increase: x → x + aI
• Decrease: x → x − aD
• Does not converge to fairness

[Figure: in the (x1, x2) plane, starting from (x1h, x2h), a decrease moves to (x1h − aD, x2h − aD) and the next increase to (x1h − aD + aI, x2h − aD + aI); every step is parallel to the fairness line, so the gap between the users never changes.]

115

MIMD

• Increase: x → bI × x
• Decrease: x → bD × x
• Does not converge to fairness

[Figure: starting from (x1h, x2h), a decrease moves to (bD·x1h, bD·x2h) and the next increase to (bI·bD·x1h, bI·bD·x2h); every step stays on the line through the origin, so the ratio x1/x2 never changes.]

116

AIMD

• Increase: x → x + aI
• Decrease: x → bD × x
• Converges to fairness

[Figure: starting from (x1h, x2h), a multiplicative decrease moves to (bD·x1h, bD·x2h) and the next additive increase to (bD·x1h + aI, bD·x2h + aI); decreases pull toward the origin while increases move parallel to the fairness line, so the trajectory converges to the fair and efficient point.]

117

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• Additive increase gives slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally

[Figure: bandwidth for connection 1 vs. bandwidth for connection 2, both bounded by R, with the equal-bandwidth-share line; repeated rounds of congestion avoidance (additive increase) followed by loss (decrease window by factor of 2) move the two connections toward the equal share.]

118

Fairness (more)

Fairness and UDP
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections (a back-of-the-envelope sketch follows below)
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2
• Recall: multiple browser sessions (and the potential for synchronized loss)
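A back-of-the-envelope Python sketch of the example above, assuming the bottleneck of rate R is split evenly per TCP connection:

    # Share of a rate-R link seen by an app that opens k connections next to n existing ones,
    # assuming per-connection fair sharing.
    def app_share(k, n_existing, R=1.0):
        return k * R / (n_existing + k)

    print(app_share(1, 9))     # 0.1  -> R/10
    print(app_share(11, 9))    # 0.55 -> roughly R/2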

119

Synchronized Flows vs. Many TCP Flows

Synchronized flows:
• Aggregate window has the same dynamics
• Therefore buffer occupancy has the same dynamics
• Rule-of-thumb still holds

Many independent, desynchronized flows:
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases
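A rough Python sketch of the central-limit-theorem point: each flow's window is modelled (purely as an assumption) as an independent draw between W/2 and W, and the relative variation of the aggregate shrinks roughly like 1/sqrt(N):

    # Relative variation of the aggregate window for N independent, desynchronized flows.
    import numpy as np

    rng = np.random.default_rng(0)
    for N in [1, 10, 100]:
        windows = rng.uniform(0.5, 1.0, size=(10_000, N))   # samples x flows, W normalised to 1
        aggregate = windows.sum(axis=1)
        print(N, round(aggregate.std() / aggregate.mean(), 4))   # falls roughly as 1/sqrt(N)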

Some TCP issues outstanding…

[Figure: buffer occupancy over time t and the resulting probability distribution of buffer size.]

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning – we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt1.0 reliable transfer over a reliable channel
  • Rdt2.0 channel with bit errors
  • rdt2.0 FSM specification
  • rdt2.0 operation with no errors
  • rdt2.0 error scenario
  • rdt2.0 has a fatal flaw
  • rdt2.1 sender handles garbled ACK/NAKs
  • rdt2.1 receiver handles garbled ACK/NAKs
  • rdt2.1 discussion
  • rdt2.2 a NAK-free protocol
  • rdt2.2 sender receiver fragments
  • rdt3.0 channels with errors and loss
  • rdt3.0 sender
  • rdt3.0 in action
  • rdt3.0 in action (2)
  • Performance of rdt3.0
  • rdt3.0 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview: RFCs 793, 1122, 1323, 2018, 2581, …
  • TCP segment structure
  • TCP seq #'s and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ≠ Congestion Control
  • Flow Control – (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures: TCP over "long fat pipes"
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges – or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animation…)
  • Fairness (more)
  • Some TCP issues outstandinghellip

Page 28: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

28

rdt20 has a fatal flawWhat happens if ACKNAK corruptedbull sender doesnrsquot know what happened at receiverbull canrsquot just retransmit possible duplicate

Handling duplicates bull sender retransmits current

packet if ACKNAK garbledbull sender adds sequence number

to each packetbull receiver discards (doesnrsquot

deliver) duplicate packet

Sender sends one packet then waits for receiver response

stop and wait

29

rdt21 sender handles garbled ACKNAKs

IDLE

sequence=0udt_send(packet)

rdt_send(data)

WaitingFor reply udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

udt_send(packet)

udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)

IDLEWaitingfor reply

LL

udt_rcv(packet) ampamp corrupt(packet)

30

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

udt_send(NAK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

udt_send(ACK)rdt_rcv(data)

Wait for 1 from below

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_send(ACK)rdt_rcv(data)

udt_send(ACK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

receive(packet) ampamp corrupt(packet)

udt_send(ACK)

udt_send(NAK)

31

rdt21 discussionSenderbull seq added to pktbull two seq rsquos (01) will suffice Whybull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has a0 or 1 sequence number

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq bull note receiver can not know if its last ACKNAK received OK at

sender

32

rdt22 a NAK-free protocol

bull same functionality as rdt21 using ACKs onlybull instead of NAK receiver sends ACK for last pkt received OK

ndash receiver must explicitly include seq of pkt being ACKed bull duplicate ACK at sender results in same action as NAK

retransmit current pkt

33

rdt22 sender receiver fragments

Wait for call 0 from above

sequence=0udt_send(packet)

rdt_send(data)

udt_send(packet)

rdt_rcv(reply) ampamp ( corrupt(reply) || isACK1(reply) )

udt_rcv(reply) ampamp not corrupt(reply) ampamp isACK0(reply)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet) send(ACK1)rdt_rcv(data)

udt_rcv(packet) ampamp (corrupt(packet) || has_seq1(packet))

udt_send(ACK1)receiver FSM

fragment

L

34

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of help but not

enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull retransmits if no ACK received in this time

bull if pkt (or ACK) just delayed (not lost)ndash retransmission will be

duplicate but use of seq rsquos already handles this

ndash receiver must specify seq of pkt being ACKed

bull requires countdown timer

udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )

35

rdt30 sender

sequence=0udt_send(packet)

rdt_send(data)

Wait for

ACK0

IDLEstate 1

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)

udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)

LL

udt_send(packet)timeout

udt_send(packet)timeout

udt_rcv(reply)

IDLEstate 0

Wait for

ACK1

Ludt_rcv(reply)

LL

L

36

rdt30 in action

37

rdt30 in action

38

Performance of rdt30

bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet

U sender utilization ndash fraction of time sender busy sending

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources

dsmicrosecon8bps10bits8000

9 RLdtrans

39

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

40

Pipelined (Packet-Window) protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver

bull Two generic forms of pipelined protocols go-Back-N selective repeat

41

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

42

Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks

ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet

ndash If timer expires retransmit all unacked packets

Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet

ndash When timer expires retransmit only unack packet

43

Selective repeat big picture

bull Sender can have up to N unacked packets in pipeline

bull Rcvr acks individual packetsbull Sender maintains timer for each unacked

packetndash When timer expires retransmit only unack packet

44

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L

46

GBN receiver extended FSM

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN inaction

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)
[Timeline figure, Host A ↔ Host B: Seq=92, 8 bytes and Seq=100, 20 bytes data sent; ACK=100 is lost (X); ACK=120 arrives before the timeout, so nothing is retransmitted; SendBase = 120.]
Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACKs 100 too.

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver → TCP Receiver action

Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
→ Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK.

Arrival of in-order segment with expected seq #; one other segment has ACK pending
→ Immediately send single cumulative ACK, ACKing both in-order segments.

Arrival of out-of-order segment with higher-than-expected seq #; gap detected
→ Immediately send duplicate ACK, indicating seq # of next expected byte.

Arrival of segment that partially or completely fills gap
→ Immediately send ACK, provided that segment starts at lower end of gap.
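(A small Python sketch of the table above – a hypothetical helper; reassembly of buffered out-of-order data is omitted.)

def receiver_ack_action(seg_seq, seg_len, expected_seq, ack_pending):
    # Returns (action, new_expected_seq, ack_pending) for one arriving segment.
    if seg_seq == expected_seq and not ack_pending:
        return "delay ACK up to 500 ms", expected_seq + seg_len, True
    if seg_seq == expected_seq and ack_pending:
        return "send one cumulative ACK now", expected_seq + seg_len, False
    if seg_seq > expected_seq:
        # Gap detected: re-ACK the next expected byte (a duplicate ACK).
        return "send duplicate ACK for expected_seq", expected_seq, ack_pending
    return "send ACK immediately", expected_seq, ack_pending  # segment fills (part of) a gap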

69

Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If a segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that the segment after the ACKed data was lost:
  – fast retransmit: resend segment before timer expires

70

[Timeline figure, Host A ↔ Host B: the 2nd segment is lost (X); triple duplicate ACKs arrive; the sender resends the 2nd segment before its timeout expires.]
Figure 3.37: Resending a segment after triple duplicate ACK

71

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3)
      resend segment with sequence number y   /* fast retransmit */
  }

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window Syndrome
MSS advertises the amount a receiver can accept.
If a transmitter has something to send – it will.
This means small segment sizes may persist – indefinitely.

Solution: wait to fill each segment, but don't wait indefinitely.

NAGLE's Algorithm
If we wait too long, interactive traffic is difficult.
If we don't wait, we get silly window syndrome.
Solution: use a timer; when the timer expires – send the (unfilled) segment.
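(A Python sketch of the idea – a hypothetical helper, not the slides' code: small writes are coalesced while data is in flight, and a timer flush stops the wait from becoming indefinite.)

MSS = 1460  # assumed maximum segment size, in bytes

class NagleSender:
    def __init__(self, transmit):
        self.transmit = transmit   # callback that puts one segment on the wire
        self.buffer = b""          # small writes waiting to be coalesced
        self.unacked = 0           # bytes currently in flight

    def write(self, data):
        self.buffer += data
        self._pump()

    def ack(self, nbytes):
        self.unacked -= nbytes
        self._pump()

    def timer_expired(self):
        # Don't wait indefinitely: flush the unfilled segment.
        if self.buffer:
            self._send(self.buffer)

    def _pump(self):
        # Full segments may always go; small segments only when nothing is in flight.
        while len(self.buffer) >= MSS:
            self._send(self.buffer[:MSS])
        if self.buffer and self.unacked == 0:
            self._send(self.buffer)

    def _send(self, chunk):
        self.buffer = self.buffer[len(chunk):]
        self.unacked += len(chunk)
        self.transmit(chunk)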

73

Flow Control ≠ Congestion Control

• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded

74

Flow Control – (bad old days)

In-line flow control
• XON/XOFF (^S/^Q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)

Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from a memory interface saying slow-down/stop…

75

TCP Flow Control
• receive side of TCP connection has a receive buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
• app process may be slow at reading from buffer

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

76

TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer:
  RcvWindow = RcvBuffer – [LastByteRcvd – LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
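(A one-line sketch of the window arithmetic, using the slide's variable names; the example numbers are made up.)

def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    # Spare room the receiver advertises back to the sender.
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def sender_may_send(last_byte_sent, last_byte_acked, rcv_window_value):
    # Sender keeps unACKed data within the advertised window.
    return (last_byte_sent - last_byte_acked) < rcv_window_value

# Example: 64 KiB buffer, 20 KiB received but only 12 KiB read by the app.
win = rcv_window(64 * 1024, 20 * 1024, 12 * 1024)   # 57344 bytes of spare room
print(win, sender_may_send(100_000, 60_000, win))    # 57344 True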

77

TCP Connection Management

Recall: TCP sender, receiver establish a "connection" before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket("hostname", port number);
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();

Three way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data

78

TCP Connection Management (cont)

Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Timeline figure: client sends FIN (close); server ACKs and sends its own FIN (close); client ACKs, holds a timed wait, then the connection is closed.]

79

TCP Connection Management (cont)

Step 3: client receives FIN, replies with ACK
  – Enters "timed wait" – will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with a small modification, can handle simultaneous FINs.
[Timeline figure: client FIN → server ACK + FIN → client ACK; both sides pass through closing, the client holds a timed wait, then both are closed.]

80

TCP Connection Management (cont)

TCP client lifecycle / TCP server lifecycle
[State-transition diagrams for the client and server sides of a TCP connection.]

81

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control!
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem!

82

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A offers λin (original data) into a router with unlimited shared output link buffers; Host B receives λout.]

83

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A offers λin (original data) and λ'in (original data plus retransmitted data) into a router with finite shared output link buffers; Host B receives λout.]

84

Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• "perfect" retransmission only when loss: λ'in > λout
• retransmission of delayed (not lost) packet makes λ'in larger (than the perfect case) for the same λout

"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Three plots (a), (b), (c) of λout versus λ'in, each with axes up to R/2: goodput reaches R/2 in the ideal case and falls below it (toward R/3, R/4) as retransmissions increase.]

85

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Host A offers λin (original data) and λ'in (original data plus retransmitted data) through routers with finite shared output link buffers; Host B receives λout.]

86

Causes/costs of congestion: scenario 3
Another "cost" of congestion: when a packet is dropped, any "upstream" transmission capacity used for that packet was wasted.
[Figure: λout collapses as the offered load grows.]
Congestion Collapse example: Cocktail party effect

87

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at

88

TCP congestion control: additive increase, multiplicative decrease
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT – for each received ACK – until loss detected (W ← W + 1/W)
• multiplicative decrease: cut CongWin in half after loss (W ← W/2)
[Figure: congestion window (8, 16, 24 Kbytes) versus time – saw tooth behavior, probing for bandwidth.]
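(The per-ACK / per-loss arithmetic above, as a tiny Python sketch with the window kept in MSS units; slow start is not shown here either.)

MSS = 1  # window measured in units of MSS, as in the slides

def on_ack(cong_win):
    # Additive increase: +1/W per ACK, i.e. +1 MSS per RTT overall.
    return cong_win + MSS / cong_win

def on_loss(cong_win):
    # Multiplicative decrease: halve the window.
    return max(MSS, cong_win / 2)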

89
(SLOW START IS NOT SHOWN)

90

TCP Congestion Control: details
• sender limits transmission: LastByteSent – LastByteAcked ≤ CongWin
• Roughly: rate = CongWin / RTT  Bytes/sec
• CongWin is a dynamic function of perceived network congestion

How does sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after loss event

three mechanisms:
  – AIMD
  – slow start
  – conservative after timeout events

91

AIMD Starts Too Slowly

[Figure: window versus time t – pure additive increase ramps up slowly.]
It could take a long time to get started!
Need to start with a small CWND to avoid overloading the network.

92

TCP Slow Start
• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to a respectable rate

When connection begins, increase rate exponentially fast until first loss event

93

TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Timeline figure, Host A ↔ Host B: one segment in the first RTT, two segments in the next, then four segments.]

94

Slow Start and the TCP Sawtooth

[Figure: window versus time t – exponential "slow start" ramp, then the sawtooth; loss events mark each drop.]
Why is it called slow-start? Because TCP originally had no congestion control mechanism. The source would just start by sending a whole window's worth of data!

95

Refinement: inferring loss
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Philosophy: 3 dup ACKs indicate the network is capable of delivering some segments; a timeout indicates a "more alarming" congestion scenario

96

Refinement
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before the loss event

97

Summary TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase: window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase: window grows linearly.
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.

98

TCP sender congestion control

State: Slow Start (SS) | Event: ACK receipt for previously unACKed data
  Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
  Commentary: Resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA) | Event: ACK receipt for previously unACKed data
  Action: CongWin = CongWin + MSS·(MSS/CongWin)
  Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT

State: SS or CA | Event: Loss event detected by triple duplicate ACK
  Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
  Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

State: SS or CA | Event: Timeout
  Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
  Commentary: Enter slow start

State: SS or CA | Event: Duplicate ACK
  Action: Increment duplicate ACK count for segment being ACKed
  Commentary: CongWin and Threshold not changed
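(A compact Python sketch of the table – hypothetical sender-side bookkeeping with CongWin and Threshold in MSS units; the window inflation of full fast recovery is omitted.)

class TcpCongestionControl:
    def __init__(self, threshold=64):
        self.cong_win = 1.0          # start at 1 MSS
        self.threshold = threshold
        self.state = "SS"            # "SS" = slow start, "CA" = congestion avoidance
        self.dup_acks = 0

    def on_new_ack(self):
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += 1.0                      # doubles CongWin every RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += 1.0 / self.cong_win      # +1 MSS per RTT overall

    def on_dup_ack(self):
        self.dup_acks += 1
        if self.dup_acks == 3:                        # triple duplicate ACK
            self.threshold = max(self.cong_win / 2, 1.0)
            self.cong_win = self.threshold
            self.state = "CA"

    def on_timeout(self):
        self.threshold = max(self.cong_win / 2, 1.0)
        self.cong_win = 1.0                           # back to 1 MSS
        self.state = "SS"
        self.dup_acks = 0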

99

Repeating Slow Start After Timeout

[Figure: window versus time t – after a timeout (versus a fast retransmission), the window drops to 1 MSS and SSThresh is set to half the previous CWND.]
Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND.
Slow start operates until it reaches half of the previous CWND, i.e. SSTHRESH.

100

TCP throughput

• What's the average throughput of TCP as a function of window size and RTT?
  – Ignore slow start
• Let W be the window size when loss occurs.
• When the window is W, throughput is W/RTT
• Just after loss, the window drops to W/2, throughput to W/(2·RTT)
• Average throughput: 0.75·W/RTT

101

TCP Futures: TCP over "long, fat pipes"
• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate p: Throughput = (MSS/RTT)·sqrt(3/(2p))
• ⇒ L = 2·10^-10  Ouch!
• New versions of TCP for high-speed

Calculation on Simple Model (cwnd in units of MSS)
• Assume loss occurs whenever cwnd reaches W
  – Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … W, W/2, …
  – W/2 RTTs, then drop, then repeat
• Average throughput: 0.75·W·(MSS/RTT)
  – One packet dropped out of (W/2)·(3W/4)
  – Packet drop rate p = (8/3)·W^-2
• Throughput = (MSS/RTT)·sqrt(3/(2p))
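(A quick back-of-the-envelope check of those numbers in Python, using the simple-model formula.)

from math import sqrt

MSS = 1500 * 8          # segment size in bits
RTT = 0.100             # seconds
target = 10e9           # 10 Gbps

# Window needed: throughput = W * MSS / RTT
W = target * RTT / MSS
# Loss rate the simple model tolerates: throughput = (MSS/RTT) * sqrt(3/(2p))
p = 1.5 / (target * RTT / MSS) ** 2

print(round(W))   # ~83333 in-flight segments
print(p)          # ~2.2e-10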

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges – or Why AIMD?

• Single flow adjusting to bottleneck bandwidth
  – Without any a priori knowledge
  – Could be a Gbps link, could be a modem
• Single flow adjusting to variations in bandwidth
  – When bandwidth decreases, must lower sending rate
  – When bandwidth increases, must increase sending rate
• Multiple flows sharing the bandwidth
  – Must avoid overloading the network
  – And share bandwidth "fairly" among the flows

103

104

Problem 1 Single Flow Fixed BW

• Want to get a first-order estimate of the available bandwidth
  – Assume bandwidth is fixed
  – Ignore presence of other flows
• Want to start slow, but rapidly increase rate until packet drop occurs ("slow-start")
• Adjustment:
  – cwnd initially set to 1 (MSS)
  – cwnd++ upon receipt of ACK

105

Problems with Slow-Start
• Slow-start can result in many losses
  – Roughly the size of cwnd ~ BW·RTT
• Example:
  – At some point, cwnd is enough to fill the "pipe"
  – After another RTT, cwnd is double its previous value
  – All the excess packets are dropped!
• Need a more gentle adjustment algorithm once we have a rough estimate of bandwidth
  – Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidth
• Oscillate around its current value
• If you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease: cwnd ← cwnd ×/÷ a
• Additive increase or decrease: cwnd ← cwnd +/– b

106

Four alternatives

• AIAD: gentle increase, gentle decrease
• AIMD: gentle increase, drastic decrease
• MIAD: drastic increase, gentle decrease
  – too many losses: eliminate
• MIMD: drastic increase and decrease

107

108

Problem 3 Multiple Flows

• Want steady state to be "fair"
• Many notions of fairness, but here just require two identical flows to end up with the same bandwidth
• This eliminates MIMD and AIAD
  – As we shall see…
• AIMD is the only remaining solution
  – Not really, but close enough…

109

Recall Buffer and Window Dynamics

• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by factor 2
[Figure: A → B over a link of capacity C = 50 pkts/RTT; rate x (pkts/RTT, 0–60) and backlog in router (pkts; congested if > 20) plotted over roughly 500 RTTs.]

110

AIMD Sharing Dynamics
[Figure: two flows, x1 (A → B) and x2 (D → E), sharing a bottleneck; rates plotted over roughly 500 RTTs.]
• No congestion → rate increases by one packet/RTT every RTT; congestion → decrease rate by factor 2
• Rates equalize → fair share

111

AIAD Sharing Dynamics

[Figure: two flows, x1 (A → B) and x2 (D → E), sharing a bottleneck; rates plotted over roughly 500 RTTs.]
• No congestion → x increases by one packet/RTT every RTT; congestion → decrease x by 1

112

Simple Model of Congestion Control

• Two TCP connections
  – Rates x1 and x2
• Congestion when sum > 1
• Efficiency: sum near 1
• Fairness: x's converge
[Phase-plot figure, 2-user example: bandwidth for User 1 (x1) on one axis, bandwidth for User 2 (x2) on the other; the efficiency line separates underload from overload.]

113

Example

[Phase-plot figure: User 1 (x1) versus User 2 (x2), with the fairness line and the efficiency line meeting at (1, 1).]
• Total bandwidth 1
• Inefficient: x1 + x2 = 0.7, e.g. (0.2, 0.5)
• Congested: x1 + x2 = 1.2, e.g. (0.7, 0.5)
• Efficient, x1 + x2 = 1, but not fair: e.g. (0.7, 0.3)
• Efficient, x1 + x2 = 1, and fair: (0.5, 0.5)

114

AIAD

[Phase-plot figure: bandwidth for User 1 (x1) versus User 2 (x2), with fairness line and efficiency line; the trajectory moves from (x1h, x2h) to (x1h − aD, x2h − aD) to (x1h − aD + aI, x2h − aD + aI), staying parallel to the fairness line.]
• Increase: x + aI
• Decrease: x − aD
• Does not converge to fairness

115

MIMD

[Phase-plot figure: bandwidth for User 1 (x1) versus User 2 (x2), with fairness line and efficiency line; the trajectory moves from (x1h, x2h) to (bD·x1h, bD·x2h) to (bI·bD·x1h, bI·bD·x2h), staying on the same ray through the origin.]
• Increase: x·bI
• Decrease: x·bD
• Does not converge to fairness

116

AIMD
[Phase-plot figure: bandwidth for User 1 (x1) versus User 2 (x2), with fairness line and efficiency line; the trajectory moves from (x1h, x2h) to (bD·x1h, bD·x2h) to (bD·x1h + aI, bD·x2h + aI), spiralling in toward the fair, efficient point.]
• Increase: x + aI
• Decrease: x·bD
• Converges to fairness
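(A toy Python simulation of the phase-plot argument; aI, bD and the starting rates are made-up parameters, and capacity is normalised to 1.)

def aimd_two_flows(x1=0.7, x2=0.1, a_i=0.01, b_d=0.5, capacity=1.0, steps=2000):
    for _ in range(steps):
        if x1 + x2 > capacity:          # congestion: multiplicative decrease
            x1, x2 = x1 * b_d, x2 * b_d
        else:                           # no congestion: additive increase
            x1, x2 = x1 + a_i, x2 + a_i
    return x1, x2

print(aimd_two_flows())   # the two rates end up (nearly) equal: AIMD converges to fairness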

117

Why is AIMD fair? (a pretty animation…)
Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally
[Phase-plot figure: bandwidth for Connection 1 versus bandwidth for Connection 2, both axes up to R; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2" and approaches the equal-bandwidth-share line.]

118

Fairness (more)
Fairness and UDP
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections
• nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2!
• Recall: multiple browser sessions (and the potential for synchronized loss)
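(The arithmetic behind that example, assuming the link's rate R is split equally over all TCP connections.)

R = 1.0                                           # normalised link rate
existing = 9
print(R * 1 / (existing + 1))                     # 0.1  -> one new TCP gets R/10
print(R * 11 / (existing + 11))                   # 0.55 -> eleven parallel TCPs grab over R/2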

119

Synchronized Flows versus Many TCP Flows
Synchronized flows:
• Aggregate window has the same dynamics
• Therefore buffer occupancy has the same dynamics
• Rule-of-thumb still holds
Independent, desynchronized flows:
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases
[Figures: buffer size over time t, and the probability distribution of the aggregate.]

Some TCP issues outstanding…

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip


Page 30: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

udt_rcv(packet) ampamp corrupt(packet)

30

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

udt_send(NAK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

udt_send(ACK)rdt_rcv(data)

Wait for 1 from below

udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)

udt_send(ACK)rdt_rcv(data)

udt_send(ACK)

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)

receive(packet) ampamp corrupt(packet)

udt_send(ACK)

udt_send(NAK)

31

rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq #'s (0, 1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  – state must "remember" whether "current" pkt has a 0 or 1 sequence number

Receiver:
• must check if received packet is duplicate
  – state indicates whether 0 or 1 is expected pkt seq #
• note: receiver can not know if its last ACK/NAK was received OK at the sender

32

rdt22 a NAK-free protocol

bull same functionality as rdt21 using ACKs onlybull instead of NAK receiver sends ACK for last pkt received OK

ndash receiver must explicitly include seq of pkt being ACKed bull duplicate ACK at sender results in same action as NAK

retransmit current pkt

33

rdt22 sender receiver fragments

Wait for call 0 from above

sequence=0udt_send(packet)

rdt_send(data)

udt_send(packet)

rdt_rcv(reply) ampamp ( corrupt(reply) || isACK1(reply) )

udt_rcv(reply) ampamp not corrupt(reply) ampamp isACK0(reply)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet) send(ACK1)rdt_rcv(data)

udt_rcv(packet) ampamp (corrupt(packet) || has_seq1(packet))

udt_send(ACK1)receiver FSM

fragment

L

34

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of help but not

enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull retransmits if no ACK received in this time

bull if pkt (or ACK) just delayed (not lost)ndash retransmission will be

duplicate but use of seq rsquos already handles this

ndash receiver must specify seq of pkt being ACKed

bull requires countdown timer

udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )

35

rdt30 sender

sequence=0udt_send(packet)

rdt_send(data)

Wait for

ACK0

IDLEstate 1

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)

udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)

LL

udt_send(packet)timeout

udt_send(packet)timeout

udt_rcv(reply)

IDLEstate 0

Wait for

ACK1

Ludt_rcv(reply)

LL

L

36

rdt30 in action

37

rdt30 in action

38

Performance of rdt30

• rdt3.0 works, but performance stinks
• example: 1 Gbps link, 15 ms prop. delay, 8000 bit packet

U_sender: utilization – fraction of time sender busy sending

d_trans = L / R = 8000 bits / 10^9 bps = 8 microseconds
U_sender = (L/R) / (RTT + L/R) = 0.008 / 30.008 ≈ 0.00027

• 1 KB pkt every 30 msec -> 33 kB/sec throughput over a 1 Gbps link: the network protocol limits use of the physical resources!
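A quick sanity check of these numbers (a minimal sketch; the 1 Gbps, 15 ms and 8000-bit figures come from the slide, everything else is plain arithmetic):

    # Stop-and-wait utilisation for the slide's example link
    R = 1e9          # link rate: 1 Gbps
    L = 8000         # packet size: 8000 bits
    RTT = 0.030      # 15 ms propagation each way -> 30 ms round trip

    d_trans = L / R                      # 8 microseconds to clock the packet out
    U_sender = (L / R) / (RTT + L / R)   # fraction of time the sender is busy
    throughput = L / (RTT + L / R)       # bits delivered per second

    print(f"d_trans    = {d_trans * 1e6:.0f} us")
    print(f"U_sender   = {U_sender:.5f}")                 # ~0.00027
    print(f"throughput = {throughput / 8 / 1000:.0f} kB/s")   # ~33 kB/s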

39

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

40

Pipelined (Packet-Window) protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver

bull Two generic forms of pipelined protocols go-Back-N selective repeat

41

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

Increase utilization by a factor of 3!

42

Pipelining Protocols
Go-Back-N: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative ACKs
  – Doesn't ACK a packet if there's a gap
• Sender has a timer for the oldest unacked packet
  – If timer expires, retransmit all unacked packets

Selective Repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr ACKs individual packets
• Sender maintains a timer for each unacked packet
  – When timer expires, retransmit only that unacked packet

43

Selective repeat big picture

bull Sender can have up to N unacked packets in pipeline

bull Rcvr acks individual packetsbull Sender maintains timer for each unacked

packetndash When timer expires retransmit only unack packet

44

Go-Back-N
Sender:
• k-bit seq # in pkt header
• "window" of up to N consecutive unACKed pkts allowed
• ACK(n): ACKs all pkts up to, and including, seq # n – "cumulative ACK"
  – may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L

46

GBN receiver extended FSM

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L
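The FSM fragments above are easier to follow as code. A minimal, illustrative Go-Back-N sketch (not the book's code; helper names such as send_pkt, send_ack and deliver are made up, and the unreliable channel, checksums and the real countdown timer are abstracted behind callbacks):

    class GBNSender:
        def __init__(self, N, send_pkt, start_timer, stop_timer):
            self.N = N                    # window size
            self.base = 1                 # oldest unACKed seq #
            self.nextseqnum = 1
            self.buf = {}                 # seq # -> packet, kept for retransmission
            self.send_pkt, self.start_timer, self.stop_timer = send_pkt, start_timer, stop_timer

        def rdt_send(self, data):
            if self.nextseqnum < self.base + self.N:
                pkt = (self.nextseqnum, data)
                self.buf[self.nextseqnum] = pkt
                self.send_pkt(pkt)
                if self.base == self.nextseqnum:
                    self.start_timer()    # timer covers the oldest unACKed packet
                self.nextseqnum += 1
                return True
            return False                  # window full: refuse data

        def ack_received(self, acknum):   # cumulative ACK
            self.base = acknum + 1
            if self.base == self.nextseqnum:
                self.stop_timer()
            else:
                self.start_timer()

        def timeout(self):                # resend every packet still in the window
            self.start_timer()
            for seq in range(self.base, self.nextseqnum):
                self.send_pkt(self.buf[seq])

    class GBNReceiver:
        def __init__(self, send_ack, deliver):
            self.expected = 1
            self.send_ack, self.deliver = send_ack, deliver

        def pkt_received(self, seq, data, corrupt=False):
            if not corrupt and seq == self.expected:
                self.deliver(data)
                self.send_ack(self.expected)      # ACK highest in-order seq #
                self.expected += 1
            else:
                self.send_ack(self.expected - 1)  # duplicate ACK, no buffering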

47

GBN inaction

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size ≤ ½ of the seq # space
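The fix for the dilemma can be stated in one line, assuming a k-bit sequence number (a tiny illustrative helper, not from the slides):

    def max_sr_window(seq_bits: int) -> int:
        """Largest safe Selective Repeat window: at most half the seq # space."""
        return (2 ** seq_bits) // 2

    print(max_sr_window(2))   # seq #s 0..3 -> window of at most 2, so the size-3 window above is unsafe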

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1 – α)·EstimatedRTT + α·SampleRTT

Exponential weighted moving average: the influence of a past sample decreases exponentially fast; typical value: α = 0.125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimation

[Figure: RTT (milliseconds) vs. time (seconds), gaia.cs.umass.edu to fantasia.eurecom.fr, plotting SampleRTT and EstimatedRTT over a range of roughly 100–350 ms]

62

TCP Round Trip Time and Timeout

Setting the timeout
• EstimatedRTT plus a "safety margin"
  – large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT typically deviates from EstimatedRTT:

DevRTT = (1 – β)·DevRTT + β·|SampleRTT – EstimatedRTT|
(typically, β = 0.25)

Then set the timeout interval:

TimeoutInterval = EstimatedRTT + 4·DevRTT
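Put together, the estimator and timeout update look roughly like this (a sketch of the EWMA rules above with α = 0.125 and β = 0.25; initialising DevRTT to half the first sample is an assumption, and Karn's rule of skipping retransmitted segments is left to the caller):

    class RTTEstimator:
        def __init__(self, alpha=0.125, beta=0.25):
            self.alpha, self.beta = alpha, beta
            self.estimated_rtt = None
            self.dev_rtt = 0.0

        def update(self, sample_rtt):
            if self.estimated_rtt is None:       # first sample seeds the estimate
                self.estimated_rtt = sample_rtt
                self.dev_rtt = sample_rtt / 2
            else:
                self.dev_rtt = (1 - self.beta) * self.dev_rtt + \
                               self.beta * abs(sample_rtt - self.estimated_rtt)
                self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + \
                                     self.alpha * sample_rtt
            return self.timeout_interval()

        def timeout_interval(self):
            return self.estimated_rtt + 4 * self.dev_rtt

    est = RTTEstimator()
    for s in (0.100, 0.120, 0.300, 0.110):   # SampleRTTs in seconds
        print(round(est.update(s), 3))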

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Comment:
• SendBase – 1 = last cumulatively ACKed byte
Example:
• SendBase – 1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so new data is ACKed

66

TCP retransmission scenarios

[Timeline diagrams, Host A <-> Host B:
 (a) lost ACK scenario – A sends Seq=92, 8 bytes of data; B's ACK=100 is lost; A times out and resends Seq=92; B re-ACKs; SendBase moves from 92 to 100.
 (b) premature timeout – A sends Seq=92 (8 bytes) then Seq=100 (20 bytes); the Seq=92 timer expires before ACK=100 arrives, so Seq=92 is resent; the later ACK=120 advances SendBase to 120.]

67

TCP retransmission scenarios (more)

[Timeline diagram, cumulative ACK scenario: A sends Seq=92 (8 bytes) and Seq=100 (20 bytes); ACK=100 is lost, but ACK=120 arrives before the timeout and implicitly ACKs byte 100 too (unlike Go-Back-N), so nothing is retransmitted; SendBase = 120.]

68

TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver -> TCP receiver action:
• Arrival of in-order segment with expected seq #, all data up to expected seq # already ACKed -> delayed ACK: wait up to 500 ms for the next segment; if no next segment, send ACK
• Arrival of in-order segment with expected seq #, one other segment has an ACK pending -> immediately send a single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment with higher-than-expected seq # (gap detected) -> immediately send a duplicate ACK, indicating the seq # of the next expected byte
• Arrival of a segment that partially or completely fills the gap -> immediately send an ACK, provided the segment starts at the lower end of the gap
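The same rules as a small decision helper (a sketch only; the delayed-ACK timer itself is not modelled, the function just names the action the receiver should take):

    def ack_action(in_order, ack_already_pending, fills_gap_at_lower_end):
        """Receiver's response for an arriving segment, per the table above."""
        if in_order and not ack_already_pending:
            return "delayed ACK: wait up to 500 ms for another in-order segment"
        if in_order and ack_already_pending:
            return "immediately send one cumulative ACK covering both segments"
        if fills_gap_at_lower_end:
            return "immediately send ACK (segment starts at the lower end of the gap)"
        return "immediately send duplicate ACK for the next expected byte"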

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

[Figure 3.37: Host A sends several segments back-to-back to Host B; the 2nd segment is lost (X); the resulting duplicate ACKs trigger a resend of the 2nd segment before its timeout expires.]

71

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3) {
      resend segment with sequence number y   /* fast retransmit */
    }
  }

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window Syndrome
The receiver advertises how much it can accept; if a transmitter has something to send – it will.
This means small segments may persist – indefinitely.

Solution: wait to fill each segment, but don't wait indefinitely.

NAGLE's Algorithm
If we wait too long, interactive traffic is difficult.
If we don't wait, we get silly window syndrome.
Solution: use a timer; when the timer expires – send the (unfilled) segment.
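A rough sketch of the Nagle decision (hypothetical helper names; a real stack folds this into its output path and also honours the receiver's advertised window and the MSS negotiated at setup):

    def nagle_ok_to_send(queued_bytes, mss, unacked_data_outstanding, nagle_enabled=True):
        """Send now if we can fill a segment, or if nothing is in flight;
        otherwise hold small segments until the outstanding data is ACKed."""
        if queued_bytes >= mss:
            return True              # a full-sized segment never waits
        if not unacked_data_outstanding:
            return True              # idle connection: send the small segment now
        return not nagle_enabled     # with Nagle on, wait for the ACK (or a timer)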

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

• spare room in buffer = RcvWindow = RcvBuffer – [LastByteRcvd – LastByteRead]
• Rcvr advertises the spare room by including the value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees the receive buffer doesn't overflow
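In code form, the receiver's advertisement and the sender's limit are just the following (names follow the slide's variables; a sketch, not a real stack):

    def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
        """Spare room the receiver advertises back to the sender."""
        return rcv_buffer - (last_byte_rcvd - last_byte_read)

    def sender_may_send(last_byte_sent, last_byte_acked, rcv_window_advertised):
        """Flow-control check: unACKed data must fit in the advertised window."""
        return (last_byte_sent - last_byte_acked) <= rcv_window_advertised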

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data
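The slide's pseudo-Java maps onto any sockets API. A minimal Python equivalent, in which the three-way handshake happens inside connect()/accept() (port 12000 and the use of a thread are arbitrary choices to make the example self-contained):

    import socket, threading

    ready = threading.Event()

    def server():
        welcome = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        welcome.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        welcome.bind(("127.0.0.1", 12000))
        welcome.listen(1)
        ready.set()                       # tell the client it is safe to connect
        conn, addr = welcome.accept()     # returns once SYN / SYNACK / ACK completes
        print("server: connection from", addr)
        conn.close(); welcome.close()

    threading.Thread(target=server, daemon=True).start()
    ready.wait()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(("127.0.0.1", 12000))  # kernel performs the three-way handshake here
    print("client: connected")
    client.close()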

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

[Message sequence: client sends FIN; server replies with ACK and then sends its own FIN; client ACKs, enters timed wait, and the connection is closed.]

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

[Message sequence diagram: client FIN -> server ACK; server FIN -> client ACK; the client in "timed wait" responds with ACKs to any retransmitted FINs before both sides reach closed.]

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput

[Diagram: Hosts A and B each send λin (original data) into a router with unlimited shared output link buffers; λout is the delivered throughput.]

83

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet

[Diagram: Host A sends λin (original data) and λ'in (original data plus retransmitted data) into a router with finite shared output link buffers; λout is the throughput delivered to Host B.]

84

Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• "perfect" retransmission only when loss: λ'in > λout
• retransmission of a delayed (not lost) packet makes λ'in larger (than the perfect case) for the same λout

"costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: the link carries multiple copies of a pkt

[Graphs (a), (b), (c): λout vs. λ'in, each with λ'in up to R/2; the delivered goodput saturates at R/2 in the ideal case and falls toward R/3 and R/4 as retransmissions increase.]

85

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λin and λ'in increase?

[Diagram: hosts send λin (original data) and λ'in (original plus retransmitted data) through routers with finite shared output link buffers; λout is the delivered throughput.]

86

Causescosts of congestion scenario 3

Another "cost" of congestion: when a packet is dropped, any upstream transmission capacity used for that packet was wasted.

[Graph: λout collapses as the offered load grows – congestion collapse. Example: the cocktail party effect.]

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

[Figure: congestion window (8, 16, 24 Kbytes) vs. time]

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT until loss detected (for each received ACK: W <- W + 1/W)
• multiplicative decrease: cut CongWin in half after loss (W <- W/2)

[Figure: congestion window size vs. time – saw tooth behaviour, probing for bandwidth]

89   SLOW START IS NOT SHOWN
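The sawtooth comes from two tiny update rules (a sketch with slow start omitted, as in the figure; cwnd is measured in MSS):

    def on_ack(cwnd):
        """Additive increase: +1/cwnd per ACK, i.e. about +1 MSS per RTT."""
        return cwnd + 1.0 / cwnd

    def on_loss(cwnd):
        """Multiplicative decrease: halve the window (never below 1 MSS)."""
        return max(cwnd / 2.0, 1.0)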


90

TCP Congestion Control: details
• sender limits transmission: LastByteSent – LastByteAcked ≤ CongWin
• Roughly,

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate ≈ CongWin / RTT   bytes/sec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Start
• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to a respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion control

State: Slow Start (SS) | Event: ACK receipt for previously unACKed data | Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Commentary: results in a doubling of CongWin every RTT

State: Congestion Avoidance (CA) | Event: ACK receipt for previously unACKed data | Action: CongWin = CongWin + MSS·(MSS/CongWin) | Commentary: additive increase, CongWin grows by 1 MSS every RTT

State: SS or CA | Event: loss event detected by triple duplicate ACK | Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

State: SS or CA | Event: timeout | Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Commentary: enter slow start

State: SS or CA | Event: duplicate ACK | Action: increment duplicate ACK count for segment being ACKed | Commentary: CongWin and Threshold not changed
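The table translates almost line-for-line into code. A compact sketch (cwnd and threshold in units of MSS; the initial threshold of 64 MSS is an arbitrary assumption, and dup-ACK bookkeeping is simplified to a single counter):

    class TCPCongestionControl:
        def __init__(self, threshold=64):
            self.cwnd = 1.0                # start in slow start with 1 MSS
            self.threshold = float(threshold)
            self.dup_acks = 0

        def on_new_ack(self):
            self.dup_acks = 0
            if self.cwnd < self.threshold:      # slow start: +1 MSS per ACK (doubles per RTT)
                self.cwnd += 1.0
            else:                               # congestion avoidance: ~+1 MSS per RTT
                self.cwnd += 1.0 / self.cwnd

        def on_dup_ack(self):
            self.dup_acks += 1
            if self.dup_acks == 3:              # triple duplicate ACK: multiplicative decrease
                self.threshold = max(self.cwnd / 2.0, 1.0)
                self.cwnd = self.threshold
                self.dup_acks = 0

        def on_timeout(self):                   # "more alarming": back to slow start
            self.threshold = max(self.cwnd / 2.0, 1.0)
            self.cwnd = 1.0
            self.dup_acks = 0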

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND: slow start operates until the window reaches half of the previous CWND, i.e. SSTHRESH.

[Figure: window vs. time – a sawtooth with fast retransmissions, then a timeout; SSThresh is set to half the CWND at the timeout, and slow start repeats from 1 MSS up to SSThresh.]

100

TCP throughput

• What's the average throughput of TCP as a function of window size and RTT?
  – Ignore slow start
• Let W be the window size when loss occurs
• When the window is W, throughput is W/RTT
• Just after loss, the window drops to W/2, throughput to (W/2)/RTT
• Average throughput: 0.75·W/RTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate p: Throughput = (1.22·MSS) / (RTT·√p)
• => p = 2·10^-10. Ouch!
• New versions of TCP for high-speed

Calculation on Simple Model (cwnd in units of MSS)
• Assume loss occurs whenever cwnd reaches W
  – Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … W, W/2, …
  – W/2 RTTs, then drop, then repeat
• Average throughput: 0.75·W·(MSS/RTT)
  – One packet dropped out of (W/2)·(3W/4)
  – Packet drop rate: p = (8/3)·W^-2
• Throughput = (MSS/RTT)·sqrt(3/(2p))
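Checking the slide's "long fat pipe" numbers against this model (a back-of-the-envelope script; MSS = 1500 bytes and RTT = 100 ms come from the example above):

    MSS = 1500 * 8     # segment size in bits
    RTT = 0.100        # 100 ms
    target = 10e9      # want 10 Gbps

    W = target * RTT / MSS     # required window, in segments
    p = 3 / (2 * W ** 2)       # loss rate needed, from Throughput = (MSS/RTT)*sqrt(3/(2p))

    print(f"W = {W:,.0f} segments")            # ~83,333
    print(f"required loss rate p = {p:.1e}")   # ~2e-10 -- "Ouch!"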

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease: cwnd <- cwnd ×/÷ a
• Additive increase or decrease: cwnd <- cwnd ± b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

• No congestion: x increases by one packet/RTT every RTT
• Congestion: decrease x by a factor of 2

[Figure: a single flow A -> B through a link of capacity C = 50 pkts/RTT; plots of the sending rate x (pkts/RTT) and of the backlog in the router (pkts; congested if > 20) against time.]

110

AIMD Sharing Dynamics

[Figure: two flows with rates x1 and x2 share a bottleneck between routers; no congestion: each rate increases by one packet/RTT every RTT; congestion: decrease rate by a factor of 2. Over time the rates equalize – fair share.]

111

AIAD Sharing Dynamics

[Figure: the same two-flow setup; no congestion: x increases by one packet/RTT every RTT; congestion: decrease x by 1. The gap between x1 and x2 never closes – the rates do not equalize.]

112

Simple Model of Congestion Control

• Two TCP connections
  – Rates x1 and x2
• Congestion when the sum x1 + x2 > 1
• Efficiency: sum near 1
• Fairness: x's converge

[Figure: 2-user example in the (x1, x2) plane – bandwidth for user 1 vs. bandwidth for user 2; the efficiency line x1 + x2 = 1 separates underload from overload.]

113

Example

• Total bandwidth 1

[Figure: the (x1, x2) plane with the fairness line and the efficiency line x1 + x2 = 1:
 (0.2, 0.5) – inefficient: x1 + x2 = 0.7
 (0.7, 0.5) – congested: x1 + x2 = 1.2
 (0.7, 0.3) – efficient (x1 + x2 = 1) but not fair
 (0.5, 0.5) – efficient and fair]

114

AIAD

[Phase plot: bandwidth for user 1 (x1) vs. user 2 (x2), with the fairness and efficiency lines; starting at (x1h, x2h), a decrease moves to (x1h – aD, x2h – aD) and an increase to (x1h – aD + aI, x2h – aD + aI) – the trajectory stays parallel to the fairness line.]

• Increase: x <- x + aI
• Decrease: x <- x – aD
• Does not converge to fairness

115

MIMD

[Phase plot: x1 vs. x2 with the fairness and efficiency lines; starting at (x1h, x2h), a decrease moves to (bD·x1h, bD·x2h) and an increase to (bI·bD·x1h, bI·bD·x2h) – the trajectory stays on the same ray through the origin.]

• Increase: x <- bI·x
• Decrease: x <- bD·x
• Does not converge to fairness

116

AIMD

[Phase plot: x1 vs. x2 with the fairness and efficiency lines; starting at (x1h, x2h), a multiplicative decrease moves to (bD·x1h, bD·x2h) and the following additive increase to (bD·x1h + aI, bD·x2h + aI) – each cycle moves the operating point closer to the fairness line.]

• Increase: x <- x + aI
• Decrease: x <- bD·x
• Converges to fairness
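The convergence claim is easy to check numerically. A toy simulation of two flows sharing a unit link (the constants aI = 0.01 and bD = 0.5 and the starting rates are arbitrary; "congestion" simply means x1 + x2 > 1):

    def aimd_share(x1=0.05, x2=0.60, a_i=0.01, b_d=0.5, rounds=2000):
        for _ in range(rounds):
            if x1 + x2 > 1.0:      # congestion: multiplicative decrease
                x1 *= b_d
                x2 *= b_d
            else:                  # otherwise: additive increase
                x1 += a_i
                x2 += a_i
        return x1, x2

    print(aimd_share())   # the two rates end up close together, oscillating around the fair share

Repeating the run with additive decrease (x -= constant) instead of the halving step preserves the initial gap, which is the AIAD non-convergence shown above.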

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally

[Figure: bandwidth of connection 1 vs. connection 2, both axes up to R; the trajectory alternates congestion-avoidance additive increase with loss-triggered halving of the window, converging toward the equal-bandwidth-share line.]

118

Fairness (more)
Fairness and UDP
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at a constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections
• nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2
• Recall: multiple browser sessions (and the potential for synchronized loss)

119

Synchronized Flows vs. Many TCP Flows
• Synchronized flows: the aggregate window has the same dynamics
  – therefore buffer occupancy has the same dynamics
  – the rule-of-thumb still holds
• Independent, desynchronized flows:
  – Central limit theorem says the aggregate becomes Gaussian
  – Variance (buffer size) decreases as N increases

Some TCP issues outstanding…

[Figure: probability distribution of buffer occupancy; buffer size vs. time]


31

rdt21 discussionSenderbull seq added to pktbull two seq rsquos (01) will suffice Whybull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has a0 or 1 sequence number

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq bull note receiver can not know if its last ACKNAK received OK at

sender

32

rdt22 a NAK-free protocol

bull same functionality as rdt21 using ACKs onlybull instead of NAK receiver sends ACK for last pkt received OK

ndash receiver must explicitly include seq of pkt being ACKed bull duplicate ACK at sender results in same action as NAK

retransmit current pkt

33

rdt22 sender receiver fragments

Wait for call 0 from above

sequence=0udt_send(packet)

rdt_send(data)

udt_send(packet)

rdt_rcv(reply) ampamp ( corrupt(reply) || isACK1(reply) )

udt_rcv(reply) ampamp not corrupt(reply) ampamp isACK0(reply)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet) send(ACK1)rdt_rcv(data)

udt_rcv(packet) ampamp (corrupt(packet) || has_seq1(packet))

udt_send(ACK1)receiver FSM

fragment

L

34

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of help but not

enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull retransmits if no ACK received in this time

bull if pkt (or ACK) just delayed (not lost)ndash retransmission will be

duplicate but use of seq rsquos already handles this

ndash receiver must specify seq of pkt being ACKed

bull requires countdown timer

udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )

35

rdt30 sender

sequence=0udt_send(packet)

rdt_send(data)

Wait for

ACK0

IDLEstate 1

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)

udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)

LL

udt_send(packet)timeout

udt_send(packet)timeout

udt_rcv(reply)

IDLEstate 0

Wait for

ACK1

Ludt_rcv(reply)

LL

L

36

rdt30 in action

37

rdt30 in action

38

Performance of rdt30

bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet

U sender utilization ndash fraction of time sender busy sending

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources

dsmicrosecon8bps10bits8000

9 RLdtrans

39

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

40

Pipelined (Packet-Window) protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver

bull Two generic forms of pipelined protocols go-Back-N selective repeat

41

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

42

Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks

ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet

ndash If timer expires retransmit all unacked packets

Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet

ndash When timer expires retransmit only unack packet

43

Selective repeat big picture

bull Sender can have up to N unacked packets in pipeline

bull Rcvr acks individual packetsbull Sender maintains timer for each unacked

packetndash When timer expires retransmit only unack packet

44

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L

46

GBN receiver extended FSM

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN inaction

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Start
• When the connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec → initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to a respectable rate
When the connection begins, increase the rate exponentially fast until the first loss event.

93

TCP Slow Start (more)
• When the connection begins, increase the rate exponentially until the first loss event
  - double CongWin every RTT
  - done by incrementing CongWin by 1 MSS for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Diagram: Host A sends one segment, then two, then four in successive RTTs as Host B's ACKs return]
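A minimal sketch (my own) of why adding one MSS per ACK doubles the window every RTT during slow start:

```python
# Minimal sketch (my own): incrementing the window by one MSS per ACK
# doubles it every RTT during slow start.

def slow_start(cwnd_mss: int, rtts: int) -> int:
    for _ in range(rtts):
        acks_this_rtt = cwnd_mss      # one ACK comes back per segment sent
        for _ in range(acks_this_rtt):
            cwnd_mss += 1             # CongWin += 1 MSS per ACK ...
    return cwnd_mss                   # ... which amounts to doubling per RTT

print(slow_start(1, 4))  # 1 -> 2 -> 4 -> 8 -> 16
```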

94

Slow Start and the TCP Sawtooth

[Figure: window vs time; exponential "slow start" up to the first loss, then the saw-tooth]
Why is it called slow-start? Because TCP originally had no congestion control mechanism. The source would just start by sending a whole window's worth of data.

95

Refinement: inferring loss
• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after a timeout event:
  - CongWin instead set to 1 MSS
  - window then grows exponentially
  - to a threshold, then grows linearly
Philosophy: 3 dup ACKs indicate the network is capable of delivering some segments; a timeout indicates a "more alarming" congestion scenario.

96

Refinement
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable: Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event

97

Summary TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.

98

TCP sender congestion control

State | Event | TCP Sender Action | Commentary
Slow Start (SS) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Results in a doubling of CongWin every RTT
Congestion Avoidance (CA) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS·(MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
SS or CA | Duplicate ACK | Increment duplicate ACK count for the segment being ACKed | CongWin and Threshold not changed
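As a reading aid, the table can be rendered as a small state machine. This is my own sketch, not code from the course: the window is kept in MSS units, and the initial Threshold of 64 MSS is an arbitrary assumption.

```python
# My own compact rendering of the table as a state machine (window in MSS
# units; the initial Threshold of 64 MSS is an arbitrary assumption).

class Sender:
    def __init__(self):
        self.cwnd = 1.0        # CongWin, in MSS
        self.ssthresh = 64.0   # Threshold
        self.state = "SS"      # "SS" = slow start, "CA" = congestion avoidance

    def on_new_ack(self):      # ACK for previously unACKed data
        if self.state == "SS":
            self.cwnd += 1.0                 # doubling per RTT
            if self.cwnd > self.ssthresh:
                self.state = "CA"
        else:
            self.cwnd += 1.0 / self.cwnd     # +1 MSS per RTT overall

    def on_triple_dup_ack(self):             # fast recovery
        self.ssthresh = self.cwnd / 2.0
        self.cwnd = max(self.ssthresh, 1.0)  # multiplicative decrease
        self.state = "CA"

    def on_timeout(self):                    # back to slow start
        self.ssthresh = self.cwnd / 2.0
        self.cwnd = 1.0
        self.state = "SS"

s = Sender()
for _ in range(200):
    s.on_new_ack()
s.on_triple_dup_ack()
print(s.state, round(s.cwnd, 1), round(s.ssthresh, 1))  # "CA", roughly half the peak window
```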

99

Repeating Slow Start After Timeout

[Figure: window vs time; after a timeout or fast retransmission the window restarts from 1 MSS, and SSThresh is set to half the window value at that point]
Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND.
Slow start operates until the window reaches half of the previous CWND, i.e. SSTHRESH.

100

TCP throughput

• What's the average throughput of TCP as a function of window size and RTT?
  - Ignore slow start
• Let W be the window size when loss occurs
• When the window is W, throughput is W/RTT
• Just after loss, the window drops to W/2, throughput to (W/2)/RTT
• Average throughput: 0.75 W/RTT
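The 0.75 factor is simply the mean of a saw-tooth that ramps from W/2 up to W:

```latex
\bar{T} \;=\; \frac{1}{\mathrm{RTT}} \cdot \frac{\tfrac{W}{2} + W}{2}
        \;=\; \frac{3}{4} \cdot \frac{W}{\mathrm{RTT}}
        \;=\; 0.75 \, \frac{W}{\mathrm{RTT}}
```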

101

TCP Futures: TCP over "long, fat pipes"
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L: throughput ≈ 1.22·MSS / (RTT·√L)
• → requires L = 2·10⁻¹⁰. Ouch!
• New versions of TCP for high-speed

Calculation on Simple Model (cwnd in units of MSS)
• Assume loss occurs whenever cwnd reaches W
  - Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … W, W/2, …
  - W/2 RTTs, then drop, then repeat
• Average throughput: 0.75·W·(MSS/RTT)
  - One packet dropped out of (W/2)·(3W/4)
  - Packet drop rate: p = (8/3)·W⁻²
• Throughput = (MSS/RTT)·sqrt(3/(2p))
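A quick numerical check of the 10 Gbps example using the throughput formula above (my own arithmetic, rounded):

```python
# Numerical check of the 10 Gbps example above (my own arithmetic).
from math import sqrt

MSS = 1500 * 8          # bits per segment
RTT = 0.1               # seconds
target = 10e9           # desired throughput, bits per second

W = target * RTT / MSS                       # segments in flight
p = 1.5 * (MSS / (RTT * target)) ** 2        # invert throughput = (MSS/RTT)*sqrt(3/(2p))
check = (MSS / RTT) * sqrt(3 / (2 * p))      # plug p back in

print(round(W))              # 83333 in-flight segments
print(f"{p:.1e}")            # ~2.2e-10, the "ouch" loss rate
print(f"{check / 1e9:.0f}")  # 10 Gbps, consistent
```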

102

HINT: KNOW THIS SLIDE

Three Congestion Control Challenges – or Why AIMD?

• Single flow adjusting to bottleneck bandwidth
  - Without any a priori knowledge
  - Could be a Gbps link, could be a modem
• Single flow adjusting to variations in bandwidth
  - When bandwidth decreases, must lower sending rate
  - When bandwidth increases, must increase sending rate
• Multiple flows sharing the bandwidth
  - Must avoid overloading the network
  - And share bandwidth "fairly" among the flows

103

104

Problem 1 Single Flow Fixed BW

• Want to get a first-order estimate of the available bandwidth
  - Assume bandwidth is fixed
  - Ignore the presence of other flows
• Want to start slow but rapidly increase the rate until a packet drop occurs ("slow-start")
• Adjustment:
  - cwnd initially set to 1 (MSS)
  - cwnd++ upon receipt of ACK

105

Problems with Slow-Start
• Slow-start can result in many losses
  - roughly the size of cwnd ~ BW×RTT
• Example:
  - At some point cwnd is enough to fill the "pipe"
  - After another RTT, cwnd is double its previous value
  - All the excess packets are dropped!
• Need a more gentle adjustment algorithm once we have a rough estimate of the bandwidth
  - Rest of the design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track the available bandwidth:
• Oscillate around its current value
• If you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease: cwnd → cwnd × a or cwnd / a
• Additive increase or decrease: cwnd → cwnd + b or cwnd − b

106

Four alternatives

• AIAD: gentle increase, gentle decrease
• AIMD: gentle increase, drastic decrease
• MIAD: drastic increase, gentle decrease
  - too many losses: eliminate
• MIMD: drastic increase and decrease

107

108

Problem 3 Multiple Flows

• Want the steady state to be "fair"
• Many notions of fairness, but here just require two identical flows to end up with the same bandwidth
• This eliminates MIMD and AIAD
  - As we shall see…
• AIMD is the only remaining solution
  - Not really, but close enough…

109

Recall Buffer and Window Dynamics

• No congestion: x increases by one packet/RTT every RTT
• Congestion: decrease x by a factor of 2
[Plot: a single flow from A to B over a link of capacity C = 50 pkts/RTT; the rate x (pkts/RTT) saw-tooths over time while the router backlog (pkts) builds up, congested if > 20]

110

AIMD Sharing Dynamics
[Plot: two flows, x1 (A→B) and x2 (D→E), sharing the same bottleneck; rates in pkts/RTT over time]
No congestion: rate increases by one packet/RTT every RTT; congestion: decrease rate by a factor of 2.
Rates equalize: fair share.

111

AIAD Sharing Dynamics

[Plot: two flows, x1 (A→B) and x2 (D→E), sharing the same bottleneck; the two curves keep their initial offset rather than converging]
No congestion: x increases by one packet/RTT every RTT; congestion: decrease x by 1.

112

Simple Model of Congestion Control

• Two TCP connections
  - Rates x1 and x2
• Congestion when the sum > 1
• Efficiency: sum near 1
• Fairness: x's converge
[Diagram: 2-user example; bandwidth for User 1 (x1) vs bandwidth for User 2 (x2), with the efficiency line x1 + x2 = 1 separating underload from overload]

113

Example

[Diagram: x1 vs x2 plane, both axes 0 to 1, with the fairness line (x1 = x2) and the efficiency line (x1 + x2 = 1)]
• Total bandwidth 1
  - (0.2, 0.5): inefficient, x1 + x2 = 0.7
  - (0.7, 0.5): congested, x1 + x2 = 1.2
  - (0.7, 0.3): efficient (x1 + x2 = 1), not fair
  - (0.5, 0.5): efficient (x1 + x2 = 1), fair

114

AIAD

• Increase: x → x + aI
• Decrease: x → x − aD
• Does not converge to fairness
[Phase plot: bandwidth for User 1 (x1) vs bandwidth for User 2 (x2) with fairness and efficiency lines; from (x1h, x2h) a decrease moves to (x1h − aD, x2h − aD) and the next increase to (x1h − aD + aI, x2h − aD + aI), so every move is parallel to the fairness line and the gap between the users is preserved]

115

MIMD

• Increase: x → x × bI
• Decrease: x → x × bD
• Does not converge to fairness
[Phase plot: bandwidth for User 1 (x1) vs bandwidth for User 2 (x2) with fairness and efficiency lines; from (x1h, x2h) a decrease moves to (bD·x1h, bD·x2h) and the next increase to (bI·bD·x1h, bI·bD·x2h), so every move lies on the ray through the origin and the ratio between the users is preserved]

116

AIMD
• Increase: x → x + aI
• Decrease: x → x × bD
• Converges to fairness
[Phase plot: bandwidth for User 1 (x1) vs bandwidth for User 2 (x2) with fairness and efficiency lines; from (x1h, x2h) a decrease moves to (bD·x1h, bD·x2h) and the next increase to (bD·x1h + aI, bD·x2h + aI); the multiplicative step shrinks the absolute gap while the additive step preserves it, so the trajectory moves towards the fair, efficient point]
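A tiny simulation, of my own devising with arbitrary capacity and starting rates, showing the contrast summarised by these slides: under AIMD two flows equalize, while under AIAD the initial gap persists.

```python
# Tiny simulation (my own; capacity and starting rates are arbitrary) of two
# flows sharing a link of capacity C, contrasting AIMD with AIAD.

def run(decrease, x1=1.0, x2=20.0, C=50.0, steps=2000):
    for _ in range(steps):
        if x1 + x2 > C:                    # congestion: both flows back off
            x1, x2 = decrease(x1), decrease(x2)
        else:                              # otherwise: additive increase
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2

aimd = run(lambda x: x / 2)                # multiplicative decrease
aiad = run(lambda x: max(x - 1, 0.0))      # additive decrease

print("AIMD:", [round(v, 1) for v in aimd])  # the two rates end up about equal
print("AIAD:", [round(v, 1) for v in aiad])  # the initial 19-unit gap persists
```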

117

Why is AIMD fair? (a pretty animation…)
Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally
[Phase plot: bandwidth for Connection 1 vs bandwidth for Connection 2, both up to the link rate R, with the equal-bandwidth-share line; repeated cycles of congestion-avoidance additive increase and loss-triggered window halving move the operating point towards the fair share]

118

Fairness (more)
Fairness and UDP:
• Multimedia apps may not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP
  - pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections:
• nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 11 TCPs, gets ~R/2 (11 of the 20 connections now share the link)
• Recall: multiple browser sessions (and the potential for synchronized loss)

119

Synchronized Flows vs. Many TCP Flows
Synchronized flows:
• Aggregate window has the same dynamics
• Therefore buffer occupancy has the same dynamics
• Rule-of-thumb still holds

Independent, desynchronized flows:
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases

Some TCP issues outstanding…
[Figures: buffer size over time and the resulting probability distribution]

Page 32: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

32

rdt22 a NAK-free protocol

bull same functionality as rdt21 using ACKs onlybull instead of NAK receiver sends ACK for last pkt received OK

ndash receiver must explicitly include seq of pkt being ACKed bull duplicate ACK at sender results in same action as NAK

retransmit current pkt

33

rdt22 sender receiver fragments

Wait for call 0 from above

sequence=0udt_send(packet)

rdt_send(data)

udt_send(packet)

rdt_rcv(reply) ampamp ( corrupt(reply) || isACK1(reply) )

udt_rcv(reply) ampamp not corrupt(reply) ampamp isACK0(reply)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet) send(ACK1)rdt_rcv(data)

udt_rcv(packet) ampamp (corrupt(packet) || has_seq1(packet))

udt_send(ACK1)receiver FSM

fragment

L

34

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of help but not

enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull retransmits if no ACK received in this time

bull if pkt (or ACK) just delayed (not lost)ndash retransmission will be

duplicate but use of seq rsquos already handles this

ndash receiver must specify seq of pkt being ACKed

bull requires countdown timer

udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )

35

rdt30 sender

sequence=0udt_send(packet)

rdt_send(data)

Wait for

ACK0

IDLEstate 1

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)

udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)

LL

udt_send(packet)timeout

udt_send(packet)timeout

udt_rcv(reply)

IDLEstate 0

Wait for

ACK1

Ludt_rcv(reply)

LL

L

36

rdt30 in action

37

rdt30 in action

38

Performance of rdt30

bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet

U sender utilization ndash fraction of time sender busy sending

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources

dsmicrosecon8bps10bits8000

9 RLdtrans

39

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

40

Pipelined (Packet-Window) protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver

bull Two generic forms of pipelined protocols go-Back-N selective repeat

41

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

42

Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks

ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet

ndash If timer expires retransmit all unacked packets

Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet

ndash When timer expires retransmit only unack packet

43

Selective repeat big picture

bull Sender can have up to N unacked packets in pipeline

bull Rcvr acks individual packetsbull Sender maintains timer for each unacked

packetndash When timer expires retransmit only unack packet

44

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L

46

GBN receiver extended FSM

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN inaction

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 33: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

33

rdt22 sender receiver fragments

Wait for call 0 from above

sequence=0udt_send(packet)

rdt_send(data)

udt_send(packet)

rdt_rcv(reply) ampamp ( corrupt(reply) || isACK1(reply) )

udt_rcv(reply) ampamp not corrupt(reply) ampamp isACK0(reply)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet) send(ACK1)rdt_rcv(data)

udt_rcv(packet) ampamp (corrupt(packet) || has_seq1(packet))

udt_send(ACK1)receiver FSM

fragment

L

34

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of help but not

enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull retransmits if no ACK received in this time

bull if pkt (or ACK) just delayed (not lost)ndash retransmission will be

duplicate but use of seq rsquos already handles this

ndash receiver must specify seq of pkt being ACKed

bull requires countdown timer

udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )

35

rdt30 sender

sequence=0udt_send(packet)

rdt_send(data)

Wait for

ACK0

IDLEstate 1

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)

udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)

LL

udt_send(packet)timeout

udt_send(packet)timeout

udt_rcv(reply)

IDLEstate 0

Wait for

ACK1

Ludt_rcv(reply)

LL

L

36

rdt30 in action

37

rdt30 in action

38

Performance of rdt30

bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet

U sender utilization ndash fraction of time sender busy sending

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources

dsmicrosecon8bps10bits8000

9 RLdtrans

39

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

40

Pipelined (Packet-Window) protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver

bull Two generic forms of pipelined protocols go-Back-N selective repeat

41

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

42

Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks

ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet

ndash If timer expires retransmit all unacked packets

Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet

ndash When timer expires retransmit only unack packet

43

Selective repeat big picture

bull Sender can have up to N unacked packets in pipeline

bull Rcvr acks individual packetsbull Sender maintains timer for each unacked

packetndash When timer expires retransmit only unack packet

44

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L

46

GBN receiver extended FSM

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN inaction

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?

• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss

Q: how to estimate RTT?

• SampleRTT: measured time from segment transmission until ACK receipt
  – ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
  – average several recent measurements, not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1 – α) · EstimatedRTT + α · SampleRTT

• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125

60

Some RTT estimates are never good

[Figure: the retransmission ambiguity – associating the ACK with (a) the original transmission versus (b) the retransmission]

Karn/Partridge Algorithm – ignore retransmissions in the measurements

(and increase the timeout on each retransmission; this makes retransmissions decreasingly aggressive)

61

Example RTT estimation

[Figure: SampleRTT and EstimatedRTT (milliseconds) versus time (seconds), for the RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; samples vary roughly between 100 and 350 ms]

62

TCP Round Trip Time and Timeout

Setting the timeout

• EstimatedRTT plus "safety margin"
  – large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:

DevRTT = (1 – β) · DevRTT + β · |SampleRTT – EstimatedRTT|

(typically, β = 0.25)

Then set timeout interval:

TimeoutInterval = EstimatedRTT + 4 · DevRTT

63
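A minimal Python sketch of the estimator described on the last few slides (EWMA plus deviation, with Karn/Partridge's rule of ignoring retransmitted segments); α and β are the typical values quoted above, and the first-sample initialisation is an assumption borrowed from common practice rather than from the slides:

ALPHA, BETA = 0.125, 0.25

estimated_rtt = None     # seconds
dev_rtt = 0.0

def on_rtt_sample(sample_rtt, was_retransmitted):
    """Update EstimatedRTT/DevRTT and return the new TimeoutInterval."""
    global estimated_rtt, dev_rtt
    if was_retransmitted:
        return None                       # Karn/Partridge: ignore ambiguous samples
    if estimated_rtt is None:
        estimated_rtt, dev_rtt = sample_rtt, sample_rtt / 2
    else:
        estimated_rtt = (1 - ALPHA) * estimated_rtt + ALPHA * sample_rtt
        dev_rtt = (1 - BETA) * dev_rtt + BETA * abs(sample_rtt - estimated_rtt)
    return estimated_rtt + 4 * dev_rtt    # TimeoutInterval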

TCP reliable data transfer

• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses a single retransmission timer
• Retransmissions are triggered by:
  – timeout events
  – duplicate acks
• Initially, consider a simplified TCP sender:
  – ignore duplicate acks
  – ignore flow control, congestion control

64

TCP sender events:

data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval

timeout:
• retransmit segment that caused timeout
• restart timer

Ack rcvd:
• If it acknowledges previously unacked segments:
  – update what is known to be acked
  – start timer if there are outstanding segments

65

TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event) {

    event: data received from application above
      create TCP segment with sequence number NextSeqNum
      if (timer currently not running)
        start timer
      pass segment to IP
      NextSeqNum = NextSeqNum + length(data)

    event: timer timeout
      retransmit not-yet-acknowledged segment with smallest sequence number
      start timer

    event: ACK received, with ACK field value of y
      if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
          start timer
      }
  }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte

Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so new data is acked

66

TCP retransmission scenarios

[Figure, lost ACK scenario: Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost; the Seq=92 timeout fires and Host A resends Seq=92, 8 bytes data; Host B replies ACK=100 again; SendBase advances to 100.]

[Figure, premature timeout scenario: Host A sends Seq=92, 8 bytes data and then Seq=100, 20 bytes data; the Seq=92 timer expires before ACK=100 arrives, so Host A resends Seq=92, 8 bytes data; the cumulative ACK=120 then arrives and SendBase advances from 100 to 120.]

67

TCP retransmission scenarios (more)

[Figure, cumulative ACK scenario: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; ACK=100 is lost; before the timeout, Host B's ACK=120 arrives, SendBase advances to 120 and nothing is retransmitted.]

Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACK's 100 too.

68

TCP ACK generation [RFC 1122, RFC 2581]

Event at Receiver -> TCP Receiver action

Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
-> Delayed ACK: wait up to 500 ms for next segment. If no next segment, send ACK.

Arrival of in-order segment with expected seq #; one other segment has ACK pending
-> Immediately send single cumulative ACK, ACKing both in-order segments.

Arrival of out-of-order segment with higher-than-expected seq #; gap detected
-> Immediately send duplicate ACK, indicating seq # of next expected byte.

Arrival of segment that partially or completely fills gap
-> Immediately send ACK, provided that segment starts at lower end of gap.

69

Fast Retransmit

• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If a segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that the segment after the ACKed data was lost:
  – fast retransmit: resend segment before timer expires

70

[Figure 3.37: Resending a segment after triple duplicate ACK – Host A sends several segments back-to-back to Host B; the 2nd segment is lost (X); the resulting duplicate ACKs make Host A resend the 2nd segment before its timeout expires.]

71

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {                                  /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3)
      resend segment with sequence number y    /* fast retransmit */
  }

72
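The same logic, written as a small Python sketch (illustrative only; restart_timer and retransmit stand in for the real sender machinery):

send_base = 0
dup_acks = {}            # ACK value y -> count of duplicates seen

def on_ack(y, outstanding_segments, restart_timer, retransmit):
    global send_base
    if y > send_base:                    # new data acknowledged
        send_base = y
        dup_acks.clear()
        if outstanding_segments:
            restart_timer()
    else:                                # duplicate ACK for already-ACKed data
        dup_acks[y] = dup_acks.get(y, 0) + 1
        if dup_acks[y] == 3:
            retransmit(y)                # fast retransmit: resend segment y now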

Silly Window Syndrome

MSS advertises the amount a receiver can accept. If a transmitter has something to send – it will. This means small segment sizes may persist – indefinitely.

Solution: wait to fill each segment, but don't wait indefinitely.

Nagle's Algorithm

If we wait too long, interactive traffic is difficult. If we don't wait, we get silly window syndrome.

Solution: use a timer; when the timer expires – send the (unfilled) segment. (A sketch of this rule follows below.)

73
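A rough Python sketch of the send/hold decision described above (an interpretation of Nagle's rule plus the slide's timer escape hatch; the parameter names are invented for illustration):

def should_send_now(queued_bytes, mss, bytes_unacked, timer_expired):
    """Return True if the sender should transmit the queued data now."""
    if queued_bytes >= mss:       # a full segment is ready: always send
        return True
    if bytes_unacked == 0:        # nothing in flight: a small segment is fine
        return True
    return timer_expired          # otherwise hold back, but not indefinitely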

Flow Control ≠ Congestion Control

• Flow control involves preventing senders from overrunning the capacity of the receivers.
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded.

74

Flow Control – (bad old days)

In-line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)

Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from a memory interface saying slow-down/stop…

75

TCP Flow Control

• receive side of TCP connection has a receive buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
• app process may be slow at reading from buffer

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

• spare room in buffer = RcvWindow = RcvBuffer – [LastByteRcvd – LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow

77
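The spare-room calculation and the sender-side check, as a tiny Python sketch (variable names follow the slide; this is illustrative rather than an implementation):

def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises back to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def sender_may_send(last_byte_sent, last_byte_acked, advertised_window):
    """Sender keeps the amount of unACKed data within the advertised window."""
    return (last_byte_sent - last_byte_acked) < advertised_window

# e.g. rcv_window(4096, 1000, 200) == 3296 bytes of spare room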

TCP Connection Management

Recall: TCP sender, receiver establish "connection" before exchanging data segments

• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket("hostname", "port number");
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();

Three way handshake:

Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data

Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket: clientSocket.close();

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK; closes connection, sends FIN

[Figure: client sends FIN (close); server ACKs and later sends its own FIN (close); client ACKs, enters timed wait, then closed]

79

TCP Connection Management (cont)

Step 3: client receives FIN, replies with ACK
  – Enters "timed wait" – will respond with ACK to received FINs

Step 4: server receives ACK; connection closed

Note: with a small modification, can handle simultaneous FINs

[Figure: client sends FIN (closing); server ACKs and sends FIN (closing); client ACKs, enters timed wait, then closed; server closed]

80

TCP Connection Management (cont)

[Figure: TCP client lifecycle state diagram]

[Figure: TCP server lifecycle state diagram]

81

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem

82

Causes/costs of congestion: scenario 1

• two senders, two receivers
• one router, infinite buffers
• no retransmission

• large delays when congested
• maximum achievable throughput

[Figure: Host A sends λin (original data) into a router with unlimited shared output link buffers; Host B receives λout]

83

Causes/costs of congestion: scenario 2

• one router, finite buffers
• sender retransmission of lost packet

[Figure: Host A sends λin (original data) and λ'in (original data plus retransmitted data) into a router with finite shared output link buffers; Host B receives λout]

84

Causes/costs of congestion: scenario 2

• always: λin = λout (goodput)
• "perfect" retransmission only when loss: λ'in > λout
• retransmission of delayed (not lost) packet makes λ'in larger (than the perfect case) for the same λout

"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt

[Figure: three plots (a), (b), (c) of λout versus λ'in, each with capacity R/2; with retransmissions the achieved goodput stays below R/2 (falling towards R/3, R/4 in the worse cases)]

85

Causes/costs of congestion: scenario 3

• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λin and λ'in increase?

[Figure: Host A sends λin (original data) and λ'in (original data plus retransmitted data) across routers with finite shared output link buffers; Host B receives λout]

86

Causescosts of congestion scenario 3

Another "cost" of congestion: when a packet is dropped, any upstream transmission capacity used for that packet was wasted.

[Figure: λout collapses as the offered load keeps increasing]

Congestion Collapse example: Cocktail party effect

87

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at

88

TCP congestion control additive increase multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs:

• additive increase: increase CongWin by 1 MSS every RTT (W <- W + 1/W for each received ACK) until loss detected
• multiplicative decrease: cut CongWin in half after loss (W <- W/2)

[Figure: congestion window (8, 16, 24 Kbytes) versus time – the saw tooth behaviour of probing for bandwidth]

89 (SLOW START IS NOT SHOWN)
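A toy simulation of that sawtooth (AIMD only; slow start is deliberately omitted, as on the slide). The fixed capacity C is an assumption purely for illustration:

C = 100.0                 # pretend path capacity, in MSS per RTT
w = 1.0                   # congestion window, in MSS

trace = []
for rtt in range(200):
    trace.append(w)
    if w > C:             # a loss is detected this RTT
        w = w / 2         # multiplicative decrease
    else:
        w = w + 1         # additive increase: +1 MSS per RTT
# 'trace' now holds the familiar sawtooth oscillating between roughly C/2 and C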

90

TCP Congestion Control details

• sender limits transmission: LastByteSent – LastByteAcked ≤ CongWin
• Roughly: rate = CongWin / RTT  Bytes/sec
• CongWin is a dynamic function of perceived network congestion

How does sender perceive congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event

three mechanisms:
  – AIMD
  – slow start
  – conservative after timeout events

91

AIMD Starts Too Slowly

[Figure: Window versus time t – starting from a small window, pure additive increase takes a long time to reach the available capacity]

It could take a long time to get started.

Need to start with a small CWND to avoid overloading the network.

92

TCP Slow Start

• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to a respectable rate

When connection begins, increase rate exponentially fast until first loss event.

93

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow, but ramps up exponentially fast

[Figure: Host A sends one segment, then two segments, then four segments, one RTT apart, as Host B's ACKs come back]

94

Slow Start and the TCP Sawtooth

[Figure: Window versus time t – exponential "slow start" until a loss, then the AIMD sawtooth]

Why is it called slow-start? Because TCP originally had no congestion control mechanism. The source would just start by sending a whole window's worth of data!

95

Refinement: inferring loss

• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Philosophy: 3 dup ACKs indicates the network is capable of delivering some segments; a timeout indicates a "more alarming" congestion scenario.

Refinement

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before the loss event

97

Summary TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.

98

TCP sender congestion control

State: Slow Start (SS)
Event: ACK receipt for previously unacked data
TCP Sender Action: CongWin = CongWin + MSS; If (CongWin > Threshold) set state to "Congestion Avoidance"
Commentary: Resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unacked data
TCP Sender Action: CongWin = CongWin + MSS · (MSS/CongWin)
Commentary: Additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: Loss event detected by triple duplicate ACK
TCP Sender Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

State: SS or CA
Event: Timeout
TCP Sender Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: Enter slow start

State: SS or CA
Event: Duplicate ACK
TCP Sender Action: Increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed

99
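A compact event-driven sketch of the table above, in Python (illustrative only; the initial threshold of 64 Kbytes and MSS of 1460 bytes are assumptions, and everything is counted in bytes):

MSS = 1460
cong_win, threshold = MSS, 64 * 1024
state = 'slow_start'

def on_new_ack():                             # ACK for previously unacked data
    global cong_win, state
    if state == 'slow_start':
        cong_win += MSS                       # doubles CongWin every RTT
        if cong_win > threshold:
            state = 'congestion_avoidance'
    else:
        cong_win += max(1, MSS * MSS // cong_win)   # ~1 MSS per RTT overall

def on_triple_dup_ack():
    global cong_win, threshold, state
    threshold = cong_win // 2
    cong_win = threshold                      # multiplicative decrease
    state = 'congestion_avoidance'

def on_timeout():
    global cong_win, threshold, state
    threshold = cong_win // 2
    cong_win = MSS                            # back to 1 MSS
    state = 'slow_start'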

Repeating Slow Start After Timeout

[Figure: Window versus time t – at a timeout or fast retransmission, SSThresh is set to half the window at the loss; slow start then runs from 1 MSS until the window reaches SSThresh, after which growth is linear]

Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND.

Slow start is in operation until the window reaches half of the previous CWND, i.e. SSTHRESH.

100

TCP throughput

• What's the average throughput of TCP as a function of window size and RTT?
  – Ignore slow start
• Let W be the window size when loss occurs.
• When the window is W, throughput is W/RTT
• Just after loss, the window drops to W/2, throughput to W/(2·RTT)
• Average throughput: 0.75 W/RTT

101
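The 75% figure can be read off the idealised sawtooth (a hedged back-of-envelope step, assuming the window rises linearly from W/2 back to W between losses):

\bar{W} \;=\; \frac{\tfrac{W}{2} + W}{2} \;=\; \frac{3W}{4},
\qquad\text{so}\qquad
\text{average throughput} \;\approx\; \frac{\bar{W}}{RTT} \;=\; 0.75\,\frac{W}{RTT}.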

TCP Futures: TCP over "long, fat pipes"

• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate p: Throughput = (MSS/RTT) · sqrt(3/(2p))
• -> L = 2·10^-10  Ouch!
• New versions of TCP for high-speed

Calculation on Simple Model (cwnd in units of MSS)

• Assume loss occurs whenever cwnd reaches W
  – Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … W, W/2, …
  – W/2 RTTs, then drop, then repeat
• Average throughput: 0.75 W (MSS/RTT)
  – One packet dropped out of (W/2)·(3W/4)
  – Packet drop rate p = (8/3)·W^-2
• Throughput = (MSS/RTT) · sqrt(3/(2p))

102
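The numbers on that slide can be checked directly; a small Python sketch using only the values given there (link rate, MSS and RTT):

rate_bps  = 10e9           # target throughput: 10 Gbps
mss_bytes = 1500
rtt       = 0.100          # seconds

# Window needed to keep the pipe full: rate x RTT, expressed in segments
W = rate_bps * rtt / (mss_bytes * 8)               # ~83,333 segments

# Loss rate implied by  Throughput = (MSS/RTT) * sqrt(3/(2p))
p = 1.5 * (mss_bytes * 8 / (rtt * rate_bps)) ** 2  # ~2e-10
print(round(W), p)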

HINT: KNOW THIS SLIDE

Three Congestion Control Challenges – or Why AIMD?

• Single flow adjusting to bottleneck bandwidth
  – Without any a priori knowledge
  – Could be a Gbps link, could be a modem
• Single flow adjusting to variations in bandwidth
  – When bandwidth decreases, must lower sending rate
  – When bandwidth increases, must increase sending rate
• Multiple flows sharing the bandwidth
  – Must avoid overloading the network
  – And share bandwidth "fairly" among the flows

103

104

Problem 1 Single Flow Fixed BW

• Want to get a first-order estimate of the available bandwidth
  – Assume bandwidth is fixed
  – Ignore presence of other flows
• Want to start slow, but rapidly increase rate until packet drop occurs ("slow-start")
• Adjustment:
  – cwnd initially set to 1 (MSS)
  – cwnd++ upon receipt of ACK

105

Problems with Slow-Start

• Slow-start can result in many losses
  – Roughly the size of cwnd ~ BW·RTT
• Example:
  – At some point, cwnd is enough to fill the "pipe"
  – After another RTT, cwnd is double its previous value
  – All the excess packets are dropped
• Need a more gentle adjustment algorithm once we have a rough estimate of the bandwidth
  – Rest of the design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track the available bandwidth
• Oscillate around its current value
• If you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease:
  cwnd <- cwnd ×/÷ a
• Additive increase or decrease:
  cwnd <- cwnd ± b

Four alternatives

• AIAD: gentle increase, gentle decrease
• AIMD: gentle increase, drastic decrease
• MIAD: drastic increase, gentle decrease
  – too many losses: eliminate
• MIMD: drastic increase and decrease

107

108

Problem 3 Multiple Flows

• Want steady state to be "fair"
• Many notions of fairness, but here just require two identical flows to end up with the same bandwidth
• This eliminates MIMD and AIAD
  – As we shall see…
• AIMD is the only remaining solution
  – Not really, but close enough…

109

Recall: Buffer and Window Dynamics

• No congestion -> x increases by one packet/RTT every RTT
• Congestion -> decrease x by factor 2

[Figure: a single flow A -> B through a bottleneck of C = 50 pkts/RTT; the rate x (pkts/RTT) and the backlog in the router (congested if > 20 pkts) plotted over time, showing the resulting sawtooth]

110

AIMD Sharing Dynamics

• No congestion -> rate increases by one packet/RTT every RTT
• Congestion -> decrease rate by factor 2

[Figure: two flows (A -> B and D -> E, rates x1 and x2) sharing one bottleneck; the two rates oscillate but converge – rates equalize to the fair share]

111
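A toy two-flow version of those dynamics in Python; the shared capacity of 50 pkts/RTT mirrors the earlier figure, the starting rates are deliberately unequal, and synchronized losses are assumed for simplicity:

C = 50.0                    # shared bottleneck capacity, pkts per RTT
x1, x2 = 40.0, 5.0          # deliberately unequal starting rates

for rtt in range(500):
    if x1 + x2 > C:         # congestion: both flows halve (multiplicative decrease)
        x1, x2 = x1 / 2, x2 / 2
    else:                   # no congestion: both add 1 pkt/RTT (additive increase)
        x1, x2 = x1 + 1, x2 + 1

# x1 and x2 end up oscillating around the same value: the fair share, about C/2
print(x1, x2)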

AIAD Sharing Dynamics

• No congestion -> x increases by one packet/RTT every RTT
• Congestion -> decrease x by 1

[Figure: the same two flows (A -> B and D -> E) under AIAD; the gap between their rates persists, so they do not converge to a fair share]

112

Simple Model of Congestion Control

• Two TCP connections
  – Rates x1 and x2
• Congestion when sum > 1
• Efficiency: sum near 1
• Fairness: x's converge

[Figure: 2-user example – bandwidth for User 1 (x1) versus bandwidth for User 2 (x2); the efficiency line x1 + x2 = 1 separates underload from overload]

113

Example

[Figure: User 1 (x1) versus User 2 (x2), both axes from 0 to 1, with the fairness line x1 = x2 and the efficiency line x1 + x2 = 1]

• Total bandwidth: 1

(0.2, 0.5): Inefficient, x1 + x2 = 0.7
(0.7, 0.5): Congested, x1 + x2 = 1.2
(0.7, 0.3): Efficient, x1 + x2 = 1; Not fair
(0.5, 0.5): Efficient, x1 + x2 = 1; Fair

114

AIAD

• Increase: x <- x + aI
• Decrease: x <- x - aD
• Does not converge to fairness

[Figure: bandwidth for User 1 (x1) versus User 2 (x2), with fairness and efficiency lines; from (x1h, x2h), decrease to (x1h - aD, x2h - aD), then increase to (x1h - aD + aI, x2h - aD + aI) – the trajectory stays parallel to the fairness line, so the initial unfairness persists]

115

MIMD

• Increase: x <- bI·x
• Decrease: x <- bD·x
• Does not converge to fairness

[Figure: bandwidth for User 1 (x1) versus User 2 (x2), with fairness and efficiency lines; from (x1h, x2h), decrease to (bD·x1h, bD·x2h), then increase to (bI·bD·x1h, bI·bD·x2h) – the trajectory moves along a ray through the origin, so the ratio x1/x2 never changes]

116

AIMD

• Increase: x <- x + aI
• Decrease: x <- bD·x
• Converges to fairness

[Figure: bandwidth for User 1 (x1) versus User 2 (x2), with fairness and efficiency lines; from (x1h, x2h), decrease to (bD·x1h, bD·x2h), then increase to (bD·x1h + aI, bD·x2h + aI) – each decrease/increase cycle moves the operating point closer to the fairness line]

117

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally

[Figure: bandwidth for Connection 1 versus bandwidth for Connection 2, both up to R, with the equal-bandwidth-share line; repeated cycles of "congestion avoidance: additive increase" and "loss: decrease window by factor of 2" spiral the operating point onto the fair-share line]

118

Fairness (more)

Fairness and UDP
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections
• nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2
• Recall: Multiple browser sessions (and the potential for synchronized loss)

119

Synchronized Flows vs. Many TCP Flows

Synchronized flows:
• Aggregate window has the same dynamics
• Therefore buffer occupancy has the same dynamics
• Rule-of-thumb still holds

Independent, desynchronized flows:
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases

Some TCP issues outstanding…

[Figure: buffer size over time t and the resulting probability distribution of buffer occupancy]

Page 34: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

34

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of help but not

enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull retransmits if no ACK received in this time

bull if pkt (or ACK) just delayed (not lost)ndash retransmission will be

duplicate but use of seq rsquos already handles this

ndash receiver must specify seq of pkt being ACKed

bull requires countdown timer

udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )

35

rdt30 sender

sequence=0udt_send(packet)

rdt_send(data)

Wait for

ACK0

IDLEstate 1

sequence=1udt_send(packet)

rdt_send(data)

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)

udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )

udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)

LL

udt_send(packet)timeout

udt_send(packet)timeout

udt_rcv(reply)

IDLEstate 0

Wait for

ACK1

Ludt_rcv(reply)

LL

L

36

rdt30 in action

37

rdt30 in action

38

Performance of rdt30

bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet

U sender utilization ndash fraction of time sender busy sending

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources

dsmicrosecon8bps10bits8000

9 RLdtrans

39

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

40

Pipelined (Packet-Window) protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver

bull Two generic forms of pipelined protocols go-Back-N selective repeat

41

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

42

Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks

ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet

ndash If timer expires retransmit all unacked packets

Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet

ndash When timer expires retransmit only unack packet

43

Selective repeat big picture

bull Sender can have up to N unacked packets in pipeline

bull Rcvr acks individual packetsbull Sender maintains timer for each unacked

packetndash When timer expires retransmit only unack packet

44

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L

46

GBN receiver extended FSM

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN inaction

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t


35

rdt3.0 sender (FSM, reconstructed from the diagram)
• IDLE, state 0: rdt_send(data) → sequence=0, udt_send(packet); go to Wait-for-ACK0
• Wait-for-ACK0: udt_rcv(reply) && (corrupt(reply) || isACK(reply,1)) → Λ (ignore); timeout → udt_send(packet); udt_rcv(reply) && notcorrupt(reply) && isACK(reply,0) → go to IDLE, state 1
• IDLE, state 1: rdt_send(data) → sequence=1, udt_send(packet); go to Wait-for-ACK1
• Wait-for-ACK1: udt_rcv(reply) && (corrupt(reply) || isACK(reply,0)) → Λ (ignore); timeout → udt_send(packet); udt_rcv(reply) && notcorrupt(reply) && isACK(reply,1) → go to IDLE, state 0
• In the IDLE states, a stray udt_rcv(reply) → Λ (ignored)

36

rdt30 in action

37

rdt30 in action

38

Performance of rdt30

• rdt3.0 works, but performance stinks
• example: 1 Gbps link, 15 ms propagation delay, 8000-bit packet

d_trans = L/R = 8000 bits / 10^9 bits/sec = 8 microseconds

U_sender: utilization – fraction of time the sender is busy sending
U_sender = (L/R) / (RTT + L/R) = 0.008 / 30.008 ≈ 0.00027

• 1 KB packet every ~30 msec → ~33 kB/sec throughput over a 1 Gbps link
• the network protocol limits use of the physical resources! (A quick numeric check follows.)
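The arithmetic above is easy to reproduce. A minimal Python check, using only the values stated on the slide (variable names are mine):

L = 8000          # bits per packet
R = 1e9           # link rate, bits/sec
RTT = 2 * 0.015   # seconds (15 ms each way)

d_trans = L / R                       # 8e-06 s: 8 microseconds
U_sender = d_trans / (RTT + d_trans)  # ~0.00027
throughput = L / (RTT + d_trans)      # bits/sec

print(d_trans, round(U_sender, 5), round(throughput / 8 / 1000, 1))  # -> 8e-06 0.00027 33.3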

39

rdt30 stop-and-wait operation

first packet bit transmitted, t = 0
sender                              receiver
RTT
last packet bit transmitted, t = L/R
first packet bit arrives; last packet bit arrives, send ACK
ACK arrives, send next packet, t = RTT + L/R

40

Pipelined (Packet-Window) protocols

Pipelining: the sender allows multiple "in-flight", yet-to-be-acknowledged packets
– range of sequence numbers must be increased
– buffering at sender and/or receiver
• Two generic forms of pipelined protocols: Go-Back-N, Selective Repeat

41

Pipelining increased utilization

first packet bit transmitted, t = 0
sender                              receiver
RTT
last bit transmitted, t = L/R
first packet bit arrives; last packet bit arrives, send ACK
last bit of 2nd packet arrives, send ACK; last bit of 3rd packet arrives, send ACK
ACK arrives, send next packet, t = RTT + L/R
Increases utilization by a factor of 3!

42

Pipelining Protocols
Go-Back-N: big picture
• Sender can have up to N unacked packets in the pipeline
• Receiver only sends cumulative ACKs
– doesn't ACK a packet if there's a gap
• Sender has a timer for the oldest unacked packet
– if the timer expires, retransmit all unacked packets

Selective Repeat: big picture
• Sender can have up to N unacked packets in the pipeline
• Receiver ACKs individual packets
• Sender maintains a timer for each unacked packet
– when a timer expires, retransmit only that packet

43

Selective repeat big picture

• Sender can have up to N unacked packets in the pipeline
• Receiver ACKs individual packets
• Sender maintains a timer for each unacked packet
– when a timer expires, retransmit only that unacked packet

44

Go-Back-N
Sender:
• k-bit sequence number in the packet header
• "window" of up to N consecutive unACKed packets allowed
• ACK(n): ACKs all packets up to and including sequence number n – "cumulative ACK"
– may receive duplicate ACKs (see receiver)
• timer for each in-flight packet
• timeout(n): retransmit packet n and all higher-sequence-number packets in the window

45

GBN sender extended FSM

Single state "Wait"; initially base = 1, nextseqnum = 1.
• rdt_send(data): if (nextseqnum < base + N) { udt_send(packet[nextseqnum]); nextseqnum++ } else refuse_data(data)  /* block */
• timeout: udt_send(packet[base]); udt_send(packet[base+1]); … udt_send(packet[nextseqnum−1])
• udt_rcv(reply) && notcorrupt(reply): base = getacknum(reply) + 1
• udt_rcv(reply) && corrupt(reply): Λ (ignore)
(A small runnable sketch of sender and receiver together follows the receiver FSM below.)

46

GBN receiver extended FSM

ACK-only: always send an ACK for a correctly received packet with the highest in-order sequence number
– may generate duplicate ACKs
– need only remember expectedseqnum
• out-of-order packet: discard (don't buffer) → no receiver buffering
– re-ACK the packet with the highest in-order sequence number

Single state "Wait"; initially expectedseqnum = 1.
• udt_rcv(packet) && notcorrupt(packet) && hasseqnum(rcvpkt, expectedseqnum): rdt_rcv(data); udt_send(ACK); expectedseqnum++
• default (anything else): udt_send(reply) – re-ACK the last in-order packet
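To make the two FSMs concrete, here is a minimal, hypothetical Go-Back-N sketch in Python. Packets are plain dicts, the "channel" is just a callable, and corruption checks plus real timers are left out, so this shows only the window/ACK logic described above — it is not the lecture's code.

def make_pkt(seq, data):
    return {"seq": seq, "data": data}

class GBNSender:
    def __init__(self, send_to_network, N=4):
        self.send = send_to_network    # callable taking a packet dict
        self.N = N                     # window size
        self.base = 1                  # oldest unACKed sequence number
        self.nextseqnum = 1
        self.unacked = {}              # seq -> packet, kept for retransmission

    def rdt_send(self, data):
        if self.nextseqnum >= self.base + self.N:
            return False               # window full: refuse / block
        pkt = make_pkt(self.nextseqnum, data)
        self.unacked[self.nextseqnum] = pkt
        self.send(pkt)
        self.nextseqnum += 1
        return True

    def on_ack(self, acknum):
        if acknum < self.base:
            return                     # old/duplicate ACK: ignored in this sketch
        for seq in list(self.unacked): # cumulative ACK: everything <= acknum is done
            if seq <= acknum:
                del self.unacked[seq]
        self.base = acknum + 1

    def on_timeout(self):              # resend every unACKed packet in the window
        for seq in sorted(self.unacked):
            self.send(self.unacked[seq])

class GBNReceiver:
    def __init__(self, send_ack, deliver):
        self.send_ack = send_ack       # callable taking an ACK number
        self.deliver = deliver         # callable taking application data
        self.expectedseqnum = 1

    def on_packet(self, pkt):
        if pkt["seq"] == self.expectedseqnum:
            self.deliver(pkt["data"])  # in order: pass up, advance
            self.expectedseqnum += 1
        # in every case, (re-)ACK the highest in-order seq received (may duplicate)
        self.send_ack(self.expectedseqnum - 1)

Wiring the two together with a lossless callable delivers data in order; dropping one packet makes the receiver emit duplicate ACKs until the sender's on_timeout() resends the whole window.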

47

GBN in action

48

Selective Repeat

• receiver individually acknowledges all correctly received packets
– buffers packets as needed for eventual in-order delivery to the upper layer
• sender only resends packets for which an ACK has not been received
– sender keeps a timer for each unACKed packet
• sender window
– N consecutive sequence numbers
– again limits the sequence numbers of sent, unACKed packets

49

Selective repeat sender receiver windows

50

Selective repeat

sender:
• data from above: if the next available sequence number is in the window, send the packet
• timeout(n): resend packet n, restart its timer
• ACK(n) in [sendbase, sendbase+N]: mark packet n as received; if n is the smallest unACKed packet, advance the window base to the next unACKed sequence number

receiver (a small sketch follows):
• packet n in [rcvbase, rcvbase+N−1]: send ACK(n); if out of order, buffer; if in order, deliver (also deliver any buffered in-order packets) and advance the window to the next not-yet-received packet
• packet n in [rcvbase−N, rcvbase−1]: ACK(n)
• otherwise: ignore
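A hypothetical sketch of the receiver half in Python; the ACK/deliver callables and the window size N are placeholders, not part of any real stack.

class SRReceiver:
    def __init__(self, send_ack, deliver, N=4):
        self.send_ack = send_ack       # callable taking a sequence number
        self.deliver = deliver         # callable taking application data
        self.N = N
        self.rcvbase = 1               # lowest not-yet-delivered sequence number
        self.buffered = {}             # out-of-order packets within the window

    def on_packet(self, seq, data):
        if self.rcvbase <= seq < self.rcvbase + self.N:
            self.send_ack(seq)                     # ACK individually
            self.buffered.setdefault(seq, data)
            while self.rcvbase in self.buffered:   # deliver any in-order run
                self.deliver(self.buffered.pop(self.rcvbase))
                self.rcvbase += 1
        elif self.rcvbase - self.N <= seq < self.rcvbase:
            self.send_ack(seq)                     # already delivered: re-ACK
        # otherwise: ignore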

51

Selective repeat in action

52

Selective repeat dilemma

Example:
• sequence numbers: 0, 1, 2, 3
• window size = 3
• the receiver sees no difference between the two scenarios
• it incorrectly passes duplicate data up as new in (a)

Q: what relationship between sequence-number space size and window size?
A: window size ≤ (½ of the sequence-number space)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
− Slow to start / adapt: consider a high bandwidth×delay product

Now let's move from the generic to the specific…
TCP: arguably the most successful protocol in the Internet…
…and it's an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

• full duplex data
– bi-directional data flow in the same connection
– MSS: maximum segment size
• connection-oriented
– handshaking (exchange of control msgs) inits sender and receiver state before data exchange
• flow controlled
– sender will not overwhelm the receiver
• point-to-point
– one sender, one receiver
• reliable, in-order byte stream
– no "message boundaries"
• pipelined
– TCP congestion and flow control set the window size
• send & receive buffers

[Figure: the application writes data into the socket's TCP send buffer; segments cross the network; the receiving application reads data out of the socket's TCP receive buffer.]

55

TCP segment structure

32 bits wide:
• source port #, dest port #
• sequence number, acknowledgement number – counting by bytes of data (not segments!)
• head len, not used, flag bits U A P R S F
– URG: urgent data (generally not used)
– ACK: ACK # valid
– PSH: push data now (generally not used)
– RST, SYN, FIN: connection establishment (setup, teardown commands)
• receive window – # bytes the receiver is willing to accept
• checksum – Internet checksum (as in UDP)
• urgent data pointer
• options (variable length)
• application data (variable length)
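As a concrete illustration of the layout above, a small Python sketch that unpacks the fixed 20-byte header; field names follow the slide, and this is illustrative, not a production parser.

import struct

def parse_tcp_header(raw: bytes):
    (src_port, dst_port, seq, ack,
     offs_flags, rcv_window, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", raw[:20])
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": (offs_flags >> 12) * 4,   # data offset, converted to bytes
        "flags": offs_flags & 0x3F,             # FIN/SYN/RST/PSH/ACK/URG bits
        "rcv_window": rcv_window,
        "checksum": checksum, "urgent_ptr": urg_ptr,
    }

# e.g. parse_tcp_header(bytes(20)) returns a dict of all-zero fields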

56

TCP seq #'s and ACKs
Seq #'s:
– byte-stream "number" of the first byte in the segment's data
ACKs:
– seq # of the next byte expected from the other side
– cumulative ACK
Q: how does a receiver handle out-of-order segments?
– A: the TCP spec doesn't say – it's up to the implementor

Simple telnet scenario, Host A ⇄ Host B:
• user types 'C': A sends Seq=42, ACK=79, data='C'
• host B ACKs receipt of 'C' and echoes it back: Seq=79, ACK=43, data='C'
• host A ACKs receipt of the echoed 'C': Seq=43, ACK=80

This has led to a world of hurt…

57

TCP out-of-order attack
• ARQ with SACK means the recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server, but don't send the first byte
– the recipient keeps all the subsequent data and waits… filling buffers
– critical buffers…
• Evil attack two: send a legitimate request
GET index.html
– this gets through an intrusion-detection system – then send a new segment replacing bytes 4–13 with "password-file"

A dumb example: neither of these attacks would work on a modern system.

58

TCP Round Trip Time and Timeout

Q: how to set the TCP timeout value?
• longer than the RTT
– but the RTT varies
• too short: premature timeout
– unnecessary retransmissions
• too long: slow reaction to segment loss

Q: how to estimate the RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
– ignore retransmissions
• SampleRTT will vary; want the estimated RTT "smoother"
– average several recent measurements, not just the current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1 − α) · EstimatedRTT + α · SampleRTT

• Exponential weighted moving average
• influence of a past sample decreases exponentially fast
• typical value: α = 0.125

60

Some RTT estimates are never good

Associating the ACK with (a) the original transmission versus (b) the retransmission
Karn/Partridge algorithm – ignore retransmissions in the measurements
(and increase the timeout; this makes retransmissions decreasingly aggressive)

61

Example RTT estimation: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr
[Figure: SampleRTT and EstimatedRTT in milliseconds (roughly 100–350 ms) plotted against time (seconds, 1–106); EstimatedRTT tracks a smoothed version of the noisier SampleRTT.]

62

TCP Round Trip Time and Timeout

Setting the timeout
• EstimatedRTT plus a "safety margin"
– large variation in EstimatedRTT → larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:

DevRTT = (1 − β) · DevRTT + β · |SampleRTT − EstimatedRTT|   (typically β = 0.25)

Then set the timeout interval (a small sketch follows):

TimeoutInterval = EstimatedRTT + 4 · DevRTT
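Put together, the estimator amounts to a few lines. A sketch using the stated constants α = 0.125 and β = 0.25; the class and its initialisation choices are mine, not those of any particular TCP implementation.

class RTTEstimator:
    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha, self.beta = alpha, beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2    # a common starting choice (assumption)

    def update(self, sample_rtt):
        # EstimatedRTT = (1 - alpha) * EstimatedRTT + alpha * SampleRTT
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
        # DevRTT = (1 - beta) * DevRTT + beta * |SampleRTT - EstimatedRTT|
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * abs(sample_rtt - self.estimated_rtt)

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt

# e.g. est = RTTEstimator(0.100); est.update(0.120); est.timeout_interval()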

63

TCP reliable data transfer
• TCP creates an rdt service on top of IP's unreliable service
• pipelined segments
• cumulative ACKs
• TCP uses a single retransmission timer
• retransmissions are triggered by:
– timeout events
– duplicate ACKs
• initially, consider a simplified TCP sender:
– ignore duplicate ACKs
– ignore flow control and congestion control

64

TCP sender events
data received from app:
• create a segment with a sequence number
• the seq # is the byte-stream number of the first data byte in the segment
• start the timer if it is not already running (think of the timer as covering the oldest unacked segment)
• expiration interval: TimeOutInterval

timeout:
• retransmit the segment that caused the timeout
• restart the timer

ACK received:
• if it acknowledges previously unacked segments:
– update what is known to be acked
– start the timer if there are outstanding segments

65

TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */

Comment:
• SendBase − 1: last cumulatively ACKed byte
Example:
• SendBase − 1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so new data is ACKed

66

TCP retransmission scenarios
• Lost-ACK scenario (Host A → Host B): A sends Seq=92, 8 bytes of data; B's ACK=100 is lost (X); A's Seq=92 timeout fires and A resends Seq=92, 8 bytes; B again replies ACK=100; SendBase = 100.
• Premature timeout: A sends Seq=92, 8 bytes and then Seq=100, 20 bytes; B returns ACK=100 and ACK=120, but A's Seq=92 timer expires before they arrive, so A resends Seq=92, 8 bytes; B replies ACK=120; SendBase moves 100 → 120.

67

TCP retransmission scenarios (more)
• Cumulative-ACK scenario (Host A → Host B): A sends Seq=92, 8 bytes and Seq=100, 20 bytes; B's ACK=100 is lost (X), but ACK=120 arrives before the timeout. Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACKs 100 too, so SendBase = 120.

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at receiver → TCP receiver action (a condensed sketch follows):
• arrival of in-order segment with expected seq #; all data up to the expected seq # already ACKed → delayed ACK: wait up to 500 ms for the next segment; if no next segment, send the ACK
• arrival of in-order segment with expected seq #; one other segment has an ACK pending → immediately send a single cumulative ACK, ACKing both in-order segments
• arrival of out-of-order segment with higher-than-expected seq # (gap detected) → immediately send a duplicate ACK, indicating the seq # of the next expected byte
• arrival of a segment that partially or completely fills the gap → immediately send an ACK, provided that the segment starts at the lower end of the gap
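The table condenses to a small decision rule; a hedged Python sketch (the 500 ms delayed-ACK timer and the gap bookkeeping are deliberately left out):

def receiver_ack_action(in_order, other_ack_pending, fills_gap_start):
    if in_order and not other_ack_pending:
        return "delayed ACK: wait up to 500 ms for the next segment"
    if in_order and other_ack_pending:
        return "immediately send one cumulative ACK covering both segments"
    if not in_order and not fills_gap_start:
        return "immediately send a duplicate ACK for the next expected byte"
    return "immediately send an ACK (segment starts at the lower end of the gap)"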

69

Fast Retransmit
• The timeout period is often relatively long:
– long delay before resending a lost packet
• Detect lost segments via duplicate ACKs
– the sender often sends many segments back-to-back
– if a segment is lost, there will likely be many duplicate ACKs
• If the sender receives 3 duplicate ACKs, it supposes that the segment after the ACKed data was lost:
– fast retransmit: resend the segment before the timer expires

70

[Figure 3.37: Host A sends several segments to Host B; the 2nd segment is lost (X); the resulting duplicate ACKs let A resend the 2nd segment before its timeout expires.]

71

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {                                       /* a duplicate ACK for an already-ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3)
      resend segment with sequence number y    /* fast retransmit */
  }

72

Silly Window Syndrome
The advertised window tells a sender how much the receiver can accept. If a transmitter has something to send – it will. This means tiny segments may persist – indefinitely.
Solution: wait to fill each segment, but don't wait indefinitely.

Nagle's algorithm (a small sketch follows):
If we wait too long, interactive traffic is difficult; if we don't wait, we get silly window syndrome.
Solution: use a timer; when the timer expires – send the (unfilled) segment.
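The decision rule amounts to roughly the following — a hedged sketch of the idea on the slide, not RFC 896's exact wording:

def should_send_now(bytes_buffered, mss, unacked_data_outstanding, timer_expired):
    if bytes_buffered >= mss:            # a full segment is always worth sending
        return True
    if not unacked_data_outstanding:     # nothing in flight: send even a small segment
        return True
    return timer_expired                 # otherwise hold the data, but never indefinitely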

73

Flow Control ≠ Congestion Control
• Flow control: preventing senders from overrunning the capacity of the receivers
• Congestion control: preventing too much data from being injected into the network, causing switches or links to become overloaded

74

Flow Control – (bad old days)
In-line flow control:
• XON/XOFF (^S/^Q)
• data-link dedicated symbols, as in Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires:
• RTS/CTS handshaking
• Read (or Write) Ready signals from a memory interface saying slow-down/stop…

75

TCP Flow Control
• the receive side of a TCP connection has a receive buffer
• the app process may be slow at reading from the buffer
• flow control: the sender won't overflow the receiver's buffer by transmitting too much, too fast
• a speed-matching service: matching the send rate to the receiving app's drain rate

76

TCP Flow control how it works

(Suppose the TCP receiver discards out-of-order segments.)
• spare room in the buffer:
RcvWindow = RcvBuffer − [LastByteRcvd − LastByteRead]
• the receiver advertises the spare room by including the value of RcvWindow in segments
• the sender limits unACKed data to RcvWindow
– guarantees the receive buffer doesn't overflow (a small sketch follows)
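A sketch of the bookkeeping above; the names mirror the slide's variables (RcvBuffer, LastByteRcvd, LastByteRead) rather than any real stack.

def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver can advertise."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def can_send(last_byte_sent, last_byte_acked, advertised_window, amount):
    """Sender-side check: keep unACKed data within the advertised window."""
    return (last_byte_sent - last_byte_acked) + amount <= advertised_window

# e.g. rcv_window(4096, 1000, 200) == 3296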

77

TCP Connection Management

Recall: TCP sender and receiver establish a "connection" before exchanging data segments
• initialize TCP variables:
– seq #s
– buffers, flow-control info (e.g. RcvWindow)
• client: connection initiator
Socket clientSocket = new Socket("hostname", port number);
• server: contacted by the client
Socket connectionSocket = welcomeSocket.accept();

Three-way handshake (a Python equivalent is sketched below):
Step 1: the client host sends a TCP SYN segment to the server
– specifies the initial seq #
– no data
Step 2: the server host receives the SYN, replies with a SYNACK segment
– server allocates buffers
– specifies the server's initial seq #
Step 3: the client receives the SYNACK, replies with an ACK segment, which may contain data
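A minimal Python equivalent of the slide's Java socket calls, as a sketch: connect() triggers the SYN / SYNACK / ACK handshake, and accept() returns once a handshake completes. The port number 12345 is an arbitrary example; the blocking calls are left commented so the fragment runs as-is.

import socket

# server side
welcome = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
welcome.bind(("", 12345))
welcome.listen(1)
# conn, addr = welcome.accept()        # blocks until a client's handshake finishes

# client side
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# client.connect(("hostname", 12345))  # sends SYN, waits for SYNACK, sends ACK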

78

TCP Connection Management (cont)

Closing a connection:
the client closes the socket: clientSocket.close();
Step 1: the client end system sends a TCP FIN control segment to the server
Step 2: the server receives the FIN, replies with an ACK; closes the connection, sends a FIN
[Timeline: client sends FIN (close); server ACKs, then sends its own FIN (close); client ACKs and enters timed wait; closed.]

79

TCP Connection Management (cont)

Step 3: the client receives the FIN, replies with an ACK
– enters "timed wait" – will respond with an ACK to received FINs
Step 4: the server receives the ACK; the connection is closed
Note: with a small modification, this can handle simultaneous FINs
[Timeline: client FIN (closing) → server ACK; server FIN (closing) → client ACK; server closed; client closed after the timed wait.]

80

TCP Connection Management (cont)

[Figures: TCP client lifecycle; TCP server lifecycle.]

81

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control!
• manifestations:
– lost packets (buffer overflow at routers)
– long delays (queueing in router buffers)
• a top-10 problem!

82

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A sends original data at rate λin into a router with unlimited shared output-link buffers; Host B receives at rate λout.]

83

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packets
[Figure: Host A offers original data at rate λin plus retransmitted data (λ'in) to a router with finite shared output-link buffers; Host B receives λout.]

84

Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• "perfect" retransmission, only when loss: λ'in > λout
• retransmission of delayed (not lost) packets makes λ'in larger (than the perfect case) for the same λout

"costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: the link carries multiple copies of a packet
[Figure: three λout-versus-λin plots, (a), (b), (c), each with R/2 marked on the axes; goodput saturates at or below R/2, and with unneeded retransmissions the achievable goodput drops further, toward R/3 and R/4.]

85

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Host A offers original data λin plus retransmitted data λ'in through routers with finite shared output-link buffers toward Host B (λout).]

86

Causes/costs of congestion: scenario 3
Another "cost" of congestion: when a packet is dropped, any upstream transmission capacity used for that packet was wasted.
[Figure: λout collapses as the offered load grows.]
Congestion collapse – example: the cocktail-party effect.

87

Approaches towards congestion control

Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from the network
• congestion inferred from end-system observed loss and delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
– a single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
– an explicit rate the sender should send at

88

TCP congestion control additive increase multiplicative decrease

Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT until loss is detected (for each received ACK: W ← W + 1/W)
• multiplicative decrease: cut CongWin in half after loss (W ← W/2)
(The two rules are sketched in code below.)

[Figure: congestion window (8, 16, 24 Kbytes, …) versus time – saw-tooth behaviour, probing for bandwidth. Slow start is not shown.]

89
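Written out, the two AIMD rules are tiny; a hypothetical sketch with the window measured in MSS units (real TCPs add slow start, timeouts, etc., covered below):

def on_ack(cwnd):
    """Additive increase: +1/cwnd per ACK, i.e. roughly +1 MSS per RTT."""
    return cwnd + 1.0 / cwnd

def on_loss(cwnd):
    """Multiplicative decrease: halve the window."""
    return max(cwnd / 2.0, 1.0)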

90

TCP Congestion Control: details
• the sender limits transmission: LastByteSent − LastByteAcked ≤ CongWin
• roughly: rate = CongWin / RTT bytes/sec
• CongWin is a dynamic function of perceived network congestion
How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• the TCP sender reduces its rate (CongWin) after a loss event
Three mechanisms:
– AIMD
– slow start
– conservative after timeout events

91

AIMD Starts Too Slowly

[Figure: window versus time t, growing only linearly from a small initial value.]
It could take a long time to get started!
Need to start with a small CWND to avoid overloading the network.

92

TCP Slow Start
• When a connection begins, CongWin = 1 MSS
– example: MSS = 500 bytes & RTT = 200 msec
– initial rate = 20 kbps
• the available bandwidth may be >> MSS/RTT
– desirable to quickly ramp up to a respectable rate
• When the connection begins, increase the rate exponentially fast until the first loss event

93

TCP Slow Start (more)
• When a connection begins, increase the rate exponentially until the first loss event:
– double CongWin every RTT
– done by incrementing CongWin for every ACK received
• Summary: the initial rate is slow but ramps up exponentially fast
[Figure: Host A sends one segment, then two, then four, the window doubling each RTT as ACKs return from Host B.]

94

Slow Start and the TCP Sawtooth

[Figure: window versus time – an exponential "slow start" phase, then loss, then the saw-tooth.]
Why is it called slow-start? Because TCP originally had no congestion control mechanism: the source would just start by sending a whole window's worth of data.

95

Refinement: inferring loss
• After 3 dup ACKs:
– CongWin is cut in half
– the window then grows linearly
• But after a timeout event:
– CongWin is instead set to 1 MSS
– the window then grows exponentially
– to a threshold, then grows linearly
Philosophy: 3 dup ACKs indicate the network is capable of delivering some segments; a timeout indicates a "more alarming" congestion scenario.

96

Refinement
Q: when should the exponential increase switch to linear?
A: when CongWin gets to 1/2 of its value before the timeout.
Implementation:
• a variable, Threshold
• at a loss event, Threshold is set to 1/2 of the CongWin value just before the loss event

97

Summary TCP Congestion Control

• When CongWin is below Threshold, the sender is in the slow-start phase: the window grows exponentially.
• When CongWin is above Threshold, the sender is in the congestion-avoidance phase: the window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
(These four rules are collected into a small sketch below.)
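A toy sender-side model of the summary above, with the window in MSS units. The structure follows the slides; the initial Threshold of 64 MSS is an arbitrary example value, not something the slides specify.

class TCPCongestionControl:
    def __init__(self):
        self.cwnd = 1.0          # congestion window, in MSS
        self.ssthresh = 64.0     # initial Threshold (example value, assumption)

    def on_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0                 # slow start: +1 MSS per ACK (doubles per RTT)
        else:
            self.cwnd += 1.0 / self.cwnd     # congestion avoidance: ~+1 MSS per RTT

    def on_triple_dup_ack(self):
        self.ssthresh = self.cwnd / 2.0
        self.cwnd = self.ssthresh            # multiplicative decrease (fast recovery)

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2.0
        self.cwnd = 1.0                      # back to slow start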

98

TCP sender congestion control (State; Event → TCP sender action; commentary):
• Slow Start (SS); ACK receipt for previously unacked data → CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance". (Results in a doubling of CongWin every RTT.)
• Congestion Avoidance (CA); ACK receipt for previously unacked data → CongWin = CongWin + MSS · (MSS/CongWin). (Additive increase: CongWin grows by 1 MSS every RTT.)
• SS or CA; loss event detected by triple duplicate ACK → Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance". (Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS.)
• SS or CA; timeout → Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start". (Enter slow start.)
• SS or CA; duplicate ACK → increment the duplicate-ACK count for the segment being acked. (CongWin and Threshold not changed.)

99

Repeating Slow Start After Timeout

[Figure: window versus time. After a timeout or fast retransmission, SSThresh is set to half the CWND at the drop; slow-start restart goes back to a CWND of 1 MSS but takes advantage of knowing the previous value of CWND: slow start operates until the window reaches half of the previous CWND (i.e. SSTHRESH), then growth is linear.]

100

TCP throughput

• What's the average throughput of TCP as a function of window size and RTT?
– ignore slow start
• Let W be the window size when loss occurs.
• When the window is W, the throughput is W/RTT.
• Just after a loss, the window drops to W/2, the throughput to W/(2·RTT).
• Average throughput: 0.75 · W/RTT

101

TCP Futures: TCP over "long fat pipes"
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires a window size of W = 83,333 in-flight segments
• Throughput in terms of the loss rate L: throughput = 1.22 · MSS / (RTT · √L)
• ⇒ L = 2·10⁻¹⁰ — ouch!
• New versions of TCP for high speed

Calculation on the simple model (cwnd in units of MSS):
• Assume loss occurs whenever cwnd reaches W
– recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, …, W, then W/2, …
– W/2 RTTs, then drop, then repeat
• Average throughput: 0.75 · W · (MSS/RTT)
– one packet dropped out of (W/2)·(3W/4)
– packet drop rate p = (8/3) · W⁻²
• ⇒ Throughput = (MSS/RTT) · √(3/(2p))   (checked numerically below)
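A quick numeric check of the "long fat pipe" example, using the simple-model formula throughput = (MSS/RTT)·√(3/(2p)) with the slide's values (MSS = 1500 bytes, RTT = 100 ms, target 10 Gbps):

from math import sqrt

MSS = 1500 * 8          # bits
RTT = 0.100             # seconds
target = 10e9           # 10 Gbps

W = target * RTT / MSS                  # in-flight segments needed
p = 1.5 * (MSS / (RTT * target)) ** 2   # loss rate that sustains the target

print(f"{W:.0f} {p:.2e}")               # -> 83333 2.16e-10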

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges – or, Why AIMD?
• Single flow, adjusting to the bottleneck bandwidth
– without any a priori knowledge
– could be a Gbps link, could be a modem
• Single flow, adjusting to variations in bandwidth
– when bandwidth decreases, must lower the sending rate
– when bandwidth increases, must increase the sending rate
• Multiple flows, sharing the bandwidth
– must avoid overloading the network
– and share bandwidth "fairly" among the flows

103

104

Problem 1 Single Flow Fixed BW

• Want a first-order estimate of the available bandwidth
– assume the bandwidth is fixed
– ignore the presence of other flows
• Want to start slow but rapidly increase the rate until a packet drop occurs ("slow-start")
• Adjustment:
– cwnd initially set to 1 (MSS)
– cwnd++ upon receipt of each ACK

105

Problems with Slow-Start
• Slow-start can result in many losses
– roughly the size of cwnd ≈ BW·RTT
• Example:
– at some point, cwnd is enough to fill the "pipe"
– after another RTT, cwnd is double its previous value
– all the excess packets are dropped!
• Need a more gentle adjustment algorithm once we have a rough estimate of the bandwidth
– the rest of the design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track the available bandwidth:
• oscillate around its current value
• if you never send more than your current rate, you won't know whether more bandwidth is available
Possible variations (in terms of change per RTT):
• multiplicative increase or decrease: cwnd ← cwnd ·/÷ a
• additive increase or decrease: cwnd ← cwnd +/− b

106

Four alternatives

• AIAD: gentle increase, gentle decrease
• AIMD: gentle increase, drastic decrease
• MIAD: drastic increase, gentle decrease
– too many losses: eliminate
• MIMD: drastic increase and decrease

107

108

Problem 3 Multiple Flows

• Want the steady state to be "fair"
• There are many notions of fairness, but here we just require two identical flows to end up with the same bandwidth
• This eliminates MIMD and AIAD
– as we shall see…
• AIMD is the only remaining solution
– not really, but close enough…

109

Recall Buffer and Window Dynamics

• No congestion → x increases by one packet/RTT every RTT; congestion → decrease x by a factor of 2
[Figure: a single flow A → B over a link of capacity C = 50 pkts/RTT; the rate x (pkts/RTT) and the backlog in the router (congested if > 20 pkts) saw-tooth over roughly 500 RTTs.]

110

AIMD Sharing Dynamics
[Figure: two flows, x1 and x2 (A, B → D, E), share one link. No congestion → each rate increases by one packet/RTT every RTT; congestion → each rate is halved. Over roughly 500 RTTs the rates equalize: fair share. A tiny simulation of this follows.]
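A tiny simulation of the sharing dynamics pictured above: two flows on a link of capacity C; on congestion both halve, otherwise both add 1 per RTT. The capacity and starting rates are illustrative values, not taken from the slide's plot.

C = 50.0
x1, x2 = 40.0, 5.0           # deliberately unequal starting rates

for rtt in range(200):
    if x1 + x2 > C:          # congestion: multiplicative decrease for both
        x1, x2 = x1 / 2, x2 / 2
    else:                    # no congestion: additive increase for both
        x1, x2 = x1 + 1, x2 + 1

print(round(x1, 1), round(x2, 1))   # the two rates end up (roughly) equal

Each halving shrinks the gap between the two rates while each additive step leaves it unchanged, which is exactly why AIMD converges to the fair share and AIAD (next) does not.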

111

AIAD Sharing Dynamics
[Figure: the same two flows, x1 and x2 (A, B → D, E). No congestion → x increases by one packet/RTT every RTT; congestion → decrease x by 1. Unlike AIMD, the gap between the two rates never shrinks.]

112

Simple Model of Congestion Control

• Two TCP connections, with rates x1 and x2
• Congestion when the sum > 1
• Efficiency: the sum is near 1
• Fairness: the x's converge
[Figure: 2-user example – bandwidth for user 1 (x1) against bandwidth for user 2 (x2); the efficiency line x1 + x2 = 1 separates underload from overload.]

113

Example

• Total bandwidth: 1
[Phase plot of (x1, x2) against the fairness line x1 = x2 and the efficiency line x1 + x2 = 1:]
• (0.2, 0.5): inefficient, x1 + x2 = 0.7
• (0.7, 0.5): congested, x1 + x2 = 1.2
• (0.7, 0.3): efficient, x1 + x2 = 1, but not fair
• (0.5, 0.5): efficient, x1 + x2 = 1, and fair

114

AIAD

• Increase: x ← x + aI
• Decrease: x ← x − aD
• Does not converge to fairness
[Phase plot: from (x1h, x2h), a decrease moves to (x1h − aD, x2h − aD) and an increase to (x1h − aD + aI, x2h − aD + aI); the trajectory stays parallel to the fairness line, so the gap between the two users never closes.]

115

MIMD

• Increase: x ← bI · x
• Decrease: x ← bD · x
• Does not converge to fairness
[Phase plot: from (x1h, x2h), a decrease moves to (bD·x1h, bD·x2h) and an increase to (bI·bD·x1h, bI·bD·x2h); the trajectory stays on the same ray through the origin, so the ratio between the two users never changes.]

116

AIMD
• Increase: x ← x + aI
• Decrease: x ← bD · x
• Converges to fairness
[Phase plot: from (x1h, x2h), a decrease moves to (bD·x1h, bD·x2h) and an increase to (bD·x1h + aI, bD·x2h + aI); each multiplicative decrease shrinks the gap between the users while each additive increase leaves it unchanged, so the trajectory moves toward the fairness line.]


  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 36: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

36

rdt30 in action

37

rdt30 in action

38

Performance of rdt30

bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet

U sender utilization ndash fraction of time sender busy sending

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources

dsmicrosecon8bps10bits8000

9 RLdtrans

39

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

40

Pipelined (Packet-Window) protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver

bull Two generic forms of pipelined protocols go-Back-N selective repeat

41

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

42

Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks

ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet

ndash If timer expires retransmit all unacked packets

Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet

ndash When timer expires retransmit only unack packet

43

Selective repeat big picture

bull Sender can have up to N unacked packets in pipeline

bull Rcvr acks individual packetsbull Sender maintains timer for each unacked

packetndash When timer expires retransmit only unack packet

44

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L

46

GBN receiver extended FSM

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN inaction

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures: TCP over "long, fat pipes"
• Example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate p: Throughput = (MSS/RTT) · sqrt(3/(2p))
• Requires loss rate L = 2·10⁻¹⁰ – Ouch! (checked below)
• New versions of TCP for high-speed

Calculation on Simple Model (cwnd in units of MSS)
• Assume loss occurs whenever cwnd reaches W
  – Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … , W, W/2, …
  – W/2 RTTs, then drop, then repeat
• Average throughput: 0.75 W · (MSS/RTT)
  – One packet dropped out of (W/2)·(3W/4) packets
  – Packet drop rate p = (8/3)·W⁻²
• Throughput = (MSS/RTT) · sqrt(3/(2p))
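A quick check of these numbers (Python sketch; it simply fills in the formulas above):

MSS = 1500 * 8          # bits per 1500-byte segment
RTT = 0.1               # seconds
target = 10e9           # 10 Gbps

W = target * RTT / MSS                  # segments in flight needed to fill the pipe
p = 1.5 * (MSS / (RTT * target)) ** 2   # invert Throughput = (MSS/RTT)*sqrt(3/(2p))

print(round(W))   # ~83333 segments
print(p)          # ~2e-10 -- the "Ouch!" loss rate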

102

HINT: KNOW THIS SLIDE
Three Congestion Control Challenges – or Why AIMD?
• Single flow adjusting to bottleneck bandwidth
  – Without any a priori knowledge
  – Could be a Gbps link, could be a modem
• Single flow adjusting to variations in bandwidth
  – When bandwidth decreases, must lower sending rate
  – When bandwidth increases, must increase sending rate
• Multiple flows sharing the bandwidth
  – Must avoid overloading network
  – And share bandwidth "fairly" among the flows

103

104

Problem 1: Single Flow, Fixed BW
• Want to get a first-order estimate of the available bandwidth
  – Assume bandwidth is fixed
  – Ignore presence of other flows
• Want to start slow, but rapidly increase rate until packet drop occurs ("slow-start")
• Adjustment:
  – cwnd initially set to 1 (MSS)
  – cwnd++ upon receipt of ACK

105

Problems with Slow-Start
• Slow-start can result in many losses
  – Roughly the size of cwnd ~ BW·RTT
• Example:
  – At some point, cwnd is enough to fill the "pipe"
  – After another RTT, cwnd is double its previous value
  – All the excess packets are dropped!
• Need a more gentle adjustment algorithm once we have a rough estimate of the bandwidth
  – Rest of design discussion focuses on this

Problem 2: Single Flow, Varying BW
Want to track available bandwidth
• Oscillate around its current value
• If you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease: cwnd ← cwnd × a (or ÷ a)
• Additive increase or decrease: cwnd ← cwnd + b (or − b)

106

Four alternatives
• AIAD: gentle increase, gentle decrease
• AIMD: gentle increase, drastic decrease
• MIAD: drastic increase, gentle decrease
  – too many losses: eliminate
• MIMD: drastic increase and decrease
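Written as per-RTT update rules (a minimal Python sketch; the parameter values are illustrative, not prescribed by the slides):

a_i, a_d = 1.0, 1.0    # additive steps, pkts/RTT
b_i, b_d = 2.0, 0.5    # multiplicative factors (b_i > 1, 0 < b_d < 1)

def aiad(x, congested): return x - a_d if congested else x + a_i
def aimd(x, congested): return x * b_d if congested else x + a_i
def miad(x, congested): return x - a_d if congested else x * b_i
def mimd(x, congested): return x * b_d if congested else x * b_i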

107

108

Problem 3: Multiple Flows
• Want steady state to be "fair"
• Many notions of fairness, but here just require two identical flows to end up with the same bandwidth
• This eliminates MIMD and AIAD
  – As we shall see…
• AIMD is the only remaining solution
  – Not really, but close enough…

109

Recall: Buffer and Window Dynamics
• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by factor 2

[Figure: a single flow A→B over a C = 50 pkts/RTT link – rate x (pkts/RTT) and backlog in the router (pkts; congested if > 20) plotted against time]

110

AIMD Sharing Dynamics

[Figure: two flows, x1 (A→B) and x2 (D→E), sharing the same bottleneck – rates plotted against time]

• No congestion → rate increases by one packet/RTT every RTT; congestion → decrease rate by factor 2
• Rates equalize = fair share
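A small simulation in the spirit of this plot (Python; the C = 50 pkts/RTT capacity matches the earlier slide, the starting rates are assumed for illustration):

C = 50.0                 # bottleneck capacity, pkts/RTT
x1, x2 = 40.0, 10.0      # deliberately unequal starting rates (assumed)
for _ in range(500):     # 500 RTTs
    if x1 + x2 > C:      # congestion: multiplicative decrease
        x1, x2 = x1 / 2, x2 / 2
    else:                # no congestion: additive increase
        x1, x2 = x1 + 1, x2 + 1
print(round(x1, 2), round(x2, 2), round(abs(x1 - x2), 4))
# The two rates end up essentially equal: AIMD converges to a fair share.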

111

AIAD Sharing Dynamics

[Figure: two flows, x1 (A→B) and x2 (D→E), sharing the same bottleneck – rates plotted against time]

• No congestion → x increases by one packet/RTT every RTT; congestion → decrease x by 1

112

Simple Model of Congestion Control
• Two TCP connections
  – Rates x1 and x2
• Congestion when sum > 1
• Efficiency: sum near 1
• Fairness: x's converge

[Figure: 2-user example – bandwidth for User 1 (x1) against bandwidth for User 2 (x2), with the efficiency line separating underload from overload]

113

Example
• Total bandwidth 1

[Figure: the (x1, x2) plane with the fairness line and the efficiency line, both axes running from 0 to 1]

• Inefficient: x1 + x2 = 0.7, e.g. (0.2, 0.5)
• Congested: x1 + x2 = 1.2, e.g. (0.7, 0.5)
• Efficient, x1 + x2 = 1, but not fair: e.g. (0.7, 0.3)
• Efficient, x1 + x2 = 1, and fair: (0.5, 0.5)
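The same four points, checked programmatically (a tiny Python sketch; the epsilon is only there to cope with floating-point sums):

EPS = 1e-9
points = [(0.2, 0.5), (0.7, 0.5), (0.7, 0.3), (0.5, 0.5)]
for x1, x2 in points:
    total = x1 + x2
    if total > 1 + EPS:
        status = "congested"
    elif abs(total - 1) <= EPS:
        status = "efficient"
    else:
        status = "inefficient"
    fair = "fair" if abs(x1 - x2) <= EPS else "not fair"
    print((x1, x2), status, fair)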

114

AIAD
• Increase: x ← x + aI
• Decrease: x ← x − aD
• Does not converge to fairness

[Figure: AIAD trajectory in the (x1, x2) plane with fairness and efficiency lines – from (x1h, x2h) down to (x1h − aD, x2h − aD), then up to (x1h − aD + aI, x2h − aD + aI); the difference x1 − x2 never changes, so the trajectory stays parallel to the fairness line]

115

MIMD
• Increase: x ← bI · x
• Decrease: x ← bD · x
• Does not converge to fairness

[Figure: MIMD trajectory in the (x1, x2) plane with fairness and efficiency lines – from (x1h, x2h) to (bD·x1h, bD·x2h), then to (bI·bD·x1h, bI·bD·x2h); the ratio x1/x2 never changes, so the trajectory stays on the same line through the origin]

116

AIMD
• Increase: x ← x + aI
• Decrease: x ← bD · x
• Converges to fairness

[Figure: AIMD trajectory in the (x1, x2) plane with fairness and efficiency lines – from (x1h, x2h) to (bD·x1h, bD·x2h), then to (bD·x1h + aI, bD·x2h + aI); each multiplicative decrease shrinks the gap between x1 and x2 while each additive increase preserves it, so the trajectory drifts toward the fairness line]

117

Why is AIMD fair? (a pretty animation…)
Two competing sessions:
• Additive increase gives slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally

[Figure: bandwidth for Connection 1 against bandwidth for Connection 2, both axes up to R, with the equal-bandwidth-share line – repeated cycles of congestion avoidance (additive increase) and loss (decrease window by factor of 2) pull the trajectory towards equal share]

118

Fairness (more)
Fairness and UDP
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient, yet ongoing) Research area: TCP friendly

Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2 (see the worked check below)
• Recall: Multiple browser sessions (and the potential for synchronized loss)
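A worked check of the R/10 and R/2 figures (Python; this assumes the idealisation used in the example, namely that the bottleneck rate R is split evenly across all TCP connections):

R = 1.0
existing = 9                              # connections already sharing the link

one_conn = R * 1 / (existing + 1)         # new app opens 1 connection
eleven_conns = R * 11 / (existing + 11)   # new app opens 11 connections

print(one_conn)       # 0.1  -> R/10
print(eleven_conns)   # 0.55 -> roughly R/2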

119

Synchronized Flows vs. Many TCP Flows
Synchronized flows:
• Aggregate window has same dynamics
• Therefore buffer occupancy has same dynamics
• Rule-of-thumb still holds

Independent, desynchronized flows:
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases

[Figure: buffer size and its probability distribution over time for the two cases]

Some TCP issues outstanding…


Page 38: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

38

Performance of rdt30

bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet

U sender utilization ndash fraction of time sender busy sending

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources

dsmicrosecon8bps10bits8000

9 RLdtrans

39

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

40

Pipelined (Packet-Window) protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver

bull Two generic forms of pipelined protocols go-Back-N selective repeat

41

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

42

Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks

ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet

ndash If timer expires retransmit all unacked packets

Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet

ndash When timer expires retransmit only unack packet

43

Selective repeat big picture

bull Sender can have up to N unacked packets in pipeline

bull Rcvr acks individual packetsbull Sender maintains timer for each unacked

packetndash When timer expires retransmit only unack packet

44

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L

46

GBN receiver extended FSM

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN inaction

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall: TCP sender and receiver establish a "connection" before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket("hostname", "port number");
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();

Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
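For reference, a minimal Python sketch of the same client/server pattern; the three-way handshake happens inside connect()/accept(), and the loopback address, port 12000 and the short sleep are arbitrary choices for illustration:

# Hedged sketch: SYN / SYNACK / ACK are exchanged inside connect() and accept().
import socket
import threading
import time

def server():
    welcome = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # the "welcomeSocket"
    welcome.bind(("127.0.0.1", 12000))    # arbitrary port for this example
    welcome.listen(1)
    conn, addr = welcome.accept()         # server side of the handshake completes here
    conn.close()
    welcome.close()

threading.Thread(target=server, daemon=True).start()
time.sleep(0.2)                           # crude wait so the server is listening first

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", 12000))      # SYN, SYNACK, ACK exchanged inside connect()
client.close()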

78

TCP Connection Management (cont)

Closing a connection

client closes socket: clientSocket.close();

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Timeline: client sends FIN and closes; server ACKs, then sends its own FIN and closes; client ACKs, waits in timed wait, then closed]

79

TCP Connection Management (cont)

Step 3: client receives FIN, replies with ACK
  – enters "timed wait" – will respond with ACK to received FINs

Step 4: server receives ACK. Connection closed.

Note: with a small modification, can handle simultaneous FINs.

[Timeline: client sends FIN (closing); server ACKs and sends its own FIN (closing); client ACKs, enters timed wait, then closed; server closed on receiving the final ACK]

80

TCP Connection Management (cont)

[State-transition diagrams: TCP client lifecycle and TCP server lifecycle]

81

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control!
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem!

82

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Diagram: Host A and Host B send λin (original data) into a router with unlimited shared output-link buffers; λout emerges]

83

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packets
[Diagram: Host A offers λin (original data) and λ'in (original data plus retransmitted data) into finite shared output-link buffers; Host B receives λout]

84

Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• "perfect" retransmission, only when loss: λ'in > λout
• retransmission of a delayed (not lost) packet makes λ'in larger (than the perfect case) for the same λout

"costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: link carries multiple copies of a packet

[Graphs (a), (b), (c): λout against λin; goodput saturates at R/2 in the ideal case and falls towards R/3 or R/4 once retransmissions are added]

85

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λin and λ'in increase?

[Diagram: Host A sends λin (original data) plus λ'in (original plus retransmitted data) through finite shared output-link buffers; Host B receives λout]

86

Causes/costs of congestion: scenario 3

Another "cost" of congestion: when a packet is dropped, any upstream transmission capacity used for that packet was wasted!

[Graph: λout collapses as the offered load from Host A and Host B grows]

Congestion Collapse example: the Cocktail party effect

87

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
• no explicit feedback from the network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate the sender should send at

88

TCP congestion control: additive increase, multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT (for each received ACK, W ← W + 1/W) until loss is detected
• multiplicative decrease: cut CongWin in half after loss (W ← W/2)

[Graph: congestion window size (8, 16, 24 Kbytes) against time; saw-tooth behaviour, probing for bandwidth]
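A minimal sketch of the AIMD rule above, expressed per ACK and per loss; the window is kept in units of MSS and the starting value is arbitrary:

# Hedged sketch of AIMD in congestion avoidance; cwnd is in units of MSS.
def on_ack(cwnd):
    return cwnd + 1.0 / cwnd          # W <- W + 1/W per ACK (about +1 MSS per RTT)

def on_loss(cwnd):
    return max(cwnd / 2.0, 1.0)       # W <- W/2, but never below 1 MSS

cwnd = 10.0
for _ in range(10):                   # roughly one RTT's worth of ACKs
    cwnd = on_ack(cwnd)
print(round(cwnd, 2))                 # ~11: grew by about 1 MSS over the RTT
cwnd = on_loss(cwnd)
print(round(cwnd, 2))                 # ~5.5: halved after the loss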

89 (SLOW START IS NOT SHOWN)

90

TCP Congestion Control: details
• sender limits transmission: LastByteSent − LastByteAcked ≤ CongWin
• Roughly: rate ≈ CongWin / RTT bytes/sec
• CongWin is a dynamic function of perceived network congestion

How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event

three mechanisms:
  – AIMD
  – slow start
  – conservative after timeout events

91

AIMD Starts Too Slowly!

[Graph: window against t, growing only linearly from a small value]

It could take a long time to get started!
Need to start with a small CWND to avoid overloading the network.

92

TCP Slow Start
• When the connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to a respectable rate

When the connection begins, increase the rate exponentially fast until the first loss event

93

TCP Slow Start (more)
• When the connection begins, increase the rate exponentially until the first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow, but ramps up exponentially fast

[Timeline: Host A sends one segment, then two, then four, one round per RTT, as ACKs return from Host B]

94

Slow Start and the TCP Sawtooth

[Graph: window against t; exponential "slow start" up to the first loss, then the sawtooth]

Why is it called slow-start? Because TCP originally had no congestion control mechanism. The source would just start by sending a whole window's worth of data!

95

Refinement: inferring loss
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after a timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Philosophy: 3 dup ACKs indicates the network is capable of delivering some segments; a timeout indicates a "more alarming" congestion scenario

96

Refinement
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before the timeout.

Implementation:
• Variable: Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event

97

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.

• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.

• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.

• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.

98

TCP sender congestion control

State: Slow Start (SS)
Event: ACK receipt for previously unacked data
Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
Commentary: resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unacked data
Action: CongWin = CongWin + MSS · (MSS / CongWin)
Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: loss event detected by triple duplicate ACK
Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

State: SS or CA
Event: timeout
Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: enter slow start

State: SS or CA
Event: duplicate ACK
Action: increment the duplicate ACK count for the segment being ACKed
Commentary: CongWin and Threshold not changed
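The table above maps naturally onto a small state machine. The following is a hedged Python sketch of that mapping, not production TCP; cwnd/ssthresh are kept in units of MSS and the initial ssthresh is an arbitrary assumption:

# Hedged sketch of the sender-side table above; cwnd and ssthresh in units of MSS.
class TcpCongestionControl:
    def __init__(self):
        self.cwnd = 1.0          # start in slow start with 1 MSS
        self.ssthresh = 64.0     # assumed initial threshold
        self.dup_acks = 0

    def on_new_ack(self):
        self.dup_acks = 0
        if self.cwnd < self.ssthresh:      # slow start: +1 MSS per ACK (doubles per RTT)
            self.cwnd += 1.0
        else:                              # congestion avoidance: ~+1 MSS per RTT
            self.cwnd += 1.0 / self.cwnd

    def on_dup_ack(self):
        self.dup_acks += 1
        if self.dup_acks == 3:             # triple duplicate ACK
            self.ssthresh = max(self.cwnd / 2.0, 1.0)
            self.cwnd = self.ssthresh      # multiplicative decrease, never below 1 MSS

    def on_timeout(self):                  # the "more alarming" signal
        self.ssthresh = max(self.cwnd / 2.0, 1.0)
        self.cwnd = 1.0                    # back to slow start
        self.dup_acks = 0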

99

Repeating Slow Start After Timeout

[Graph: window against t; after a timeout (or fast retransmission) the window collapses and SSThresh is set to half the previous CWND]

Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND.
Slow start operates until the window reaches half of the previous CWND, i.e. SSTHRESH.

100

TCP throughput

• What's the average throughput of TCP as a function of window size and RTT?
  – Ignore slow start
• Let W be the window size when loss occurs
• When the window is W, throughput is W/RTT
• Just after loss, the window drops to W/2, throughput to W/(2·RTT)
• Average throughput: 0.75 W/RTT

101

TCP Futures: TCP over "long, fat pipes"

• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate p: (MSS/RTT)·sqrt(3/(2p))
• Required loss rate L ≈ 2·10^-10. Ouch!
• New versions of TCP for high-speed networks needed!

Calculation on Simple Model (cwnd in units of MSS)

• Assume loss occurs whenever cwnd reaches W
  – Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … W, then W/2, …
  – W/2 RTTs, then drop, then repeat
• Average throughput: 0.75·W·(MSS/RTT)
  – One packet dropped out of (W/2)·(3W/4) packets per cycle
  – Packet drop rate p = (8/3) / W^2
• Throughput = (MSS/RTT)·sqrt(3/(2p))
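As a worked check of the numbers above (a LaTeX sketch that simply plugs the slide's values, a 1500-byte MSS and a 100 ms RTT, into the throughput formula):

\[
\text{Throughput} = \frac{MSS}{RTT}\sqrt{\frac{3}{2p}},
\qquad
10^{10}\,\text{b/s} = \frac{1500\cdot 8\,\text{b}}{0.1\,\text{s}}\sqrt{\frac{3}{2p}}
\]
\[
\Rightarrow\; \sqrt{\frac{3}{2p}} = \frac{10^{10}}{1.2\times10^{5}} \approx 83{,}333
\quad\Rightarrow\quad
p = \frac{3}{2\,(83{,}333)^{2}} \approx 2\times10^{-10}
\]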

102

HINT: KNOW THIS SLIDE

Three Congestion Control Challenges – or: Why AIMD?

• Single flow, adjusting to bottleneck bandwidth
  – Without any a priori knowledge
  – Could be a Gbps link, could be a modem
• Single flow, adjusting to variations in bandwidth
  – When bandwidth decreases, must lower sending rate
  – When bandwidth increases, must increase sending rate
• Multiple flows, sharing the bandwidth
  – Must avoid overloading the network
  – And share bandwidth "fairly" among the flows

103

104

Problem 1: Single Flow, Fixed BW

• Want to get a first-order estimate of the available bandwidth
  – Assume bandwidth is fixed
  – Ignore presence of other flows
• Want to start slow, but rapidly increase the rate until a packet drop occurs ("slow-start")
• Adjustment:
  – cwnd initially set to 1 (MSS)
  – cwnd++ upon receipt of ACK

105

Problems with Slow-Start
• Slow-start can result in many losses
  – Roughly the size of cwnd ~ BW·RTT
• Example:
  – At some point, cwnd is enough to fill the "pipe"
  – After another RTT, cwnd is double its previous value
  – All the excess packets are dropped!
• Need a more gentle adjustment algorithm once we have a rough estimate of the bandwidth
  – Rest of the design discussion focuses on this

Problem 2: Single Flow, Varying BW

Want to track the available bandwidth:
• Oscillate around its current value
• If you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease: cwnd ← cwnd ×/÷ a
• Additive increase or decrease: cwnd ← cwnd ± b

106

Four alternatives

• AIAD: gentle increase, gentle decrease

• AIMD: gentle increase, drastic decrease

• MIAD: drastic increase, gentle decrease
  – too many losses: eliminate

• MIMD: drastic increase and decrease

107

108

Problem 3: Multiple Flows

• Want the steady state to be "fair"

• Many notions of fairness, but here just require two identical flows to end up with the same bandwidth

• This eliminates MIMD and AIAD
  – As we shall see…

• AIMD is the only remaining solution
  – Not really, but close enough…

109

Recall: Buffer and Window Dynamics

[Graph: flow A→B over a link of capacity C = 50 pkts/RTT; rate x (pkts/RTT) and router backlog (pkts, congested if > 20) plotted against time]

• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by factor 2

110

AIMD Sharing Dynamics

[Graph: two flows, x1 (A→B) and x2 (D→E), sharing a bottleneck; rates plotted against time]

• No congestion → rate increases by one packet/RTT every RTT
• Congestion → decrease rate by factor 2

Rates equalize → fair share

111

AIAD Sharing Dynamics

[Graph: two flows, x1 (A→B) and x2 (D→E); rates plotted against time. The gap between the two rates never closes]

• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by 1

112

Simple Model of Congestion Control

• Two TCP connections
  – Rates x1 and x2
• Congestion when the sum > 1
• Efficiency: sum near 1
• Fairness: x's converge

[Phase plot: bandwidth for user 1 (x1) against bandwidth for user 2 (x2); the efficiency line separates underload from overload in this 2-user example]

113

Example

[Phase plot: user 1 (x1) against user 2 (x2), with the fairness line and the efficiency line; total bandwidth 1]

• (0.2, 0.5): inefficient, x1 + x2 = 0.7
• (0.7, 0.5): congested, x1 + x2 = 1.2
• (0.7, 0.3): efficient, x1 + x2 = 1, but not fair
• (0.5, 0.5): efficient, x1 + x2 = 1, and fair

114

AIAD

[Phase plot: bandwidth for user 1 (x1) against bandwidth for user 2 (x2), with fairness and efficiency lines; the trajectory moves from (x1h, x2h) to (x1h − aD, x2h − aD) to (x1h − aD + aI, x2h − aD + aI)]

• Increase: x ← x + aI
• Decrease: x ← x − aD
• Does not converge to fairness

115

MIMD

[Phase plot: bandwidth for user 1 (x1) against bandwidth for user 2 (x2), with fairness and efficiency lines; the trajectory moves from (x1h, x2h) to (bD·x1h, bD·x2h) to (bI·bD·x1h, bI·bD·x2h)]

• Increase: x ← bI · x
• Decrease: x ← bD · x
• Does not converge to fairness

116

AIMD

[Phase plot: bandwidth for user 1 (x1) against bandwidth for user 2 (x2), with fairness and efficiency lines; the trajectory moves from (x1h, x2h) to (bD·x1h, bD·x2h) to (bD·x1h + aI, bD·x2h + aI), stepping towards the fairness line]

• Increase: x ← x + aI
• Decrease: x ← bD · x
• Converges to fairness
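A tiny simulation of the phase-plot argument above: two flows share a link of capacity 1, and the AIMD rules drive their rates together. The step size, decrease factor and starting rates are arbitrary choices for illustration:

# Hedged sketch: two AIMD flows sharing a link of capacity 1.0.
aI, bD = 0.01, 0.5            # arbitrary additive step and multiplicative decrease factor
x1, x2 = 0.7, 0.1             # deliberately unfair starting point

for _ in range(2000):
    if x1 + x2 > 1.0:         # congestion: both flows back off multiplicatively
        x1, x2 = bD * x1, bD * x2
    else:                     # no congestion: both flows creep up additively
        x1, x2 = x1 + aI, x2 + aI

print(round(x1, 2), round(x2, 2))   # x1 ≈ x2: the flows converge to equal shares

The multiplicative decrease shrinks the gap between the two rates every time congestion hits, while the additive increase leaves the gap unchanged, which is exactly why the trajectory walks onto the fairness line.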

117

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally

[Phase plot: bandwidth for Connection 1 against bandwidth for Connection 2, capacity R on each axis, with the equal-bandwidth-share line; repeated rounds of congestion avoidance (additive increase) and loss (decrease window by factor of 2) move the pair towards the equal-share line]

118

Fairness (more)

Fairness and UDP:
• Multimedia apps may not use TCP
  – do not want the rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at a constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections:
• nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2!
• Recall: multiple browser sessions (and the potential for synchronized loss)

119

Synchronized Flows vs Many TCP Flows

• Synchronized flows:
  – aggregate window has the same dynamics
  – therefore buffer occupancy has the same dynamics
  – rule-of-thumb still holds
• Independent, desynchronized flows:
  – central limit theorem says the aggregate becomes Gaussian
  – variance (buffer size) decreases as N increases

[Plots: buffer occupancy and its probability distribution against time]

Some TCP issues outstanding…



40

Pipelined (Packet-Window) protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver

bull Two generic forms of pipelined protocols go-Back-N selective repeat

41

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

42

Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks

ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet

ndash If timer expires retransmit all unacked packets

Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet

ndash When timer expires retransmit only unack packet

43

Selective repeat big picture

bull Sender can have up to N unacked packets in pipeline

bull Rcvr acks individual packetsbull Sender maintains timer for each unacked

packetndash When timer expires retransmit only unack packet

44

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L

46

GBN receiver extended FSM

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN inaction

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q: how to set the TCP timeout value?
• longer than RTT – but RTT varies
• too short: premature timeout, unnecessary retransmissions
• too long: slow reaction to segment loss

Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt – ignore retransmissions
• SampleRTT will vary; want the estimated RTT "smoother" – average several recent measurements, not just the current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1 − α) · EstimatedRTT + α · SampleRTT

Exponentially weighted moving average: the influence of a past sample decreases exponentially fast; typical value: α = 0.125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

Karn/Partridge Algorithm – ignore retransmissions in the measurements

(and increase the timeout on retransmission: this makes retransmissions decreasingly aggressive)

61

Example RTT estimation: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr
[plot: SampleRTT and EstimatedRTT (milliseconds, roughly 100–350 ms) vs time (seconds)]

62

TCP Round Trip Time and Timeout

Setting the timeout
• EstimatedRTT plus a "safety margin"
  – large variation in EstimatedRTT → larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:

DevRTT = (1 − β) · DevRTT + β · |SampleRTT − EstimatedRTT|

(typically β = 0.25)

Then set the timeout interval:

TimeoutInterval = EstimatedRTT + 4 · DevRTT
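A small sketch of this estimator in Python (the gains follow the slide's typical values; the first-sample initialisation is our own choice, not from the slide):

ALPHA, BETA = 0.125, 0.25              # typical EWMA gains from the slide

estimated_rtt = None
dev_rtt = 0.0

def on_rtt_sample(sample_rtt):
    """Update EstimatedRTT, DevRTT and return TimeoutInterval for one SampleRTT.
    Per Karn/Partridge, callers should skip samples taken from retransmitted segments."""
    global estimated_rtt, dev_rtt
    if estimated_rtt is None:
        estimated_rtt = sample_rtt     # first measurement seeds the estimate (assumption)
        dev_rtt = sample_rtt / 2
    else:
        estimated_rtt = (1 - ALPHA) * estimated_rtt + ALPHA * sample_rtt
        dev_rtt = (1 - BETA) * dev_rtt + BETA * abs(sample_rtt - estimated_rtt)
    return estimated_rtt + 4 * dev_rtt  # TimeoutInterval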

63

TCP reliable data transfer
• TCP creates an rdt service on top of IP's unreliable service
• pipelined segments
• cumulative ACKs
• TCP uses a single retransmission timer

• Retransmissions are triggered by: timeout events; duplicate ACKs

• Initially, consider a simplified TCP sender: ignore duplicate ACKs; ignore flow control and congestion control

64

TCP sender events
data rcvd from app:
• create segment with seq #; the seq # is the byte-stream number of the first data byte in the segment
• start timer if not already running (think of the timer as being for the oldest unACKed segment)
• expiration interval: TimeOutInterval

timeout:
• retransmit the segment that caused the timeout
• restart the timer

ACK rcvd:
• if it acknowledges previously unACKed segments
  – update what is known to be ACKed
  – start the timer if there are outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event) {

    event: data received from application above
      create TCP segment with sequence number NextSeqNum
      if (timer currently not running)
        start timer
      pass segment to IP
      NextSeqNum = NextSeqNum + length(data)

    event: timer timeout
      retransmit not-yet-acknowledged segment with smallest sequence number
      start timer

    event: ACK received, with ACK field value of y
      if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
          start timer
      }
  }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ACKed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so new data is ACKed
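The same event handlers as a compact, runnable Python sketch (ip_send is a hypothetical stand-in for "pass segment to IP"; timers are modelled by a flag only):

def ip_send(seq, data):     # stand-in for handing a segment to IP (assumption, not the slide's API)
    print("segment", seq, data)

class SimpleTcpSender:
    """Simplified sender from the pseudocode: one timer, cumulative ACKs, no dup-ACK handling."""
    def __init__(self, initial_seq):
        self.next_seq = initial_seq          # NextSeqNum
        self.send_base = initial_seq         # SendBase
        self.segments = {}                   # seq -> data kept for possible retransmission
        self.timer_running = False

    def on_app_data(self, data):             # event: data received from application above
        self.segments[self.next_seq] = data
        ip_send(self.next_seq, data)
        if not self.timer_running:
            self.timer_running = True        # start timer (expiry would call on_timeout)
        self.next_seq += len(data)

    def on_timeout(self):                    # event: timer timeout
        seq = min(self.segments)             # smallest not-yet-acknowledged sequence number
        ip_send(seq, self.segments[seq])
        self.timer_running = True            # restart timer

    def on_ack(self, y):                     # event: ACK received with ACK field value y
        if y > self.send_base:
            self.send_base = y
            for seq in [s for s in self.segments if s < y]:
                del self.segments[seq]       # everything below y is cumulatively ACKed
            self.timer_running = bool(self.segments)  # restart only if data is outstanding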

66

TCP retransmission scenarios

Lost ACK scenario: Host A sends Seq=92 (8 bytes data); Host B's ACK=100 is lost (X); A's Seq=92 timeout expires and A resends Seq=92 (8 bytes data); B ACKs again with ACK=100; SendBase advances to 100.

Premature timeout scenario: Host A sends Seq=92 (8 bytes data) then Seq=100 (20 bytes data); B returns ACK=100 and ACK=120, but A's Seq=92 timer expires before they arrive, so A resends Seq=92 (8 bytes data); B replies ACK=120 again; SendBase moves 100 → 120.

67

TCP retransmission scenarios (more)

Cumulative ACK scenario: Host A sends Seq=92 (8 bytes data) and Seq=100 (20 bytes data); B's ACK=100 is lost (X), but ACK=120 arrives before A's timeout, so nothing is retransmitted; SendBase = 120.

Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACKs 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver → TCP Receiver action

• Arrival of in-order segment with expected seq #; all data up to the expected seq # already ACKed → Delayed ACK: wait up to 500 ms for the next segment; if no next segment, send ACK

• Arrival of in-order segment with expected seq #; one other segment has an ACK pending → Immediately send a single cumulative ACK, ACKing both in-order segments

• Arrival of out-of-order segment with higher-than-expected seq #; gap detected → Immediately send a duplicate ACK, indicating the seq # of the next expected byte

• Arrival of a segment that partially or completely fills a gap → Immediately send an ACK, provided the segment starts at the lower end of the gap

69

Fast Retransmit
• Time-out period is often relatively long: long delay before resending a lost packet
• Detect lost segments via duplicate ACKs
  – sender often sends many segments back-to-back
  – if a segment is lost, there will likely be many duplicate ACKs
• If the sender receives 3 duplicate ACKs, it supposes that the segment after the ACKed data was lost
  – fast retransmit: resend the segment before the timer expires

70

[timeline: Host A sends several segments to Host B; the 2nd segment is lost (X); duplicate ACKs come back; A resends the 2nd segment before its timeout expires]

Figure 3.37: Resending a segment after a triple duplicate ACK

71

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {                                        /* a duplicate ACK for an already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3)
      resend segment with sequence number y     /* fast retransmit */
  }

Fast retransmit algorithm
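Extending the earlier SimpleTcpSender sketch with the duplicate-ACK rule (again an illustration under the same assumptions, not the canonical implementation):

class FastRetransmitSender(SimpleTcpSender):
    def __init__(self, initial_seq):
        super().__init__(initial_seq)
        self.dup_acks = {}                          # ACK value y -> duplicate count

    def on_ack(self, y):
        if y > self.send_base:
            self.dup_acks.pop(y, None)
            super().on_ack(y)                       # new data ACKed: normal path
        else:
            self.dup_acks[y] = self.dup_acks.get(y, 0) + 1
            if self.dup_acks[y] == 3 and y in self.segments:
                ip_send(y, self.segments[y])        # fast retransmit before the timer expires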

72

Silly Window Syndrome
The advertised window tells a sender how much the receiver can accept, and if a transmitter has something to send – it will. This means small segments may persist indefinitely.

Solution:
Wait to fill each segment, but don't wait indefinitely.

NAGLE's Algorithm

If we wait too long, interactive traffic is difficult.
If we don't wait, we get silly window syndrome.

Solution: use a timer; when the timer expires – send the (unfilled) segment
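A rough sketch of the Nagle idea plus the timer fallback described above (hypothetical send-side buffering; MSS handling and timing are simplified):

import time

class NagleBuffer:
    """Sketch of Nagle plus a timer fallback: coalesce small writes, but never hold data forever."""
    def __init__(self, mss, flush_after=0.2):
        self.mss, self.flush_after = mss, flush_after
        self.pending = b""          # data written by the app but not yet sent
        self.unacked = 0            # segments in flight
        self.deadline = None        # timer for the (unfilled) segment

    def _send(self, send, data):
        send(data)
        self.unacked += 1

    def write(self, data, send):    # send: callable taking one segment (bytes)
        self.pending += data
        self.deadline = self.deadline or time.monotonic() + self.flush_after
        while len(self.pending) >= self.mss:          # full segments always go out
            self._send(send, self.pending[:self.mss])
            self.pending = self.pending[self.mss:]
        if self.pending and self.unacked == 0:        # nothing in flight: small data goes now
            self._send(send, self.pending)
            self.pending = b""
        if not self.pending:
            self.deadline = None

    def on_ack(self, send):
        self.unacked = max(0, self.unacked - 1)
        if self.pending and self.unacked == 0:        # Nagle: release held data once ACKed
            self._send(send, self.pending)
            self.pending, self.deadline = b"", None

    def on_tick(self, send):        # call periodically: timer expiry sends the unfilled segment
        if self.pending and self.deadline and time.monotonic() >= self.deadline:
            self._send(send, self.pending)
            self.pending, self.deadline = b"", None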

73

Flow Control ≠ Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control – (bad old days)

In-line flow control

• XON/XOFF (^s/^q)

• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)

Dedicated wires

• RTS/CTS handshaking

• Read (or Write) Ready signals from a memory interface saying slow-down/stop…

75

TCP Flow Control
• receive side of a TCP connection has a receive buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
• app process may be slow at reading from the buffer

flow control: sender won't overflow the receiver's buffer by transmitting too much, too fast

76

TCP Flow control: how it works

(Suppose the TCP receiver discards out-of-order segments)

• spare room in buffer:
  RcvWindow = RcvBuffer − [LastByteRcvd − LastByteRead]

• Rcvr advertises the spare room by including the value of RcvWindow in segments

• Sender limits unACKed data to RcvWindow – guarantees the receive buffer doesn't overflow
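The same two rules in code form (a sketch; names mirror the slide's variables):

def advertised_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises: RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def sender_can_send(last_byte_sent, last_byte_acked, rcv_window):
    """Flow-control check: keep unACKed data within the advertised window."""
    return (last_byte_sent - last_byte_acked) < rcv_window

# e.g. a 4096-byte buffer with 1024 bytes received but not yet read leaves a 3072-byte window
assert advertised_window(4096, 1024, 0) == 3072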

77

TCP Connection Management

Recall: TCP sender and receiver establish a "connection" before exchanging data segments

• initialize TCP variables: seq #s; buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket("hostname", port number);
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();

Three way handshake:
Step 1: client host sends a TCP SYN segment to the server – specifies initial seq #, no data
Step 2: server host receives SYN, replies with a SYNACK segment – server allocates buffers, specifies the server's initial seq #
Step 3: client receives SYNACK, replies with an ACK segment, which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket: clientSocket.close();

Step 1: client end system sends a TCP FIN control segment to the server

Step 2: server receives FIN, replies with ACK; closes the connection, sends FIN

[timeline: client sends FIN (close); server replies ACK, then sends its own FIN (close); client replies ACK and enters timed wait before closed]

79

TCP Connection Management (cont)

Step 3: client receives FIN, replies with ACK
  – enters "timed wait": will respond with ACK to received FINs

Step 4: server receives ACK; connection closed

Note: with a small modification, this can handle simultaneous FINs

[timeline: both sides closing; FIN/ACK exchanged in each direction; client sits in timed wait, then both ends are closed]

80

TCP Connection Management (cont)

TCP client lifecycle

TCP server lifecycle

81

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control!
• manifestations: lost packets (buffer overflow at routers); long delays (queueing in router buffers)
• a top-10 problem!

82

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission

• large delays when congested
• maximum achievable throughput

[diagram: Host A offers λin (original data) into a router with unlimited shared output link buffers; Host B receives λout]

83

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packets

[diagram: Host A offers λin (original data) and λ'in (original data plus retransmitted data) into finite shared output link buffers; Host B receives λout]

84

Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• "perfect" retransmission, only when loss: λ'in > λout
• retransmission of delayed (not lost) packets makes λ'in larger (than the perfect case) for the same λout

"costs" of congestion: more work (retransmissions) for a given "goodput"; unneeded retransmissions mean the link carries multiple copies of a pkt

[plots (a), (b), (c): λout vs λin, each capped at R/2; goodput saturates lower as loss and duplicate retransmissions increase]

85

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λin and λ'in increase?

[diagram: Host A offers λin (original data) and λ'in (original plus retransmitted data) through routers with finite shared output link buffers; Host B receives λout]

86

Causescosts of congestion scenario 3

Another "cost" of congestion: when a packet is dropped, any upstream transmission capacity used for that packet was wasted

[plot: λout collapses as the offered load grows]

Congestion Collapse example: the cocktail party effect

87

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control
• no explicit feedback from the network
• congestion inferred from end-system observed loss and delay
• approach taken by TCP

Network-assisted congestion control
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate the sender should send at

88

TCP congestion control additive increase multiplicative decrease

[plot: congestion window (8, 16, 24 Kbytes) vs time]

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT – for each received ACK, W ← W + 1/W – until loss is detected
• multiplicative decrease: cut CongWin in half after a loss – W ← W/2
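A per-ACK / per-loss sketch of these two rules (window counted in MSS units; standalone illustrative functions, not from the slides):

def on_ack_aimd(cwnd):
    """Additive increase: W <- W + 1/W per ACK, i.e. roughly +1 MSS per RTT."""
    return cwnd + 1.0 / cwnd

def on_loss_aimd(cwnd):
    """Multiplicative decrease: W <- W/2 after a loss."""
    return max(1.0, cwnd / 2.0)

cwnd = 10.0
for _ in range(10):          # one RTT's worth of ACKs
    cwnd = on_ack_aimd(cwnd)
print(round(cwnd, 2))        # ~10.96: roughly one MSS of growth per RTT
cwnd = on_loss_aimd(cwnd)    # halves to ~5.5 after a loss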

[plot: congestion window size vs time – saw tooth behaviour, probing for bandwidth]

89
SLOW START IS NOT SHOWN

90

TCP Congestion Control: details
• sender limits transmission: LastByteSent − LastByteAcked ≤ CongWin
• roughly: rate ≈ CongWin / RTT bytes/sec
• CongWin is a dynamic function of perceived network congestion

How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event

three mechanisms:
  – AIMD
  – slow start
  – conservative after timeout events

91

AIMD Starts Too Slowly

[plot: window vs time t]

It could take a long time to get started!

Need to start with a small CWND to avoid overloading the network.

92

TCP Slow Start
• When a connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec → initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to a respectable rate

When a connection begins, increase the rate exponentially fast until the first loss event

93

TCP Slow Start (more)
• When a connection begins, increase the rate exponentially until the first loss event
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: the initial rate is slow, but ramps up exponentially fast

[timeline: Host A sends one segment, then two, then four, one RTT apart, to Host B]

94

Slow Start and the TCP Sawtooth

[plot: window vs time t – exponential "slow start" up to a loss, then the sawtooth]

Why is it called slow-start? Because TCP originally had no congestion control mechanism. The source would just start by sending a whole window's worth of data.

95

Refinement: inferring loss
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after a timeout event:
  – CongWin is instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Philosophy: 3 dup ACKs indicate the network is capable of delivering some segments; a timeout indicates a "more alarming" congestion scenario

96

Refinement
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before the timeout.

Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event

97

Summary TCP Congestion Control

• When CongWin is below Threshold, the sender is in the slow-start phase: the window grows exponentially

• When CongWin is above Threshold, the sender is in the congestion-avoidance phase: the window grows linearly

• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold

• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS
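A compact sketch tying the four rules together (window in MSS units; purely illustrative, loss detection and actual transmission are omitted, and the initial threshold is an arbitrary choice of ours):

class CongestionState:
    def __init__(self):
        self.cwnd = 1.0          # CongWin, in MSS
        self.ssthresh = 64.0     # Threshold, in MSS (initial value is our assumption)

    def on_new_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0                 # slow start: exponential growth, per ACK
        else:
            self.cwnd += 1.0 / self.cwnd     # congestion avoidance: ~1 MSS per RTT

    def on_triple_dup_ack(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.ssthresh            # multiplicative decrease (fast-recovery style)

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0                      # back to slow start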

98

TCP sender congestion control (State | Event | TCP Sender Action | Commentary)

• Slow Start (SS) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Resulting in a doubling of CongWin every RTT

• Congestion Avoidance (CA) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT

• SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

• SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start

• SS or CA | Duplicate ACK | Increment duplicate ACK count for the segment being ACKed | CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

[plot: window vs time t – a timeout/fast retransmission occurs; SSThresh is set to half the window at that point]

Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND.

Slow start operates until the window reaches half of the previous CWND, i.e. SSTHRESH.

100

TCP throughput

• What's the average throughput of TCP as a function of window size and RTT? (Ignore slow start)

• Let W be the window size when loss occurs
• When the window is W, throughput is W/RTT
• Just after a loss, the window drops to W/2, throughput to W/(2·RTT)
• Average throughput: 0.75 · W/RTT
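The 0.75 factor is just the mean of a rate that ramps linearly from W/2 back up to W; as a quick check:

\[
\text{avg throughput} \;\approx\; \frac{1}{RTT}\cdot\frac{\tfrac{W}{2}+W}{2} \;=\; \frac{3}{4}\,\frac{W}{RTT}
\]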

101

TCP Futures: TCP over "long, fat pipes"

• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate p: throughput = (MSS/RTT) · √(3/(2p))
• → requires loss rate L ≈ 2·10⁻¹⁰. Ouch!
• New versions of TCP for high-speed networks

Calculation on a Simple Model (cwnd in units of MSS)

• Assume loss occurs whenever cwnd reaches W – recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … W, W/2, … – W/2 RTTs, then drop, then repeat
• Average throughput: 0.75·W·(MSS/RTT)
  – one packet dropped out of (W/2)·(3W/4)
  – packet drop rate p = (8/3)·W⁻²
• Throughput = (MSS/RTT) · √(3/(2p))
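Plugging the example numbers into that formula (a quick sanity check of ours, not from the slide itself):

\[
W=\frac{10^{10}\,\text{b/s}\times 0.1\,\text{s}}{1500\times 8\,\text{b}}\approx 83{,}333,
\qquad
p=\frac{3}{2}\left(\frac{\text{MSS}}{\text{RTT}\cdot\text{throughput}}\right)^{2}
 =\frac{3}{2}\left(\frac{12{,}000\,\text{b}}{0.1\,\text{s}\times 10^{10}\,\text{b/s}}\right)^{2}\approx 2\times10^{-10}
\]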

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges – or, Why AIMD?

• Single flow, adjusting to the bottleneck bandwidth
  – without any a priori knowledge
  – could be a Gbps link, could be a modem

• Single flow, adjusting to variations in bandwidth
  – when bandwidth decreases, must lower the sending rate
  – when bandwidth increases, must increase the sending rate

• Multiple flows, sharing the bandwidth
  – must avoid overloading the network
  – and share bandwidth "fairly" among the flows

103

104

Problem 1: Single Flow, Fixed BW

• Want a first-order estimate of the available bandwidth
  – assume bandwidth is fixed
  – ignore the presence of other flows

• Want to start slow, but rapidly increase the rate until a packet drop occurs ("slow-start")

• Adjustment:
  – cwnd initially set to 1 (MSS)
  – cwnd++ upon receipt of each ACK

105

Problems with Slow-Start
• Slow-start can result in many losses – roughly the size of cwnd ~ BW·RTT

• Example:
  – at some point, cwnd is enough to fill the "pipe"
  – after another RTT, cwnd is double its previous value
  – all the excess packets are dropped!

• Need a more gentle adjustment algorithm once we have a rough estimate of the bandwidth
  – the rest of the design discussion focuses on this

Problem 2: Single Flow, Varying BW

Want to track the available bandwidth
• oscillate around its current value
• if you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• multiplicative increase or decrease: cwnd ← cwnd × / ÷ a
• additive increase or decrease: cwnd ← cwnd + / − b

106

Four alternatives

• AIAD: gentle increase, gentle decrease

• AIMD: gentle increase, drastic decrease

• MIAD: drastic increase, gentle decrease – too many losses: eliminate

• MIMD: drastic increase and decrease

107

108

Problem 3: Multiple Flows

• Want the steady state to be "fair"

• Many notions of fairness, but here we just require two identical flows to end up with the same bandwidth

• This eliminates MIMD and AIAD – as we shall see…

• AIMD is the only remaining solution – not really, but close enough…

109

Recall: Buffer and Window Dynamics

• No congestion: x increases by one packet/RTT every RTT
• Congestion: decrease x by a factor of 2

[plot: sender A → router (C = 50 pkts/RTT) → B; rate x (pkts/RTT) and backlog in the router (pkts, congested if > 20) vs time]

110

AIMD Sharing Dynamics

• No congestion: rate increases by one packet/RTT every RTT
• Congestion: decrease rate by a factor of 2

[plot: two flows x1 (A→B) and x2 (D→E) sharing a bottleneck; the rates oscillate and equalize – fair share]

111

AIAD Sharing Dynamics

• No congestion: x increases by one packet/RTT every RTT
• Congestion: decrease x by 1

[plot: two flows x1 (A→B) and x2 (D→E); with AIAD the gap between the rates persists – no convergence to a fair share]

112

Simple Model of Congestion Control

• Two TCP connections – rates x1 and x2

• Congestion when the sum > 1

• Efficiency: sum near 1

• Fairness: the x's converge

[2-user example: bandwidth for user 1 (x1) vs bandwidth for user 2 (x2); the efficiency line x1 + x2 = 1 separates underload from overload]

113

Example

[plot: user 1 (x1) vs user 2 (x2), with the fairness line and the efficiency line x1 + x2 = 1]

• Total bandwidth: 1

• Inefficient: x1 + x2 = 0.7, e.g. (0.2, 0.5)

• Congested: x1 + x2 = 1.2, e.g. (0.7, 0.5)

• Efficient, not fair: x1 + x2 = 1, e.g. (0.7, 0.3)

• Efficient and fair: x1 + x2 = 1, e.g. (0.5, 0.5)

114

AIAD

[phase plot: bandwidth for user 1 (x1) vs bandwidth for user 2 (x2), with fairness and efficiency lines; the trajectory moves from (x1h, x2h) to (x1h−aD, x2h−aD) to (x1h−aD+aI, x2h−aD+aI)]

• Increase: x ← x + aI

• Decrease: x ← x − aD

• Does not converge to fairness

115

MIMD

[phase plot: x1 vs x2 with fairness and efficiency lines; the trajectory moves from (x1h, x2h) to (bD·x1h, bD·x2h) to (bI·bD·x1h, bI·bD·x2h), staying on a ray through the origin]

• Increase: x ← x · bI

• Decrease: x ← x · bD

• Does not converge to fairness

116

AIMD

[phase plot: x1 vs x2 with fairness and efficiency lines; the trajectory moves from (x1h, x2h) to (bD·x1h, bD·x2h) to (bD·x1h+aI, bD·x2h+aI), spiralling in toward the fair, efficient point]

• Increase: x ← x + aI

• Decrease: x ← x · bD

• Converges to fairness

117

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally

[plot: bandwidth for connection 1 vs bandwidth for connection 2, both axes up to R, with the equal-bandwidth-share line; repeated cycles of congestion-avoidance additive increase and loss-triggered halving of the window move the two flows toward the equal share]
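A tiny simulation of that picture: two synchronized AIMD flows sharing capacity C (an illustrative sketch with made-up parameters, not from the slides):

def aimd_share(x1, x2, capacity=60.0, a_i=1.0, b_d=0.5, rounds=200):
    """Iterate two AIMD flows: additive increase each RTT; both halve when the link is congested."""
    for _ in range(rounds):
        x1 += a_i
        x2 += a_i
        if x1 + x2 > capacity:       # congestion: multiplicative decrease for both
            x1 *= b_d
            x2 *= b_d
    return x1, x2

print(aimd_share(40.0, 5.0))         # starts very unfair; the two rates end up nearly equal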

118

Fairness (more)
Fairness and UDP
• Multimedia apps may not use TCP – they do not want their rate throttled by congestion control
• Instead use UDP – pump audio/video at a constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections
• nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – a new app asks for 1 TCP, gets rate R/10
  – a new app asks for 11 TCPs, gets R/2!

• Recall: multiple browser sessions (and the potential for synchronized loss)

119

Synchronized Flows vs Many TCP Flows
Synchronized flows:
• the aggregate window has the same dynamics
• therefore buffer occupancy has the same dynamics
• rule-of-thumb still holds

Independent, desynchronized flows:
• the central limit theorem says the aggregate becomes Gaussian
• variance (buffer size) decreases as N increases

Some TCP issues outstanding…

[plots: buffer size vs time t, and the resulting probability distribution]

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 41: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

41

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

42

Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks

ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet

ndash If timer expires retransmit all unacked packets

Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet

ndash When timer expires retransmit only unack packet

43

Selective repeat big picture

bull Sender can have up to N unacked packets in pipeline

bull Rcvr acks individual packetsbull Sender maintains timer for each unacked

packetndash When timer expires retransmit only unack packet

44

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L

46

GBN receiver extended FSM

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN inaction

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 42: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

42

Pipelining Protocols

Go-back-N: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative ACKs
  – Doesn't ACK a packet if there's a gap
• Sender has timer for oldest unacked packet
  – If timer expires, retransmit all unacked packets

Selective Repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr ACKs individual packets
• Sender maintains timer for each unacked packet
  – When timer expires, retransmit only that unacked packet

43

Selective repeat: big picture

• Sender can have up to N unacked packets in pipeline
• Rcvr ACKs individual packets
• Sender maintains timer for each unacked packet
  – When timer expires, retransmit only that unacked packet

44

Go-Back-N
Sender:
• k-bit seq # in pkt header
• "window" of up to N consecutive unACKed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n – "cumulative ACK"
  – may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L

46

GBN receiver extended FSM

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L
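The two FSMs above compress into a few lines of code. Below is a minimal, illustrative Python sketch of the Go-Back-N sender and receiver logic; names such as udt_send, deliver and N stand in for the slides' primitives and are placeholders, not a real socket API.

class GBNSender:
    def __init__(self, N, udt_send):
        self.N = N                   # window size
        self.base = 1                # oldest unACKed seq #
        self.nextseqnum = 1          # next seq # to use
        self.pkts = {}               # seq # -> packet, kept for retransmission
        self.udt_send = udt_send

    def rdt_send(self, data):
        if self.nextseqnum < self.base + self.N:       # room in window?
            pkt = (self.nextseqnum, data)
            self.pkts[self.nextseqnum] = pkt
            self.udt_send(pkt)
            self.nextseqnum += 1
            return True
        return False                                    # window full: refuse data

    def on_ack(self, acknum, corrupt=False):
        if not corrupt:
            self.base = acknum + 1                      # cumulative ACK

    def on_timeout(self):
        for seq in range(self.base, self.nextseqnum):   # go back N: resend all unACKed
            self.udt_send(self.pkts[seq])

class GBNReceiver:
    def __init__(self, udt_send, deliver):
        self.expectedseqnum = 1
        self.udt_send = udt_send
        self.deliver = deliver

    def on_packet(self, pkt, corrupt=False):
        seq, data = pkt
        if not corrupt and seq == self.expectedseqnum:
            self.deliver(data)                          # pass up, in order
            self.expectedseqnum += 1
        # in every case, ACK the highest in-order seq # received so far
        self.udt_send(('ACK', self.expectedseqnum - 1))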

47

GBN in action

48

Selective Repeat

• receiver individually acknowledges all correctly received pkts
  – buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  – sender timer for each unACKed pkt
• sender window
  – N consecutive seq #'s
  – again limits seq #s of sent, unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

sender:

data from above:
• if next available seq # in window, send pkt

timeout(n):
• resend pkt n, restart timer

ACK(n) in [sendbase, sendbase+N]:
• mark pkt n as received
• if n is smallest unACKed pkt, advance window base to next unACKed seq #

receiver:

pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt

pkt n in [rcvbase-N, rcvbase-1]:
• ACK(n)

otherwise:
• ignore
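As a concrete reading of the receiver rules above, here is a minimal Python sketch of a Selective Repeat receiver; the window size N and the udt_send/deliver hooks are placeholders. It ACKs each packet individually, buffers out-of-order arrivals, delivers in-order runs, and re-ACKs packets just below the window.

class SRReceiver:
    def __init__(self, N, udt_send, deliver):
        self.N = N                    # window size
        self.rcvbase = 0              # lowest not-yet-delivered seq #
        self.buffer = {}              # out-of-order pkts awaiting delivery
        self.udt_send = udt_send
        self.deliver = deliver

    def on_packet(self, seq, data):
        if self.rcvbase <= seq < self.rcvbase + self.N:
            self.udt_send(('ACK', seq))          # ACK individually
            self.buffer[seq] = data
            # deliver any in-order run starting at rcvbase, then slide the window
            while self.rcvbase in self.buffer:
                self.deliver(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
        elif self.rcvbase - self.N <= seq < self.rcvbase:
            self.udt_send(('ACK', seq))          # duplicate: re-ACK, don't deliver
        # otherwise: ignore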

51

Selective repeat in action

52

Selective repeat dilemma

Example:
• seq #'s: 0, 1, 2, 3
• window size = 3
• receiver sees no difference in the two scenarios
• incorrectly passes duplicate data as new in (a)

Q: what relationship between seq # size and window size?
A: window size ≤ (½ of seq # space size)
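The rule is a one-line check; a tiny illustrative sketch (seq_space is the number of distinct sequence numbers, e.g. 4 in the example above):

def sr_window_ok(window_size, seq_space):
    # Selective Repeat is only safe if the window covers at most half
    # of the sequence-number space (seq_space = 4 allows window_size <= 2).
    return window_size <= seq_space // 2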

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start / adapt: consider a high bandwidth-delay product

Now let's move from the generic to the specific…

TCP: arguably the most successful protocol in the Internet…

…and it's an ARQ protocol.

54

TCP Overview: RFCs 793, 1122, 1323, 2018, 2581, …

• point-to-point:
  – one sender, one receiver
• reliable, in-order byte stream:
  – no "message boundaries"
• pipelined:
  – TCP congestion and flow control set window size
• send & receive buffers
• full duplex data:
  – bi-directional data flow in same connection
  – MSS: maximum segment size
• connection-oriented:
  – handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
  – sender will not overwhelm receiver

[Figure: application writes data into the TCP send buffer behind the socket door; segments carry it to the peer's TCP receive buffer, from which the application reads.]

55

TCP segment structure

[Figure: TCP header layout, 32 bits wide]
• source port #, dest port #
• sequence number
• acknowledgement number – counting is by bytes of data (not segments!)
• head len | not used | flag bits U A P R S F | Receive window (# bytes rcvr willing to accept)
  – URG: urgent data (generally not used)
  – ACK: ACK # valid
  – PSH: push data now (generally not used)
  – RST, SYN, FIN: connection estab (setup, teardown commands)
• checksum (Internet checksum, as in UDP) | Urg data pointer
• Options (variable length)
• application data (variable length)

56

TCP seq #'s and ACKs

Seq #'s:
  – byte-stream "number" of first byte in segment's data
ACKs:
  – seq # of next byte expected from other side
  – cumulative ACK
Q: how does the receiver handle out-of-order segments?
  – A: TCP spec doesn't say – up to the implementor

[Figure: simple telnet scenario]
  User types 'C':                       Host A -> Host B   Seq=42, ACK=79, data = 'C'
  Host B ACKs receipt of 'C', echoes back 'C':   B -> A   Seq=79, ACK=43, data = 'C'
  Host A ACKs receipt of echoed 'C':             A -> B   Seq=43, ACK=80

This has led to a world of hurt…

57

TCP out of order attack
• ARQ with SACK means the recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server, but don't send the first byte
  – Recipient keeps all the subsequent data and waits… filling buffers
  – Critical buffers…
• Evil attack two: send a legitimate request
    GET index.html
  which gets through an intrusion-detection system; then send a new segment replacing bytes 4–13 with "password-file"

(A dumb example: neither of these attacks would work on a modern system.)

58

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss

Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  – ignore retransmissions
• SampleRTT will vary; want estimated RTT "smoother"
  – average several recent measurements, not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1 – α) · EstimatedRTT + α · SampleRTT

• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

Karn/Partridge Algorithm – ignore retransmissions in the measurements

(and increase the timeout: this makes retransmissions decreasingly aggressive)

61

Example RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr – SampleRTT and EstimatedRTT (milliseconds, roughly 100–350 ms) plotted against time (seconds); EstimatedRTT is visibly smoother than the raw SampleRTT samples.]

62

TCP Round Trip Time and Timeout

Setting the timeout:
• EstimatedRTT plus a "safety margin"
  – large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:

    DevRTT = (1 – β) · DevRTT + β · |SampleRTT – EstimatedRTT|
    (typically β = 0.25)

Then set the timeout interval:

    TimeoutInterval = EstimatedRTT + 4 · DevRTT
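The two updates combine into a few lines. A minimal Python sketch of the estimator, using the slides' α = 0.125 and β = 0.25 and assuming retransmitted segments are excluded from sampling (Karn/Partridge); the DevRTT initialisation to half the first sample follows the common RFC 6298 convention and is an assumption here.

class RTTEstimator:
    ALPHA = 0.125    # weight of a new SampleRTT in EstimatedRTT
    BETA = 0.25      # weight of a new deviation sample in DevRTT

    def __init__(self, first_sample):
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2

    def update(self, sample_rtt):
        # EstimatedRTT = (1 - alpha)*EstimatedRTT + alpha*SampleRTT
        self.estimated_rtt = (1 - self.ALPHA) * self.estimated_rtt + self.ALPHA * sample_rtt
        # DevRTT = (1 - beta)*DevRTT + beta*|SampleRTT - EstimatedRTT|
        self.dev_rtt = (1 - self.BETA) * self.dev_rtt + self.BETA * abs(sample_rtt - self.estimated_rtt)

    def timeout_interval(self):
        # TimeoutInterval = EstimatedRTT + 4*DevRTT
        return self.estimated_rtt + 4 * self.dev_rtt

# example: est = RTTEstimator(0.100); est.update(0.120); print(est.timeout_interval())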

63

TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative ACKs
• TCP uses a single retransmission timer
• Retransmissions are triggered by:
  – timeout events
  – duplicate ACKs
• Initially consider a simplified TCP sender:
  – ignore duplicate ACKs
  – ignore flow control, congestion control

64

TCP sender events

data rcvd from app:
• create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval

timeout:
• retransmit segment that caused timeout
• restart timer

ACK rcvd:
• if it acknowledges previously unacked segments:
  – update what is known to be ACKed
  – start timer if there are outstanding segments

65

TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
    switch(event)

    event: data received from application above
        create TCP segment with sequence number NextSeqNum
        if (timer currently not running)
            start timer
        pass segment to IP
        NextSeqNum = NextSeqNum + length(data)

    event: timer timeout
        retransmit not-yet-acknowledged segment with smallest sequence number
        start timer

    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
} /* end of loop forever */

Comment:
• SendBase – 1: last cumulatively ACKed byte
Example:
• SendBase – 1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so new data is ACKed

66

TCP retransmission scenarios

[Figure: lost ACK scenario]
  Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X);
  A's timeout for Seq=92 fires and A resends Seq=92, 8 bytes data; B replies ACK=100.
  SendBase moves to 100 once the retransmitted segment is ACKed.

[Figure: premature timeout scenario]
  Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; B replies ACK=100 and ACK=120;
  A's timeout for Seq=92 fires too early and A resends Seq=92, 8 bytes data; B replies ACK=120.
  SendBase moves from 100 to 120 as the cumulative ACKs arrive.

67

TCP retransmission scenarios (more)

[Figure: cumulative ACK scenario]
  Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; B's ACK=100 is lost (X),
  but B's ACK=120 arrives before A's timeout, so nothing is retransmitted; SendBase = 120.

Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACKs 100 too.

68

TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver -> TCP receiver action

• Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  -> Delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
• Arrival of in-order segment with expected seq #; one other segment has ACK pending
  -> Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment with higher-than-expected seq #; gap detected
  -> Immediately send duplicate ACK, indicating seq # of next expected byte
• Arrival of segment that partially or completely fills gap
  -> Immediately send ACK, provided that segment starts at lower end of gap
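One way to read that table is as a decision function. Below is an illustrative Python sketch of the receiver's choice (the 500 ms delayed-ACK timer is only indicated by the returned string, and the parameter names are placeholders for the receiver's bookkeeping, not a real TCP implementation):

def ack_action(seg_seq, expected_seq, ack_pending, gap_buffered):
    """Receiver ACK decision per the RFC 1122 / RFC 2581 summary above.
    gap_buffered: out-of-order data is already buffered above expected_seq."""
    if seg_seq > expected_seq:
        return "immediately send duplicate ACK for next expected byte"    # gap detected
    if seg_seq == expected_seq and gap_buffered:
        return "immediately send ACK (segment starts at lower end of gap)"
    if seg_seq == expected_seq and ack_pending:
        return "immediately send one cumulative ACK covering both in-order segments"
    if seg_seq == expected_seq:
        return "delayed ACK: wait up to 500 ms for another segment"
    return "re-ACK already-covered data"    # old segment; not a row in the table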

69

Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If a segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that the segment after the ACKed data was lost:
  – fast retransmit: resend segment before timer expires

70

[Figure 3.37: Resending a segment after triple duplicate ACK – Host A sends several segments back-to-back to Host B; the 2nd segment is lost (X); B's duplicate ACKs let A resend the 2nd segment before its timeout expires.]

71

Fast retransmit algorithm

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {                                  /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3)
            resend segment with sequence number y      /* fast retransmit */
    }

72

Silly Window Syndrome

The advertised receive window is the amount a receiver can accept.
If a transmitter has something to send – it will.
This means small segments may persist – indefinitely.

Solution:
Wait to fill each segment, but don't wait indefinitely.

NAGLE's Algorithm
If we wait too long, interactive traffic is difficult.
If we don't wait, we get silly window syndrome.
Solution: use a timer; when the timer expires – send the (unfilled) segment.
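A minimal sketch of the "fill a segment, but don't wait forever" idea described above (illustrative only: real TCP stacks implement Nagle by holding small writes while earlier data is un-ACKed rather than with an explicit timer; mss and max_wait here are placeholder parameters):

import time

class SegmentCoalescer:
    """Accumulate application writes into full segments; flush a partial
    segment once a timer expires, so interactive traffic is not starved."""
    def __init__(self, mss, max_wait, send):
        self.mss = mss               # bytes per full segment
        self.max_wait = max_wait     # seconds to wait before flushing a partial segment
        self.send = send             # callback that transmits one segment
        self.buf = b""
        self.first_byte_time = None

    def write(self, data):
        if not self.buf:
            self.first_byte_time = time.monotonic()
        self.buf += data
        while len(self.buf) >= self.mss:        # full segment: send immediately
            self.send(self.buf[:self.mss])
            self.buf = self.buf[self.mss:]

    def tick(self):
        # call periodically: flush an unfilled segment once the timer expires
        if self.buf and time.monotonic() - self.first_byte_time >= self.max_wait:
            self.send(self.buf)
            self.buf = b""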

73

Flow Control ≠ Congestion Control

• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded

74

Flow Control – (bad old days)

In-line flow control:
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)

Dedicated wires:
• RTS/CTS handshaking
• Read (or Write) Ready signals from a memory interface saying slow-down/stop…

75

TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app's drain rate

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

76

TCP Flow control: how it works

(Suppose the TCP receiver discards out-of-order segments)
• spare room in buffer:
    RcvWindow = RcvBuffer – [LastByteRcvd – LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
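The advertised window is just that subtraction, and the sender-side check mirrors it; a tiny illustrative sketch:

def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    # spare room the receiver advertises in each segment it sends
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def sender_may_send(last_byte_sent, last_byte_acked, nbytes, advertised_window):
    # sender keeps unACKed data within the last advertised window
    return (last_byte_sent - last_byte_acked) + nbytes <= advertised_window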

77

TCP Connection Management

Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  – seq #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    Socket clientSocket = new Socket(hostname, port number);
• server: contacted by client
    Socket connectionSocket = welcomeSocket.accept();

Three way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYN/ACK segment
  – server allocates buffers
  – specifies server initial seq #
Step 3: client receives SYN/ACK, replies with ACK segment, which may contain data

78

TCP Connection Management (cont)

Closing a connection:

client closes socket: clientSocket.close();

Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK; closes connection, sends FIN

[Figure: client sends FIN (close); server ACKs and sends its own FIN (close); client ACKs and enters timed wait; connection closed.]

79

TCP Connection Management (cont)

Step 3: client receives FIN, replies with ACK
  – enters "timed wait" – will respond with ACK to received FINs
Step 4: server receives ACK; connection closed

Note: with small modification, can handle simultaneous FINs

[Figure: both ends exchange FIN/ACK (closing); the client waits in timed wait, then both sides are closed.]

80

TCP Connection Management (cont)

[Figure: TCP client lifecycle and TCP server lifecycle state diagrams.]

81

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control!
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem!

82

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission

• large delays when congested
• maximum achievable throughput

[Figure: Host A sends λin (original data) into a router with unlimited shared output link buffers; Host B receives λout.]

83

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet

[Figure: Host A offers λin (original data) and λ'in (original data plus retransmitted data) to a router with finite shared output link buffers; Host B receives λout.]

84

Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• "perfect" retransmission, only when loss: λ'in > λout
• retransmission of delayed (not lost) packets makes λ'in larger (than the perfect case) for the same λout

"costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: link carries multiple copies of a pkt

[Figure: three graphs (a), (b), (c) of λout against λ'in, axes up to R/2, with R/3 and R/4 marked on the curves as retransmissions eat into goodput.]

85

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λin and λ'in increase?

[Figure: Host A offers λin (original data) and λ'in (original data plus retransmitted data) across routers with finite shared output link buffers; Host B receives λout.]

86

Causes/costs of congestion: scenario 3

Another "cost" of congestion: when a packet is dropped, any upstream transmission capacity used for that packet was wasted

[Figure: λout collapses as the offered load grows.]

Congestion Collapse example: Cocktail party effect

87

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at

88

TCP congestion control additive increase multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT – for each received ACK, W <- W + 1/W – until loss detected
• multiplicative decrease: cut CongWin in half after loss – W <- W/2

[Figure: congestion window (8, 16, 24 Kbytes) versus time – saw tooth behaviour, probing for bandwidth.]

89  (SLOW START IS NOT SHOWN)
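Expressed in code, the AIMD rule on this slide is two tiny updates; a minimal sketch with cwnd kept in units of MSS (slow start, which the figure deliberately omits, is added further below):

def on_ack(cwnd):
    # additive increase: +1 MSS per RTT, spread over the ACKs in one window
    return cwnd + 1.0 / cwnd

def on_loss(cwnd):
    # multiplicative decrease: halve the window after a loss
    return max(cwnd / 2.0, 1.0)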

90

TCP Congestion Control: details
• sender limits transmission: LastByteSent – LastByteAcked ≤ CongWin
• roughly,
    rate = CongWin / RTT  bytes/sec
• CongWin is a dynamic function of perceived network congestion

How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after loss event

three mechanisms:
  – AIMD
  – slow start
  – conservative after timeout events

91

AIMD Starts Too Slowly

[Figure: window growing linearly from a small CWND over time t.]

It could take a long time to get started!
Need to start with a small CWND to avoid overloading the network.

92

TCP Slow Start
• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to a respectable rate

When connection begins, increase rate exponentially fast until first loss event

93

TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow, but ramps up exponentially fast

[Figure: Host A sends one segment, waits one RTT for the ACK, then two segments, then four segments to Host B.]

94

Slow Start and the TCP Sawtooth

[Figure: window vs. time t – exponential "slow start" up to the first loss, then the AIMD sawtooth.]

Why is it called slow-start? Because TCP originally had no congestion control mechanism. The source would just start by sending a whole window's worth of data!

95

Refinement: inferring loss
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after a timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Philosophy: 3 dup ACKs indicates the network is capable of delivering some segments; a timeout indicates a "more alarming" congestion scenario

96

Refinement
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before the loss event

97

Summary TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold
• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS

98

TCP sender congestion control

State: Slow Start (SS)
Event: ACK receipt for previously unACKed data
Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
Commentary: Resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unACKed data
Action: CongWin = CongWin + MSS · (MSS/CongWin)
Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: Loss event detected by triple duplicate ACK
Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

State: SS or CA
Event: Timeout
Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: Enter slow start

State: SS or CA
Event: Duplicate ACK
Action: Increment duplicate ACK count for segment being ACKed
Commentary: CongWin and Threshold not changed
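The table maps almost one-to-one onto code. A minimal Python sketch of the sender's state handling (cwnd and threshold in bytes; the MSS constant and the initial threshold are illustrative placeholders, and fast retransmit of the lost segment itself is omitted):

MSS = 1460   # bytes, illustrative

class CongestionControl:
    def __init__(self):
        self.state = "slow start"
        self.cwnd = 1 * MSS
        self.threshold = 64 * 1024    # arbitrary initial threshold for the sketch
        self.dup_acks = 0

    def on_new_ack(self):
        self.dup_acks = 0
        if self.state == "slow start":
            self.cwnd += MSS                          # doubles cwnd every RTT
            if self.cwnd > self.threshold:
                self.state = "congestion avoidance"
        else:                                         # congestion avoidance
            self.cwnd += MSS * MSS / self.cwnd        # +1 MSS per RTT

    def on_dup_ack(self):
        self.dup_acks += 1
        if self.dup_acks == 3:                        # triple duplicate ACK
            self.threshold = self.cwnd / 2
            self.cwnd = max(self.threshold, MSS)      # multiplicative decrease
            self.state = "congestion avoidance"

    def on_timeout(self):
        self.threshold = self.cwnd / 2
        self.cwnd = 1 * MSS                           # back to 1 MSS
        self.state = "slow start"
        self.dup_acks = 0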

99

Repeating Slow Start After Timeout

[Figure: window vs. time t – after a timeout (unlike fast retransmission), CWND restarts at 1 MSS; SSThresh is set to half the CWND at the loss.]

Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND.
Slow start is in operation until the window reaches half of the previous CWND, i.e. SSTHRESH.

100

TCP throughput

• What's the average throughput of TCP as a function of window size and RTT?
  – Ignore slow start
• Let W be the window size when loss occurs
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to (W/2)/RTT
• Average throughput: 0.75 W/RTT

101

TCP Futures: TCP over "long, fat pipes"

• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate p:
    Throughput = (MSS/RTT) · sqrt(3/(2p))
• -> loss rate L = 2·10^-10.  Ouch!
• New versions of TCP for high-speed

Calculation on Simple Model (cwnd in units of MSS)
• Assume loss occurs whenever cwnd reaches W
  – Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … W, W/2, …
  – W/2 RTTs, then drop, then repeat
• Average throughput: 0.75 W (MSS/RTT)
  – One packet dropped out of (W/2)·(3W/4)
  – Packet drop rate p = (8/3)·W^-2
• => Throughput = (MSS/RTT) · sqrt(3/(2p))
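A quick numerical check of those figures, using only the slide's own values (1500-byte segments, 100 ms RTT, 10 Gbps target) and the model's formulas:

MSS_BITS = 1500 * 8          # bits per segment
RTT = 0.1                    # seconds
TARGET = 10e9                # bits/sec

# window needed so that W segments per RTT fill the pipe
W = TARGET * RTT / MSS_BITS                   # ~83,333 segments

# loss rate implied by  throughput = (MSS/RTT) * sqrt(3/(2p))
p = 1.5 * (MSS_BITS / (RTT * TARGET)) ** 2    # ~2.2e-10, i.e. about 2*10^-10

print(f"W ~ {W:,.0f} segments, p ~ {p:.1e}")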

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges – or Why AIMD?

• Single flow adjusting to bottleneck bandwidth
  – Without any a priori knowledge
  – Could be a Gbps link, could be a modem
• Single flow adjusting to variations in bandwidth
  – When bandwidth decreases, must lower sending rate
  – When bandwidth increases, must increase sending rate
• Multiple flows sharing the bandwidth
  – Must avoid overloading the network
  – And share bandwidth "fairly" among the flows

103

104

Problem 1 Single Flow Fixed BW

• Want to get a first-order estimate of the available bandwidth
  – Assume bandwidth is fixed
  – Ignore presence of other flows
• Want to start slow, but rapidly increase rate until packet drop occurs ("slow-start")
• Adjustment:
  – cwnd initially set to 1 (MSS)
  – cwnd++ upon receipt of ACK

105

Problems with Slow-Start
• Slow-start can result in many losses
  – Roughly the size of cwnd ~ BW·RTT
• Example:
  – At some point, cwnd is enough to fill the "pipe"
  – After another RTT, cwnd is double its previous value
  – All the excess packets are dropped!
• Need a more gentle adjustment algorithm once we have a rough estimate of bandwidth
  – Rest of the design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidth
• Oscillate around its current value
• If you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease:
    cwnd <- cwnd × a  (or ÷ a)
• Additive increase or decrease:
    cwnd <- cwnd + b  (or – b)

106

Four alternatives

• AIAD: gentle increase, gentle decrease
• AIMD: gentle increase, drastic decrease
• MIAD: drastic increase, gentle decrease
  – too many losses: eliminate
• MIMD: drastic increase and decrease

107

108

Problem 3 Multiple Flows

• Want steady state to be "fair"
• Many notions of fairness, but here just require two identical flows to end up with the same bandwidth
• This eliminates MIMD and AIAD
  – As we shall see…
• AIMD is the only remaining solution
  – Not really, but close enough…

109

Recall Buffer and Window Dynamics

• No congestion: x increases by one packet/RTT every RTT
• Congestion: decrease x by factor 2

[Figure: A -> B over a link with C = 50 pkts/RTT; rate x (pkts/RTT, 0–60) and backlog in the router (pkts, congested if > 20) plotted against time over ~500 RTTs – the rate repeatedly climbs to the congestion point and halves.]

110

AIMD Sharing Dynamics

[Figure: two flows, x1 (A -> B) and x2 (D -> E), sharing one bottleneck; their rates plotted over ~500 RTTs.]
• No congestion: rate increases by one packet/RTT every RTT
• Congestion: decrease rate by factor 2
• Rates equalize -> fair share

111

AIAD Sharing Dynamics

[Figure: two flows, x1 (A -> B) and x2 (D -> E), sharing one bottleneck; their rates plotted over ~500 RTTs.]
• No congestion: x increases by one packet/RTT every RTT
• Congestion: decrease x by 1
• The gap between the two flows persists – AIAD does not converge to a fair share.

112

Simple Model of Congestion Control

• Two TCP connections
  – Rates x1 and x2
• Congestion when sum > 1
• Efficiency: sum near 1
• Fairness: x's converge

[Figure: 2-user example – bandwidth for user 1 (x1) against bandwidth for user 2 (x2); the efficiency line x1 + x2 = 1 separates underload from overload.]

113

Example

[Figure: x1–x2 plane with the fairness line (x1 = x2) and the efficiency line (x1 + x2 = 1); total bandwidth 1.]

• (0.2, 0.5): inefficient, x1 + x2 = 0.7
• (0.7, 0.5): congested, x1 + x2 = 1.2
• (0.7, 0.3): efficient, x1 + x2 = 1, not fair
• (0.5, 0.5): efficient, x1 + x2 = 1, fair

114

AIAD

• Increase: x <- x + aI
• Decrease: x <- x – aD
• Does not converge to fairness

[Figure: x1–x2 plane with fairness and efficiency lines; starting at (x1h, x2h), a decrease moves to (x1h – aD, x2h – aD) and an increase to (x1h – aD + aI, x2h – aD + aI): the trajectory stays parallel to the fairness line, so the initial unfairness never shrinks.]

115

MIMD

• Increase: x <- bI · x
• Decrease: x <- bD · x
• Does not converge to fairness

[Figure: x1–x2 plane with fairness and efficiency lines; starting at (x1h, x2h), a decrease moves to (bD·x1h, bD·x2h) and an increase to (bI·bD·x1h, bI·bD·x2h): the trajectory stays on the line through the origin, so the ratio x1/x2 – and the unfairness – never changes.]

116

AIMD

• Increase: x <- x + aI
• Decrease: x <- bD · x
• Converges to fairness

[Figure: x1–x2 plane with fairness and efficiency lines; starting at (x1h, x2h), a decrease moves to (bD·x1h, bD·x2h) and an increase to (bD·x1h + aI, bD·x2h + aI): each cycle moves the operating point closer to the fairness line.]

117

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally

[Figure: bandwidth for connection 1 vs. bandwidth for connection 2, both axes up to R; congestion avoidance's additive increase alternates with loss-triggered halving of the window, and the trajectory converges toward the equal-bandwidth-share line.]
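The convergence is easy to see numerically. A small, illustrative simulation of two AIMD flows sharing a bottleneck of capacity C (all parameter values are placeholders chosen for the sketch):

def aimd_two_flows(x1=5.0, x2=45.0, C=50.0, a=1.0, b=0.5, rtts=500):
    """Simulate two AIMD flows sharing capacity C (rates in pkts/RTT)."""
    for _ in range(rtts):
        if x1 + x2 > C:          # congestion: both flows halve (multiplicative decrease)
            x1 *= b
            x2 *= b
        else:                    # no congestion: both flows add a (additive increase)
            x1 += a
            x2 += a
    return x1, x2

print(aimd_two_flows())   # the two rates end up roughly equal: fair share

Each additive step preserves the difference between the flows, while each multiplicative step halves it, so the difference shrinks toward zero – exactly the geometric argument the figure animates.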

118

Fairness (more)

Fairness and UDP:
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient, yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections:
• nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2!
• Recall: multiple browser sessions (and the potential for synchronized loss)

119

Some TCP issues outstanding…

Synchronized Flows:
• Aggregate window has same dynamics
• Therefore buffer occupancy has same dynamics
• Rule-of-thumb still holds

Many TCP Flows:
• Independent, desynchronized
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases

[Figure: buffer occupancy over time t and its probability distribution.]

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 43: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

43

Selective repeat big picture

bull Sender can have up to N unacked packets in pipeline

bull Rcvr acks individual packetsbull Sender maintains timer for each unacked

packetndash When timer expires retransmit only unack packet

44

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L

46

GBN receiver extended FSM

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN inaction

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)

Fairness and UDP:
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections:
• Nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections (worked numbers below)
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets ~R/2
• Recall: multiple browser sessions (and the potential for synchronized loss)
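The arithmetic behind that example, assuming the link's rate R is split equally per connection: with the 9 existing connections plus 1 new one there are 10 connections, so the new app's single connection gets R/10; if the new app instead opens 11 connections there are 20 in total and its combined share is 11 × R/20 ≈ R/2.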

119

Synchronized Flows vs. Many TCP Flows

Synchronized flows:
• Aggregate window has the same dynamics
• Therefore buffer occupancy has the same dynamics
• Rule-of-thumb still holds

Many TCP flows:
• Independent, desynchronized
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases

Some TCP issues outstanding…

[Figure: probability distribution of buffer occupancy and buffer size, each sketched against time t.]

Page 44: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

44

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L

46

GBN receiver extended FSM

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN inaction

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 45: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

45

GBN sender extended FSM

Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block

base = getacknum(reply)+1

udt_rcv(reply) ampamp notcorrupt(reply)

base=1nextseqnum=1

udt_rcv(reply) ampamp corrupt(reply)

L

L

46

GBN receiver extended FSM

ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq

Wait

udt_send(reply)L

udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)

rdt_rcv(data)udt_send(ACK)expectedseqnum++

expectedseqnum=1

L

47

GBN inaction

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Start

• Slow-start can result in many losses
  – Roughly the size of cwnd ~ BW·RTT
• Example (worked through below):
  – At some point cwnd is enough to fill the "pipe"
  – After another RTT, cwnd is double its previous value
  – All the excess packets are dropped
• Need a more gentle adjustment algorithm once we have a rough estimate of the bandwidth
  – The rest of the design discussion focuses on this
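To make that concrete under assumed numbers (a 100 Mbps bottleneck, 100 ms RTT and 1500-byte segments — figures chosen for illustration, not from the slides): the pipe holds BW·RTT ≈ 100·10⁶ × 0.1 / (1500·8) ≈ 833 segments. Slow-start reaches that cwnd, and one RTT later it has doubled to roughly 1666, so on the order of 800+ excess segments — about a full cwnd's worth — can be dropped within a single RTT.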

Problem 2: Single Flow, Varying BW

Want to track the available bandwidth:
• Oscillate around its current value
• If you never send more than your current rate, you won't know whether more bandwidth is available

Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease:
  cwnd → cwnd ×/÷ a
• Additive increase or decrease:
  cwnd → cwnd ± b

106

Four alternatives

• AIAD: gentle increase, gentle decrease

• AIMD: gentle increase, drastic decrease

• MIAD: drastic increase, gentle decrease
  – too many losses; eliminate

• MIMD: drastic increase and decrease

107

108

Problem 3: Multiple Flows

• Want the steady state to be "fair"

• Many notions of fairness, but here we just require two identical flows to end up with the same bandwidth

• This eliminates MIMD and AIAD
  – As we shall see…

• AIMD is the only remaining solution
  – Not really, but close enough…

109

Recall: Buffer and Window Dynamics

• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by a factor of 2

[Figure: a single flow x from A to B through a router of capacity C = 50 pkts/RTT; plot of rate x (pkts/RTT) and backlog in the router (pkts, congested if > 20) versus time]

110

AIMD Sharing Dynamics

[Figure: two flows, x1 (A→B) and x2 (D→E), sharing one bottleneck; plot of both rates versus time]

• No congestion → rate increases by one packet/RTT every RTT; congestion → decrease rate by a factor of 2
• Rates equalize at the fair share

111

AIAD Sharing Dynamics

[Figure: two flows, x1 (A→B) and x2 (D→E), sharing one bottleneck; plot of both rates versus time — the initial gap between the flows persists]

• No congestion → x increases by one packet/RTT every RTT; congestion → decrease x by 1

112

Simple Model of Congestion Control

• Two TCP connections
  – Rates x1 and x2
• Congestion when the sum > 1
• Efficiency: sum near 1
• Fairness: the x's converge

[Figure: 2-user example — bandwidth for user 1 (x1) on the horizontal axis, bandwidth for user 2 (x2) on the vertical axis; the efficiency line x1 + x2 = 1 separates the underloaded region from the overloaded region]

113

Example

[Figure: x1 (user 1) on the horizontal axis, x2 (user 2) on the vertical axis, both running 0 to 1, with the efficiency line x1 + x2 = 1 and the fairness line x1 = x2]

• Total bandwidth 1
• (0.2, 0.5): inefficient, x1 + x2 = 0.7
• (0.7, 0.5): congested, x1 + x2 = 1.2
• (0.7, 0.3): efficient, x1 + x2 = 1, but not fair
• (0.5, 0.5): efficient, x1 + x2 = 1, and fair

114

AIAD

• Increase: x → x + aI
• Decrease: x → x − aD
• Does not converge to fairness

[Figure: phase plot of bandwidth for user 1 (x1) vs. user 2 (x2) with the fairness and efficiency lines; from (x1h, x2h) a decrease moves to (x1h − aD, x2h − aD) and an increase to (x1h − aD + aI, x2h − aD + aI), so the trajectory stays parallel to the fairness line]

115

MIMD

• Increase: x → x · bI
• Decrease: x → x · bD
• Does not converge to fairness

[Figure: phase plot of x1 vs. x2 with the fairness and efficiency lines; from (x1h, x2h) a decrease moves to (bD·x1h, bD·x2h) and an increase to (bI·bD·x1h, bI·bD·x2h), so the trajectory stays on the ray through the origin and the ratio x1/x2 never changes]

116

AIMD

• Increase: x → x + aI
• Decrease: x → x · bD
• Converges to fairness (see the simulation sketch below)

[Figure: phase plot of x1 vs. x2 with the fairness and efficiency lines; from (x1h, x2h) a decrease moves to (bD·x1h, bD·x2h) and an increase to (bD·x1h + aI, bD·x2h + aI) — each multiplicative decrease shrinks the distance to the fairness line, while additive increases move parallel to it]
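A small simulation makes the contrast concrete. The sketch below (Python; the step sizes — additive 1, multiplicative 1.1 and 0.5 — and the capacity of 100 are arbitrary illustration values) starts two flows from unequal rates and applies AIAD, MIMD and AIMD whenever their sum exceeds capacity; only AIMD drives the two rates together.

# Compare AIAD, MIMD and AIMD for two flows sharing one bottleneck.
# Whenever x1 + x2 exceeds the capacity, both flows see "congestion" and
# decrease; otherwise both increase.  Parameters are illustrative only.

CAPACITY = 100.0

def simulate(increase, decrease, x1=10.0, x2=60.0, rounds=200):
    for _ in range(rounds):
        if x1 + x2 > CAPACITY:        # congestion signal seen by both flows
            x1, x2 = decrease(x1), decrease(x2)
        else:
            x1, x2 = increase(x1), increase(x2)
    return x1, x2

schemes = {
    "AIAD": (lambda x: x + 1.0, lambda x: max(x - 1.0, 0.0)),
    "MIMD": (lambda x: x * 1.1, lambda x: x * 0.5),
    "AIMD": (lambda x: x + 1.0, lambda x: x * 0.5),
}

for name, (inc, dec) in schemes.items():
    x1, x2 = simulate(inc, dec)
    print(f"{name}: x1 = {x1:6.1f}, x2 = {x2:6.1f}")
# Expected: AIAD and MIMD preserve the initial imbalance; AIMD ends near equal shares.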

117

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease reduces throughput proportionally

[Figure: bandwidth for connection 1 vs. bandwidth for connection 2, both axes running to R, with the "equal bandwidth share" line; repeated cycles of congestion-avoidance additive increase followed by a loss (window decreased by a factor of 2) walk the operating point toward the equal-share line]

118

Fairness (more)

Fairness and UDP
• Multimedia apps may not use TCP
  – do not want their rate throttled by congestion control
• Instead use UDP
  – pump audio/video at a constant rate, tolerate packet loss
• (Ancient, yet ongoing) research area: TCP friendliness

Fairness and parallel TCP connections
• Nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections (shares worked below)
  – a new app asking for 1 TCP connection gets rate R/10
  – a new app asking for 11 TCP connections gets R/2
• Recall: multiple browser sessions (and the potential for synchronized loss)
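The shares follow from per-connection fairness: with the 9 existing connections plus 1 new one there are 10 connections in total, so the new app gets (1/10)·R; if it instead opens 11 connections there are 20 in total and it gets (11/20)·R ≈ R/2.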

119

Some TCP issues outstanding…

Synchronized flows:
• The aggregate window has the same dynamics
• Therefore buffer occupancy has the same dynamics
• Rule-of-thumb still holds

Many TCP flows (independent, desynchronized):
• The central limit theorem says the aggregate becomes Gaussian
• Variance (and hence required buffer size) decreases as N increases

[Figure: buffer size over time and the resulting probability distribution, for the synchronized and desynchronized cases]

  • Slide 1
  • Topic 5 – Transport
  • Transport services and protocols
  • Transport vs. network layer
  • Internet transport-layer protocols
  • Multiplexing/demultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux: Threaded Web Server
  • UDP: User Datagram Protocol [RFC 768]
  • UDP: more
  • UDP checksum
  • Internet Checksum (time travel warning – we covered this earlier)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer: getting started
  • Reliable data transfer: getting started (2)
  • KR state machines – a note
  • Rdt1.0: reliable transfer over a reliable channel
  • Rdt2.0: channel with bit errors
  • rdt2.0: FSM specification
  • rdt2.0: operation with no errors
  • rdt2.0: error scenario
  • rdt2.0 has a fatal flaw
  • rdt2.1: sender, handles garbled ACK/NAKs
  • rdt2.1: receiver, handles garbled ACK/NAKs
  • rdt2.1: discussion
  • rdt2.2: a NAK-free protocol
  • rdt2.2: sender, receiver fragments
  • rdt3.0: channels with errors and loss
  • rdt3.0 sender
  • rdt3.0 in action
  • rdt3.0 in action (2)
  • Performance of rdt3.0
  • rdt3.0: stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining: increased utilization
  • Pipelining Protocols
  • Selective repeat: big picture
  • Go-Back-N
  • GBN: sender extended FSM
  • GBN: receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat: sender, receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat: dilemma
  • Automatic Repeat Request (ARQ)
  • TCP: Overview (RFCs 793, 1122, 1323, 2018, 2581, …)
  • TCP segment structure
  • TCP seq. #'s and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122, RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ≠ Congestion Control
  • Flow Control – (bad old days)
  • TCP Flow Control
  • TCP Flow control: how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causes/costs of congestion: scenario 1
  • Causes/costs of congestion: scenario 2
  • Causes/costs of congestion: scenario 2 (2)
  • Causes/costs of congestion: scenario 3
  • Causes/costs of congestion: scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control: additive increase, multiplicative decrease
  • Slide 89
  • TCP Congestion Control: details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement: inferring loss
  • Refinement
  • Summary: TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures: TCP over "long, fat pipes"
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges – or Why AIMD?
  • Problem 1: Single Flow, Fixed BW
  • Problems with Slow-Start
  • Problem 2: Single Flow, Varying BW
  • Four alternatives
  • Problem 3: Multiple Flows
  • Recall: Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair? (a pretty animation…)
  • Fairness (more)
  • Some TCP issues outstanding…

Page 47: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

47

GBN inaction

48

Selective Repeat

bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper

layerbull sender only resends pkts for which ACK not received

ndash sender timer for each unACKed pktbull sender window

ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion control

State: Slow Start (SS)
  Event: ACK receipt for previously unACKed data
  Action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance"
  Commentary: resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
  Event: ACK receipt for previously unACKed data
  Action: CongWin = CongWin + MSS * (MSS/CongWin)
  Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
  Event: loss event detected by triple duplicate ACK
  Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
  Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

State: SS or CA
  Event: timeout
  Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
  Commentary: enter slow start

State: SS or CA
  Event: duplicate ACK
  Action: increment duplicate ACK count for the segment being ACKed
  Commentary: CongWin and Threshold not changed
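The table above is essentially an event-driven state machine, and the sketch below writes it out in code. It is a simplified illustration of these rules only (CongWin kept in MSS units, no pacing, no fast-recovery window inflation); the class and method names are assumptions, not from any real stack.

# Sketch of the congestion-control state machine from the table above
# (CongWin and Threshold kept in units of MSS for simplicity).

class TcpCongestionControl:
    def __init__(self, threshold: float = 64):
        self.cong_win = 1.0          # slow start begins at 1 MSS
        self.threshold = threshold
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK for previously unACKed data."""
        self.dup_acks = 0
        if self.cong_win < self.threshold:           # slow start
            self.cong_win += 1
        else:                                        # congestion avoidance
            self.cong_win += 1 / self.cong_win

    def on_dup_ack(self):
        self.dup_acks += 1
        if self.dup_acks == 3:                       # triple duplicate ACK
            self.threshold = max(self.cong_win / 2, 1)
            self.cong_win = self.threshold           # multiplicative decrease
            self.dup_acks = 0

    def on_timeout(self):
        self.threshold = max(self.cong_win / 2, 1)
        self.cong_win = 1                            # back to slow start
        self.dup_acks = 0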

99

Repeating Slow Start After Timeout

[Figure: window vs. time – after a timeout (or fast retransmission), CWND drops to 1 MSS; SSThresh is set to half the previous CWND; slow start runs until the window reaches SSThresh, then growth is linear]

Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND.
Slow start is in operation until the window reaches half of the previous CWND, i.e. SSTHRESH.

100

TCP throughput

• What's the average throughput of TCP as a function of window size and RTT?
  – Ignore slow start
• Let W be the window size when loss occurs.
• When the window is W, throughput is W/RTT.
• Just after a loss, the window drops to W/2, and throughput to (W/2)/RTT.
• Average throughput: 0.75 W/RTT

101

TCP Futures: TCP over "long, fat pipes"

• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L: roughly (1.22 · MSS) / (RTT · √L)
• ⇒ L = 2·10⁻¹⁰  Ouch!
• New versions of TCP are needed for high-speed use

Calculation on a Simple Model (cwnd in units of MSS)

• Assume loss occurs whenever cwnd reaches W
  – Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, …, W, W/2, …
  – W/2 RTTs, then drop, then repeat
• Average throughput: 0.75 W (MSS/RTT)
  – One packet dropped out of (W/2)·(3W/4)
  – Packet drop rate p = (8/3)·W⁻²
• Throughput = (MSS/RTT) · sqrt(3/(2p))
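Plugging the "long fat pipe" numbers into these formulas gives a quick sanity check; a minimal sketch, using the constants assumed in the example above.

# Sketch: plug the "long fat pipe" numbers into the simple model above.
from math import sqrt

MSS = 1500 * 8          # bits per segment
RTT = 0.100             # seconds
target = 10e9           # 10 Gbps

W = target * RTT / MSS                  # in-flight segments needed for 10 Gbps
p = (8 / 3) / W ** 2                    # loss rate the model allows at that W
avg = (MSS / RTT) * sqrt(3 / (2 * p))   # model's average throughput

print(round(W))          # 83333 segments, as on the previous slide
print(p)                 # ≈ 3.8e-10 – same order as the 2e-10 quoted above
print(avg / 1e9)         # ≈ 7.5 Gbps = 0.75 * W * MSS / RTT (sawtooth average)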

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges – or, Why AIMD?

• Single flow adjusting to bottleneck bandwidth
  – without any a priori knowledge
  – could be a Gbps link, could be a modem

• Single flow adjusting to variations in bandwidth
  – when bandwidth decreases, must lower sending rate
  – when bandwidth increases, must increase sending rate

• Multiple flows sharing the bandwidth
  – must avoid overloading the network
  – and share bandwidth "fairly" among the flows

103

104

Problem 1 Single Flow Fixed BW

• Want to get a first-order estimate of the available bandwidth
  – assume bandwidth is fixed
  – ignore presence of other flows

• Want to start slow, but rapidly increase rate until packet drop occurs ("slow-start")

• Adjustment:
  – cwnd initially set to 1 (MSS)
  – cwnd++ upon receipt of ACK

105

Problems with Slow-Start
• Slow-start can result in many losses
  – roughly on the order of cwnd ≈ BW × RTT worth of packets

• Example:
  – at some point, cwnd is just enough to fill the "pipe"
  – after another RTT, cwnd is double its previous value
  – all the excess packets are dropped!

• Need a more gentle adjustment algorithm once we have a rough estimate of the bandwidth
  – rest of the design discussion focuses on this

Problem 2: Single Flow, Varying BW

Want to track the available bandwidth:
• oscillate around its current value
• if you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• multiplicative increase or decrease:  cwnd ← cwnd × a  (or cwnd / a)
• additive increase or decrease:  cwnd ← cwnd ± b

106

Four alternatives

• AIAD: gentle increase, gentle decrease

• AIMD: gentle increase, drastic decrease

• MIAD: drastic increase, gentle decrease
  – too many losses: eliminate

• MIMD: drastic increase and decrease

(A sketch of the four update rules follows below.)
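The four alternatives differ only in the per-RTT update rule, so they fit in a few lines of code. A compact sketch; the parameter values a and b below are purely illustrative, not taken from the slides.

# Sketch: the four increase/decrease rules, applied once per RTT.
# Parameter values below are purely illustrative.

A_INC, A_DEC = 1.0, 1.0      # additive step (packets per RTT)
M_INC, M_DEC = 2.0, 0.5      # multiplicative factors

def aiad(cwnd, congested): return cwnd - A_DEC if congested else cwnd + A_INC
def aimd(cwnd, congested): return cwnd * M_DEC if congested else cwnd + A_INC
def miad(cwnd, congested): return cwnd - A_DEC if congested else cwnd * M_INC
def mimd(cwnd, congested): return cwnd * M_DEC if congested else cwnd * M_INC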

107

108

Problem 3 Multiple Flows

• Want the steady state to be "fair"

• Many notions of fairness, but here we just require two identical flows to end up with the same bandwidth

• This eliminates MIMD and AIAD
  – as we shall see…

• AIMD is the only remaining solution
  – not really, but close enough…

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

[Figure: a single flow A→B through a router of capacity C = 50 pkts/RTT; plots of the flow's rate x (pkts/RTT) and the backlog in the router (congested if backlog > 20 pkts) over time, showing the AIMD sawtooth]

110

AIMD Sharing Dynamics

[Figure: two flows, A→B at rate x1 and D→E at rate x2, sharing one bottleneck; plot of both rates over time]

No congestion: rate increases by one packet/RTT every RTT; congestion: decrease rate by a factor of 2.
Rates equalize → fair share.
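A small simulation sketch of the dynamics pictured above: two flows share a bottleneck of capacity C, each adds 1 pkt/RTT per RTT, and each halves its rate when the link is overloaded. The capacity and starting rates are illustrative assumptions.

# Sketch: two AIMD flows sharing a bottleneck converge toward equal rates.

def aimd_share(x1=40.0, x2=5.0, capacity=50.0, rtts=200):
    for _ in range(rtts):
        if x1 + x2 > capacity:        # congestion: both halve (multiplicative decrease)
            x1, x2 = x1 / 2, x2 / 2
        else:                         # otherwise both add 1 pkt/RTT (additive increase)
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2

print(aimd_share())    # both rates end up close to each other (~fair share)

Replacing the halving step with "subtract 1" (i.e. AIAD) leaves the initial gap between x1 and x2 unchanged, which is the point the next slide makes.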

111

AIAD Sharing Dynamics

[Figure: two flows, A→B at rate x1 and D→E at rate x2, sharing one bottleneck; plot of both rates over time]

No congestion: x increases by one packet/RTT every RTT; congestion: decrease x by 1.
The gap between the two rates persists – no convergence to a fair share.

112

Simple Model of Congestion Control

• Two TCP connections
  – rates x1 and x2
• Congestion when the sum > 1
• Efficiency: sum near 1
• Fairness: the x's converge

[Figure: the (x1, x2) plane – bandwidth for user 1 vs. bandwidth for user 2 – with the efficiency line x1 + x2 = 1 separating underload from overload (2-user example)]

113

Example

[Figure: the (x1, x2) plane with the fairness line x1 = x2 and the efficiency line x1 + x2 = 1]

• Total bandwidth: 1
  – (0.2, 0.5): inefficient, x1 + x2 = 0.7
  – (0.7, 0.5): congested, x1 + x2 = 1.2
  – (0.7, 0.3): efficient (x1 + x2 = 1), but not fair
  – (0.5, 0.5): efficient (x1 + x2 = 1) and fair
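The classification above is easy to mechanize; a tiny sketch, for illustration only (the tolerance eps is an assumption to cope with floating-point sums).

# Sketch: classify an allocation (x1, x2) against a unit-capacity link.

def classify(x1: float, x2: float, capacity: float = 1.0, eps: float = 1e-9) -> str:
    total = x1 + x2
    if total > capacity + eps:
        return "congested"
    if total < capacity - eps:
        return "inefficient (capacity left over)"
    return "efficient and fair" if abs(x1 - x2) < eps else "efficient but not fair"

for point in [(0.2, 0.5), (0.7, 0.5), (0.7, 0.3), (0.5, 0.5)]:
    print(point, classify(*point))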

114

AIAD

[Figure: trajectory of (x1, x2) in the bandwidth plane under AIAD, with the fairness and efficiency lines; from (x1h, x2h), a decrease moves to (x1h - aD, x2h - aD) and the next increase to (x1h - aD + aI, x2h - aD + aI), always parallel to the fairness line]

• Increase: x ← x + aI
• Decrease: x ← x - aD
• Does not converge to fairness

115

MIMD

[Figure: trajectory of (x1, x2) under MIMD, with the fairness and efficiency lines; from (x1h, x2h), a decrease moves to (bD·x1h, bD·x2h) and the next increase to (bI·bD·x1h, bI·bD·x2h), always along the ray through the origin]

• Increase: x ← bI · x
• Decrease: x ← bD · x
• Does not converge to fairness

116

AIMD

[Figure: trajectory of (x1, x2) under AIMD, with the fairness and efficiency lines; from (x1h, x2h), a decrease moves to (bD·x1h, bD·x2h) and the next increase to (bD·x1h + aI, bD·x2h + aI), moving the operating point closer to the fairness line on each cycle]

• Increase: x ← x + aI
• Decrease: x ← bD · x
• Converges to fairness

117

Why is AIMD fair?
(a pretty animation…)

Two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: bandwidth for connection 1 vs. bandwidth for connection 2, both axes up to R, with the equal-bandwidth-share line; repeated cycles of "congestion avoidance: additive increase" and "loss: decrease window by a factor of 2" bring the operating point toward the equal-share line]

118

Fairness (more)

Fairness and UDP
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at a constant rate, tolerate packet loss
• (Ancient, yet ongoing) research area: TCP-friendly rate control

Fairness and parallel TCP connections
• Nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2!
• Recall: multiple browser sessions (and the potential for synchronized loss)

(A quick check of the R/10 vs. R/2 arithmetic follows below.)
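Assuming the link's capacity is simply split evenly per TCP connection, a one-liner confirms the numbers quoted above; the function name and normalisation R = 1 are illustrative.

# Sketch: per-app bandwidth if a link of rate R is split evenly per TCP connection.

def app_share(existing_conns: int, app_conns: int, R: float = 1.0) -> float:
    """Fraction of R an app receives when it contributes app_conns connections."""
    return R * app_conns / (existing_conns + app_conns)

print(app_share(9, 1))    # 0.1  -> R/10 with a single connection
print(app_share(9, 11))   # 0.55 -> roughly R/2 with 11 parallel connections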

119

Synchronized Flows vs. Many TCP Flows

Synchronized flows:
• aggregate window has the same dynamics
• therefore buffer occupancy has the same dynamics
• rule-of-thumb still holds

Many independent, desynchronized flows:
• central limit theorem says the aggregate becomes Gaussian
• variance (and hence the buffer size needed) decreases as N increases

[Figure: buffer occupancy over time and its probability distribution, for synchronized vs. desynchronized flows]

Some TCP issues outstanding…

Page 48: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

48

Selective Repeat

• receiver individually acknowledges all correctly received pkts
  – buffers pkts, as needed, for eventual in-order delivery to the upper layer
• sender only resends pkts for which an ACK has not been received
  – sender keeps a timer for each unACKed pkt
• sender window
  – N consecutive seq #'s
  – again limits the seq #'s of sent, unACKed pkts

49

Selective repeat sender receiver windows

50

Selective repeat

sender:
  data from above:
  • if next available seq # is in the window, send pkt
  timeout(n):
  • resend pkt n, restart timer
  ACK(n) in [sendbase, sendbase+N]:
  • mark pkt n as received
  • if n is the smallest unACKed pkt, advance the window base to the next unACKed seq #

receiver:
  pkt n in [rcvbase, rcvbase+N-1]:
  • send ACK(n)
  • out-of-order: buffer
  • in-order: deliver (also deliver buffered, in-order pkts), advance window to the next not-yet-received pkt
  pkt n in [rcvbase-N, rcvbase-1]:
  • ACK(n)
  otherwise:
  • ignore
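A minimal sketch of the receiver-side rules above, with a fixed-size sequence space and window; the class and parameter names are illustrative, not from any particular stack.

# Sketch: selective-repeat receiver – buffer out-of-order packets,
# deliver in order, ACK everything in (or just below) the window.

class SRReceiver:
    def __init__(self, window: int, seq_space: int):
        self.N = window
        self.M = seq_space          # sequence numbers are modulo M
        self.rcv_base = 0
        self.buffer = {}            # seq -> data, for out-of-order packets

    def on_packet(self, seq: int, data, deliver):
        in_window = (seq - self.rcv_base) % self.M < self.N
        just_below = (self.rcv_base - seq) % self.M <= self.N and seq != self.rcv_base
        if in_window:
            self.buffer[seq] = data
            while self.rcv_base in self.buffer:        # deliver any in-order run
                deliver(self.buffer.pop(self.rcv_base))
                self.rcv_base = (self.rcv_base + 1) % self.M
            return ("ACK", seq)
        if just_below:
            return ("ACK", seq)     # duplicate of an already-delivered packet
        return None                 # otherwise ignore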

51

Selective repeat in action

52

Selective repeat dilemma

Example:
• seq #'s: 0, 1, 2, 3
• window size = 3

• receiver sees no difference in the two scenarios!
• incorrectly passes duplicate data as new in (a)

Q: what relationship is needed between seq # space size and window size?
A: window size ≤ ½ of the seq # space size
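A quick sketch of that constraint: with the 2-bit sequence space from the example, the largest safe window is 2, so the window of 3 used above is exactly what breaks.

# Sketch: largest safe selective-repeat window for a given sequence space.

def max_sr_window(seq_space_size: int) -> int:
    return seq_space_size // 2

print(max_sr_window(4))       # 2 – so window size 3 with seq #'s 0..3 is unsafe
print(max_sr_window(2 ** 16)) # 32768 for a 16-bit sequence space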

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt: consider a high bandwidth×delay product

Now let's move from the generic to the specific…

TCP: arguably the most successful protocol in the Internet…
…and it's an ARQ protocol.

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

• full duplex data
  – bi-directional data flow in the same connection
  – MSS: maximum segment size
• connection-oriented
  – handshaking (exchange of control msgs) inits sender & receiver state before data exchange
• flow controlled
  – sender will not overwhelm receiver
• point-to-point
  – one sender, one receiver
• reliable, in-order byte stream
  – no "message boundaries"
• pipelined
  – TCP congestion and flow control set the window size
• send & receive buffers

[Figure: the application writes data into the socket's TCP send buffer; segments cross the network; the receiving TCP buffers data until the application reads it]

55

TCP segment structure

[Figure: TCP segment format, 32 bits wide]
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | receive window
  checksum | urgent data pointer
  options (variable length)
  application data (variable length)

Notes:
• sequence/acknowledgement numbers count by bytes of data (not segments)
• receive window: # bytes the rcvr is willing to accept
• Internet checksum (as in UDP)
• URG: urgent data (generally not used); ACK: ACK # valid; PSH: push data now (generally not used)
• RST, SYN, FIN: connection establishment (setup, teardown commands)

56

TCP seq. #'s and ACKs

Seq. #'s:
  – byte-stream "number" of the first byte in the segment's data
ACKs:
  – seq # of the next byte expected from the other side
  – cumulative ACK
Q: how does the receiver handle out-of-order segments?
  – A: the TCP spec doesn't say – up to the implementor

[Figure: simple telnet scenario – the user types 'C'; Host A sends Seq=42, ACK=79, data='C'; Host B ACKs receipt of 'C' and echoes it back with Seq=79, ACK=43, data='C'; Host A ACKs receipt of the echoed 'C' with Seq=43, ACK=80]

This has led to a world of hurt…

57

TCP out-of-order attack
• ARQ with SACK means the recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server, but don't send the first byte
  – the recipient keeps all the subsequent data and waits…
  – filling buffers. Critical buffers…
• Evil attack two: send a legitimate request

      GET index.html

  which gets through an intrusion-detection system; then send a new segment replacing bytes 4-13 with "password-file"

A dumb example: neither of these attacks would work on a modern system.

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - α) · EstimatedRTT + α · SampleRTT

Exponential weighted moving average: the influence of a past sample decreases exponentially fast; typical value: α = 0.125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr – SampleRTT and EstimatedRTT (milliseconds) plotted against time (seconds); EstimatedRTT tracks a smoothed version of the noisy SampleRTT]

62

TCP Round Trip Time and Timeout

Setting the timeout
• EstimatedRTT plus a "safety margin"
  – large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:

    DevRTT = (1 - β) · DevRTT + β · |SampleRTT - EstimatedRTT|
    (typically, β = 0.25)

Then set the timeout interval:

    TimeoutInterval = EstimatedRTT + 4 · DevRTT
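The two moving averages above are easy to run incrementally. A minimal sketch of the estimator; the initial values, the order of the two updates, and the Karn/Partridge-style skip of retransmitted samples are assumptions layered on top of the formulas, not details from the slide.

# Sketch: RTT estimation and timeout computation as described above.

class RttEstimator:
    def __init__(self, alpha=0.125, beta=0.25, initial_rtt=1.0):
        self.alpha, self.beta = alpha, beta
        self.est = initial_rtt      # EstimatedRTT (seconds)
        self.dev = 0.0              # DevRTT

    def on_sample(self, sample_rtt: float, retransmitted: bool = False):
        if retransmitted:
            return                  # Karn/Partridge: ignore retransmitted samples
        self.est = (1 - self.alpha) * self.est + self.alpha * sample_rtt
        self.dev = (1 - self.beta) * self.dev + self.beta * abs(sample_rtt - self.est)

    @property
    def timeout(self) -> float:
        return self.est + 4 * self.dev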

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event) {

    event: data received from application above
      create TCP segment with sequence number NextSeqNum
      if (timer currently not running)
        start timer
      pass segment to IP
      NextSeqNum = NextSeqNum + length(data)

    event: timer timeout
      retransmit not-yet-acknowledged segment with smallest sequence number
      start timer

    event: ACK received, with ACK field value of y
      if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
          start timer
      }
  }
} /* end of loop forever */

Comment:
• SendBase - 1 = last cumulatively ACKed byte
Example:
• SendBase - 1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so new data is ACKed

66

TCP retransmission scenarios

[Figure: two timelines.
 (a) Lost ACK: Host A sends Seq=92, 8 bytes of data; Host B's ACK=100 is lost; A's timer for Seq=92 expires and it retransmits Seq=92; B ACKs 100 again; SendBase advances from 92 to 100.
 (b) Premature timeout: A sends Seq=92 (8 bytes) and Seq=100 (20 bytes); the timer for Seq=92 expires before ACK=100 is processed, so A retransmits Seq=92; B replies with cumulative ACK=120; SendBase advances to 120.]

67

TCP retransmission scenarios (more)

[Figure: cumulative-ACK scenario – Host A sends Seq=92 (8 bytes) and Seq=100 (20 bytes); the ACK=100 is lost, but ACK=120 arrives before the timeout; SendBase advances to 120.]

Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACKs 100 too.

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at receiver: arrival of in-order segment with expected seq #; all data up to the expected seq # already ACKed
  TCP receiver action: delayed ACK – wait up to 500 ms for the next segment; if no next segment, send ACK

Event at receiver: arrival of in-order segment with expected seq #; one other segment has an ACK pending
  TCP receiver action: immediately send a single cumulative ACK, ACKing both in-order segments

Event at receiver: arrival of out-of-order segment with higher-than-expected seq # (gap detected)
  TCP receiver action: immediately send a duplicate ACK, indicating the seq # of the next expected byte

Event at receiver: arrival of a segment that partially or completely fills a gap
  TCP receiver action: immediately send ACK, provided that the segment starts at the lower end of the gap
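The four rules above can be captured in a tiny lookup; a sketch for illustration only (the delayed-ACK timer itself is not modelled, and the event labels are invented for readability).

# Sketch: the receiver's ACK decision per the RFC 1122 / RFC 2581 rules above.

def ack_action(event: str) -> str:
    """event is one of the four cases in the table above."""
    actions = {
        "in-order, nothing pending":  "delayed ACK: wait up to 500 ms for the next segment, then send",
        "in-order, one ACK pending":  "immediately send one cumulative ACK covering both segments",
        "out-of-order, gap detected": "immediately send duplicate ACK for the next expected byte",
        "segment fills gap":          "immediately send ACK if the segment starts at the lower end of the gap",
    }
    return actions[event]

print(ack_action("out-of-order, gap detected"))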

69

Fast Retransmit
• The time-out period is often relatively long:
  – long delay before resending a lost packet
• Detect lost segments via duplicate ACKs
  – sender often sends many segments back-to-back
  – if a segment is lost, there will likely be many duplicate ACKs
• If the sender receives 3 duplicate ACKs, it supposes that the segment after the ACKed data was lost:
  – fast retransmit: resend the segment before the timer expires

70

[Figure 3.37: Host A sends several segments; the 2nd is lost (X); duplicate ACKs from Host B trigger A to resend the 2nd segment before its timeout expires]

71

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {
    increment count of dup ACKs received for y     /* a duplicate ACK for an already ACKed segment */
    if (count of dup ACKs received for y == 3)
      resend segment with sequence number y        /* fast retransmit */
  }

72

Silly Window Syndrome
MSS advertises the amount a receiver can accept.
If a transmitter has something to send – it will.
This means small MSS values may persist – indefinitely.

Solution: wait to fill each segment, but don't wait indefinitely.

NAGLE's Algorithm
If we wait too long, interactive traffic is difficult.
If we don't wait, we get silly window syndrome.
Solution: use a timer; when the timer expires – send the (unfilled) segment.

73

Flow Control ≠ Congestion Control

• Flow control involves preventing senders from overrunning the capacity of the receivers.

• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded.

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Control
• the receive side of a TCP connection has a receive buffer
• the app process may be slow at reading from the buffer
• speed-matching service: matching the send rate to the receiving app's drain rate

flow control: the sender won't overflow the receiver's buffer by transmitting too much, too fast

76

TCP Flow control: how it works

(Suppose the TCP receiver discards out-of-order segments.)

• spare room in buffer:
    RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises the spare room by including the value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees the receive buffer doesn't overflow
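The window advertisement is just the arithmetic above; a minimal sketch of both ends (the function and variable names are illustrative).

# Sketch: receiver computes the advertised window; sender respects it.

def advertised_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def sender_may_send(last_byte_sent: int, last_byte_acked: int, rcv_window: int) -> bool:
    """Keep unACKed data within the receiver's advertised window."""
    return (last_byte_sent - last_byte_acked) < rcv_window

print(advertised_window(64_000, 50_000, 10_000))   # 24,000 bytes of spare room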

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission

• large delays when congested
• maximum achievable throughput

[Figure: Host A and Host B each send λin of original data into a router with unlimited shared output-link buffers; λout emerges toward the receivers]

83

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packets

[Figure: Host A offers λin of original data plus retransmitted data to a router with finite shared output-link buffers; λout is delivered to Host B]

84

Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• "perfect" retransmission, only when loss: the offered load (original plus retransmitted data) exceeds λout
• retransmission of a delayed (not lost) packet makes the offered load larger (than in the perfect case) for the same λout

"costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: the link carries multiple copies of a pkt

[Figure: three plots of λout versus λin, each saturating around R/2 – (a) the ideal case, (b) retransmission only after genuine loss, and (c) with unneeded retransmissions, where goodput levels off around R/3 to R/4]

85

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λin (and the retransmission load) increases?

[Figure: Host A sends λin of original data plus retransmitted data across multihop paths through routers with finite shared output-link buffers; λout arrives at Host B]

86

Causes/costs of congestion: scenario 3

Another "cost" of congestion: when a packet is dropped, any upstream transmission capacity used for that packet was wasted!

[Figure: λout collapses as the offered load grows]

Congestion collapse example: the cocktail party effect.

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control: additive increase, multiplicative decrease

[Figure: congestion window (8, 16, 24 Kbytes, …) vs. time]

Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs:
• additive increase: increase CongWin by 1 MSS every RTT (W ← W + 1/W for each received ACK) until loss is detected
• multiplicative decrease: cut CongWin in half after loss (W ← W/2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 49: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

49

Selective repeat sender receiver windows

50

Selective repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1] ACK(n)

otherwise ignore

receiver

51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

Refinement
Q: When should the exponential increase switch to linear?

A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable: Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event

97

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in the slow-start phase; window grows exponentially.

• When CongWin is above Threshold, sender is in the congestion-avoidance phase; window grows linearly.

• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.

• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.

98

TCP sender congestion control

State: Slow Start (SS)
Event: ACK receipt for previously unacked data
TCP Sender Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
Commentary: Results in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unacked data
TCP Sender Action: CongWin = CongWin + MSS * (MSS/CongWin)
Commentary: Additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: Loss event detected by triple duplicate ACK
TCP Sender Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

State: SS or CA
Event: Timeout
TCP Sender Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: Enter slow start

State: SS or CA
Event: Duplicate ACK
TCP Sender Action: Increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed
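A compact sketch of the table above as event handlers (Python, with assumed names; ssthresh plays the role of Threshold and cwnd of CongWin, both in MSS units):

class TcpCC:
    def __init__(self, mss=1):
        self.mss = mss
        self.cwnd = 1 * mss          # CongWin
        self.ssthresh = 64 * mss     # Threshold (arbitrary initial value)
        self.dup_acks = 0

    def on_new_ack(self):
        if self.cwnd < self.ssthresh:                    # slow start
            self.cwnd += self.mss                        # doubles per RTT
        else:                                            # congestion avoidance
            self.cwnd += self.mss * (self.mss / self.cwnd)
        self.dup_acks = 0

    def on_dup_ack(self):
        self.dup_acks += 1
        if self.dup_acks == 3:                           # triple duplicate ACK
            self.ssthresh = max(self.cwnd / 2, self.mss)
            self.cwnd = self.ssthresh                    # multiplicative decrease

    def on_timeout(self):
        self.ssthresh = max(self.cwnd / 2, self.mss)
        self.cwnd = 1 * self.mss                         # re-enter slow start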

99

Repeating Slow Start After Timeout

[Figure: window vs. time t - sawtooth, a fast retransmission, then a timeout; SSThresh is set to half the window at the timeout]

Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND.

Slow start operates until the window reaches half of the previous CWND, i.e. SSTHRESH.

100

TCP throughput

• What's the average throughput of TCP as a function of window size and RTT?
- Ignore slow start
• Let W be the window size when loss occurs
• When the window is W, throughput is W/RTT
• Just after a loss, the window drops to W/2, throughput to W/(2 RTT)
• Average throughput: 0.75 W/RTT
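Where the 0.75 comes from (a short derivation, written in LaTeX; it assumes the window ramps linearly from W/2 back up to W between losses):

\bar{W} = \tfrac{1}{2}\left(\tfrac{W}{2} + W\right) = \tfrac{3W}{4}
\quad\Rightarrow\quad
\text{avg throughput} \approx \tfrac{3}{4}\,\tfrac{W}{\text{RTT}}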

101

TCP Futures: TCP over "long, fat pipes"

• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L: throughput ≈ 1.22 MSS / (RTT sqrt(L))
• Requires L = 2*10^-10 - Ouch!
• New versions of TCP for high-speed

Calculation on Simple Model (cwnd in units of MSS)

• Assume loss occurs whenever cwnd reaches W
- Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, ... W, W/2, ...
- W/2 RTTs, then drop, then repeat
• Average throughput: 0.75 W (MSS/RTT)
- One packet dropped out of (W/2)(3W/4)
- Packet drop rate p = (8/3) W^-2
• Throughput = (MSS/RTT) sqrt(3/(2p))
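A quick numeric check of the figures above, done in Python by inverting the throughput formula T = (MSS/RTT)*sqrt(3/(2p)); no assumptions beyond the slide's own numbers:

MSS = 1500 * 8                     # bits per segment
RTT = 0.1                          # seconds
T   = 10e9                         # target throughput: 10 Gbps

W = T * RTT / MSS                  # required in-flight segments: ~83,333
p = 1.5 * (MSS / (RTT * T)) ** 2   # required loss rate: ~2.2e-10
print(round(W), p)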

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges - or Why AIMD?

• Single flow adjusting to bottleneck bandwidth
- Without any a priori knowledge
- Could be a Gbps link, could be a modem

• Single flow adjusting to variations in bandwidth
- When bandwidth decreases, must lower sending rate
- When bandwidth increases, must increase sending rate

• Multiple flows sharing the bandwidth
- Must avoid overloading the network
- And share bandwidth "fairly" among the flows

103

104

Problem 1: Single Flow, Fixed BW

• Want to get a first-order estimate of the available bandwidth
- Assume bandwidth is fixed
- Ignore presence of other flows

• Want to start slow, but rapidly increase rate until packet drop occurs ("slow-start")

• Adjustment:
- cwnd initially set to 1 (MSS)
- cwnd++ upon receipt of ACK

105

Problems with Slow-Start
• Slow-start can result in many losses
- Roughly the size of cwnd (~ BW·RTT)

• Example:
- At some point, cwnd is enough to fill the "pipe"
- After another RTT, cwnd is double its previous value
- All the excess packets are dropped!

• Need a more gentle adjustment algorithm once we have a rough estimate of the bandwidth
- Rest of the design discussion focuses on this

Problem 2: Single Flow, Varying BW

Want to track the available bandwidth:
• Oscillate around its current value
• If you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease: cwnd ← cwnd·a or cwnd/a
• Additive increase or decrease: cwnd ← cwnd ± b

106

Four alternatives

• AIAD: gentle increase, gentle decrease

• AIMD: gentle increase, drastic decrease

• MIAD: drastic increase, gentle decrease
- too many losses: eliminate

• MIMD: drastic increase and decrease
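For concreteness, the four rules as per-RTT updates (a Python sketch; the constants a_i, a_d, b_i, b_d are illustrative values, not from the slides):

a_i, a_d = 1.0, 1.0      # additive step per RTT
b_i, b_d = 2.0, 2.0      # multiplicative factor per RTT

rules = {
    "AIAD": (lambda x: x + a_i, lambda x: max(x - a_d, 0.0)),
    "AIMD": (lambda x: x + a_i, lambda x: x / b_d),
    "MIAD": (lambda x: x * b_i, lambda x: max(x - a_d, 0.0)),
    "MIMD": (lambda x: x * b_i, lambda x: x / b_d),
}

increase, decrease = rules["AIMD"]   # pick one pair of update rules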

107

108

Problem 3: Multiple Flows

• Want steady state to be "fair"

• Many notions of fairness, but here just require two identical flows to end up with the same bandwidth

• This eliminates MIMD and AIAD
- As we shall see...

• AIMD is the only remaining solution
- Not really, but close enough...

109

Recall: Buffer and Window Dynamics

• No congestion: x increases by one packet/RTT every RTT
• Congestion: decrease x by factor 2

[Figure: single flow A → B over a bottleneck of C = 50 pkts/RTT; plot of rate x (pkts/RTT) and backlog in router (pkts; congested if > 20) against time]

110

AIMD Sharing Dynamics

• No congestion: rate increases by one packet/RTT every RTT
• Congestion: decrease rate by factor 2

[Figure: two flows, x1 (A → B) and x2 (D → E), sharing the bottleneck; their rates converge - rates equalize at the fair share]
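A tiny simulation in the spirit of this plot (Python sketch; the capacity, starting rates and number of RTTs are made-up values):

def share(x1=40.0, x2=10.0, capacity=50.0, rtts=200):
    # two AIMD flows sharing one bottleneck of `capacity` pkts/RTT
    for _ in range(rtts):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2    # congestion: multiplicative decrease
        else:
            x1, x2 = x1 + 1, x2 + 1    # no congestion: additive increase
    return x1, x2

print(share())    # the two rates end up nearly equal - the fair share

Each congestion event halves the gap x1 - x2 while the additive steps leave it unchanged, so the rates equalize.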

111

AIAD Sharing Dynamics

• No congestion: x increases by one packet/RTT every RTT
• Congestion: decrease x by 1

[Figure: two flows, x1 (A → B) and x2 (D → E); under AIAD the gap between their rates never closes - no convergence to a fair share]

112

Simple Model of Congestion Control

• Two TCP connections
- Rates x1 and x2
• Congestion when the sum > 1
• Efficiency: sum near 1
• Fairness: x's converge

[Figure: 2-user example - bandwidth for user 1 (x1) vs. bandwidth for user 2 (x2); the efficiency line x1 + x2 = 1 separates underload from overload]

113

Example

• Total bandwidth 1

[Figure: x1 (user 1) vs. x2 (user 2), both axes 0..1, with the fairness line x1 = x2 and the efficiency line x1 + x2 = 1]

• (0.2, 0.5): inefficient, x1 + x2 = 0.7
• (0.7, 0.5): congested, x1 + x2 = 1.2
• (0.7, 0.3): efficient (x1 + x2 = 1), not fair
• (0.5, 0.5): efficient (x1 + x2 = 1), fair
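The same classification as a few lines of Python (a sketch; the "fair" test simply checks that the two rates are equal):

def classify(x1, x2, capacity=1.0):
    total = x1 + x2
    if total > capacity:
        return "congested"
    status = "efficient" if abs(total - capacity) < 1e-9 else "inefficient"
    return status + (", fair" if abs(x1 - x2) < 1e-9 else ", not fair")

for point in [(0.2, 0.5), (0.7, 0.5), (0.7, 0.3), (0.5, 0.5)]:
    print(point, classify(*point))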

114

AIAD

• Increase: x ← x + aI
• Decrease: x ← x - aD
• Does not converge to fairness

[Figure: phase plot of bandwidth for user 1 (x1) vs. bandwidth for user 2 (x2) with fairness and efficiency lines; starting from (x1h, x2h), a decrease moves to (x1h - aD, x2h - aD) and the next increase to (x1h - aD + aI, x2h - aD + aI) - the trajectory stays parallel to the fairness line]

115

MIMD

• Increase: x ← bI·x (bI > 1)
• Decrease: x ← bD·x (bD < 1)
• Does not converge to fairness

[Figure: phase plot of x1 vs. x2 with fairness and efficiency lines; from (x1h, x2h), a decrease moves to (bD·x1h, bD·x2h) and the next increase to (bI·bD·x1h, bI·bD·x2h) - the trajectory moves along a ray through the origin, so the ratio x1/x2 never changes]

116

AIMD

• Increase: x ← x + aI
• Decrease: x ← bD·x
• Converges to fairness

[Figure: phase plot of x1 vs. x2 with fairness and efficiency lines; from (x1h, x2h), a decrease moves to (bD·x1h, bD·x2h) and the next increase to (bD·x1h + aI, bD·x2h + aI) - each decrease pulls the point towards the fairness line, each increase leaves the gap unchanged]
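A two-line check of the convergence claims on these three slides (a math sketch in LaTeX, using the slides' constants aI, aD, bI, bD):

\text{AIAD: } (x_1 \pm a) - (x_2 \pm a) = x_1 - x_2
\qquad
\text{MIMD: } \frac{b\,x_1}{b\,x_2} = \frac{x_1}{x_2}

\text{AIMD: } b_D x_1 - b_D x_2 = b_D\,(x_1 - x_2),\quad b_D < 1

So AIAD preserves the difference between the two rates and MIMD preserves their ratio, while each AIMD decrease shrinks the difference; repeated congestion events drive x1 - x2 towards 0, i.e. towards the fairness line.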

117

Why is AIMD fair? (a pretty animation...)

Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally

[Figure: bandwidth for connection 1 vs. bandwidth for connection 2, both axes up to R, with the "equal bandwidth share" line; the trajectory alternates "congestion avoidance: additive increase" with "loss: decrease window by factor of 2"]

118

Fairness (more)

Fairness and UDP
• Multimedia apps may not use TCP
- do not want rate throttled by congestion control
• Instead use UDP
- pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections
• Nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
- new app asks for 1 TCP, gets rate R/10
- new app asks for 11 TCPs, gets R/2
• Recall: multiple browser sessions (and the potential for synchronized loss)

119

Synchronized Flows vs. Many TCP Flows

Synchronized flows:
• Aggregate window has the same dynamics
• Therefore buffer occupancy has the same dynamics
• Rule-of-thumb still holds

Independent, desynchronized flows:
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases

Some TCP issues outstanding...

[Figure: buffer size vs. time t, and its probability distribution]



51

Selective repeat in action

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement: inferring loss
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Philosophy: 3 dup ACKs indicates network capable of delivering some segments; timeout indicates a "more alarming" congestion scenario

96

Refinement
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before the loss event

97

Summary TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.

• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.

• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.

• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
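The four rules above can be collected into a small sketch of the sender's congestion state (a simplified illustration, not exact pseudocode from any RFC; congwin and threshold are in MSS, and the initial threshold of 64 is an assumption):

    class TcpCongestionState:
        # Simplified sketch of the summary above: slow start / congestion avoidance.
        def __init__(self):
            self.congwin = 1.0        # MSS
            self.threshold = 64.0     # assumed initial threshold

        def on_new_ack(self):
            if self.congwin < self.threshold:       # slow-start phase
                self.congwin += 1.0                 # per ACK -> exponential per RTT
            else:                                   # congestion-avoidance phase
                self.congwin += 1.0 / self.congwin  # ~1 MSS per RTT (linear)

        def on_triple_dup_ack(self):
            self.threshold = self.congwin / 2
            self.congwin = self.threshold           # multiplicative decrease

        def on_timeout(self):
            self.threshold = self.congwin / 2
            self.congwin = 1.0                      # back to slow start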

98

TCP sender congestion control

State: Slow Start (SS)
Event: ACK receipt for previously unacked data
Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
Commentary: Resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unacked data
Action: CongWin = CongWin + MSS × (MSS/CongWin)
Commentary: Additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: Loss event detected by triple duplicate ACK
Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

State: SS or CA
Event: Timeout
Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: Enter slow start

State: SS or CA
Event: Duplicate ACK
Action: Increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

[Figure: window versus time t — timeout/fast retransmission, with SSThresh set to half the window at that point]

Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND.

Slow start operates until it reaches half of the previous CWND, i.e. SSTHRESH.

100

TCP throughput

• What's the average throughput of TCP as a function of window size and RTT?
  – Ignore slow start
• Let W be the window size when loss occurs
• When the window is W, throughput is W/RTT
• Just after loss, the window drops to W/2, throughput to (W/2)/RTT
• Average throughput: 0.75 W/RTT
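A one-line check of the 0.75 W/RTT figure (a sketch under the slide's assumptions: the window oscillates between W/2 and W, slow start ignored), averaging the two extremes:

\[ \overline{\text{throughput}} \approx \frac{1}{2}\left(\frac{W/2}{RTT} + \frac{W}{RTT}\right) = \frac{3}{4}\cdot\frac{W}{RTT} \]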

101

TCP Futures: TCP over "long, fat pipes"

• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L: ≈ 1.22·MSS / (RTT·√L)
• → L = 2·10⁻¹⁰  Ouch!
• New versions of TCP for high-speed

Calculation on Simple Model (cwnd in units of MSS)

• Assume loss occurs whenever cwnd reaches W
  – Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … W, W/2, …
  – W/2 RTTs, then drop, then repeat
• Average throughput: 0.75·W·(MSS/RTT)
  – One packet dropped out of (W/2)·(3W/4)
  – Packet drop rate p = (8/3)·W⁻²
• Throughput = (MSS/RTT)·sqrt(3/(2p))
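A small sketch reproducing the long-fat-pipe numbers from the formula just above (values as in the example; only the variable names are made up):

    from math import sqrt

    MSS = 1500 * 8            # bits per segment
    RTT = 0.100               # seconds
    target = 10e9             # 10 Gbps

    W = target * RTT / MSS    # in-flight segments needed
    p = 1.5 / (W * W)         # invert Throughput = (MSS/RTT) * sqrt(3/(2p))
    check = (MSS / RTT) * sqrt(3 / (2 * p))

    print(round(W))           # -> 83333 segments
    print(p)                  # -> ~2e-10 loss rate
    print(check)              # -> ~1e10 bits/sec, i.e. 10 Gbps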

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges – or Why AIMD?

• Single flow adjusting to bottleneck bandwidth
  – Without any a priori knowledge
  – Could be a Gbps link, could be a modem

• Single flow adjusting to variations in bandwidth
  – When bandwidth decreases, must lower sending rate
  – When bandwidth increases, must increase sending rate

• Multiple flows sharing the bandwidth
  – Must avoid overloading the network
  – And share bandwidth "fairly" among the flows

103

104

Problem 1: Single Flow, Fixed BW

• Want to get a first-order estimate of the available bandwidth
  – Assume bandwidth is fixed
  – Ignore presence of other flows

• Want to start slow but rapidly increase rate until packet drop occurs ("slow-start")

• Adjustment:
  – cwnd initially set to 1 (MSS)
  – cwnd++ upon receipt of ACK

105

Problems with Slow-Start
• Slow-start can result in many losses
  – Roughly the size of cwnd ~ BW×RTT

• Example:
  – At some point, cwnd is enough to fill the "pipe"
  – After another RTT, cwnd is double its previous value
  – All the excess packets are dropped!

• Need a more gentle adjustment algorithm once we have a rough estimate of the bandwidth
  – Rest of the design discussion focuses on this

Problem 2: Single Flow, Varying BW

Want to track available bandwidth:
• Oscillate around its current value
• If you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease: cwnd ← cwnd ×/÷ a
• Additive increase or decrease: cwnd ← cwnd +/− b

106

Four alternatives

• AIAD: gentle increase, gentle decrease

• AIMD: gentle increase, drastic decrease

• MIAD: drastic increase, gentle decrease
  – too many losses: eliminate

• MIMD: drastic increase and decrease

107

108

Problem 3: Multiple Flows

• Want steady state to be "fair"

• Many notions of fairness, but here we just require two identical flows to end up with the same bandwidth

• This eliminates MIMD and AIAD
  – As we shall see…

• AIMD is the only remaining solution
  – Not really, but close enough…

109

Recall: Buffer and Window Dynamics

• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by factor 2

[Figure: A → B over a link of capacity C = 50 pkts/RTT; plot of rate x (pkts/RTT) and router backlog (pkts, congested if > 20) against time — sawtooth]

110

AIMD Sharing Dynamics

No congestion → rate increases by one packet/RTT every RTT; congestion → decrease rate by factor 2

[Figure: senders A and B (rates x1, x2) sharing a bottleneck toward D and E; plot of the two rates against time]

Rates equalize → fair share

111

AIAD Sharing Dynamics

No congestion → x increases by one packet/RTT every RTT; congestion → decrease x by 1

[Figure: senders A and B (rates x1, x2) sharing a bottleneck toward D and E; the two rates oscillate in parallel and the gap between them persists]
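A compact sketch (hypothetical numbers) contrasting the two sharing dynamics above: congestion is declared when the combined rate exceeds an assumed capacity of 50 pkts/RTT; AIMD halves both rates, AIAD subtracts 1 from both. AIMD drives the two rates together, AIAD preserves the initial gap:

    def share(x1, x2, capacity=50.0, rtts=200, multiplicative=True):
        # Returns final (x1, x2) after simple per-RTT dynamics.
        for _ in range(rtts):
            if x1 + x2 > capacity:                 # congestion
                if multiplicative:                 # AIMD: decrease by factor 2
                    x1, x2 = x1 / 2, x2 / 2
                else:                              # AIAD: decrease by 1
                    x1, x2 = x1 - 1, x2 - 1
            else:                                  # no congestion: +1 pkt/RTT each
                x1, x2 = x1 + 1, x2 + 1
        return x1, x2

    print(share(40, 5, multiplicative=True))       # rates converge toward equal (fair) share
    print(share(40, 5, multiplicative=False))      # the 35 pkt/RTT gap never closes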

112

Simple Model of Congestion Control

• Two TCP connections
  – Rates x1 and x2
• Congestion when sum > 1
• Efficiency: sum near 1
• Fairness: x's converge

[Figure: 2-user example — phase plane of bandwidth for user 1 (x1) against bandwidth for user 2 (x2); the efficiency line separates underload from overload]

113

Example

• Total bandwidth 1

[Figure: phase plane — user 1 rate x1 against user 2 rate x2, with the fairness line and the efficiency line x1 + x2 = 1]

(0.2, 0.5): inefficient — x1 + x2 = 0.7
(0.7, 0.5): congested — x1 + x2 = 1.2
(0.7, 0.3): efficient (x1 + x2 = 1), but not fair
(0.5, 0.5): efficient (x1 + x2 = 1) and fair

114

AIAD

• Increase: x ← x + aI
• Decrease: x ← x − aD
• Does not converge to fairness

[Figure: phase plane (x1 vs x2) with fairness and efficiency lines; from (x1h, x2h), decrease to (x1h − aD, x2h − aD), then increase to (x1h − aD + aI, x2h − aD + aI) — the trajectory moves parallel to the fairness line and never approaches it]

115

MIMD

• Increase: x ← bI·x
• Decrease: x ← bD·x
• Does not converge to fairness

[Figure: phase plane (x1 vs x2) with fairness and efficiency lines; from (x1h, x2h), decrease to (bD·x1h, bD·x2h), then increase to (bI·bD·x1h, bI·bD·x2h) — the trajectory stays on the same ray through the origin and never approaches the fairness line]

116

AIMD

• Increase: x ← x + aI
• Decrease: x ← bD·x
• Converges to fairness

[Figure: phase plane (x1 vs x2) with fairness and efficiency lines; from (x1h, x2h), multiplicative decrease to (bD·x1h, bD·x2h), then additive increase to (bD·x1h + aI, bD·x2h + aI) — each cycle moves the operating point closer to the fairness line]

117

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• Additive increase gives slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally

[Figure: bandwidth for connection 1 against bandwidth for connection 2, both axes up to R, with the equal-bandwidth-share line; repeated cycles of congestion avoidance (additive increase) and loss (decrease window by factor of 2) move the two connections toward equal share]
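The same point in one line (a sketch, writing aI for the additive step and bD < 1 for the decrease factor): additive increase adds the same amount to both connections, so the gap x1 − x2 is unchanged, while every multiplicative decrease scales it:

\[ (x_1 + a_I) - (x_2 + a_I) = x_1 - x_2, \qquad b_D x_1 - b_D x_2 = b_D\,(x_1 - x_2) \]

so after k loss events the gap has shrunk by a factor of b_D^k and the trajectory approaches the equal-share line. Under AIAD both steps leave the gap unchanged, which is why it never converges to fairness.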

118

Fairness (more)

Fairness and UDP:
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections:
• nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2
• Recall: multiple browser sessions (and the potential for synchronized loss)

119

Synchronized Flows vs Many TCP Flows

Synchronized flows:
• Aggregate window has same dynamics
• Therefore buffer occupancy has same dynamics
• Rule-of-thumb still holds

Independent, desynchronized flows:
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases

[Figure: buffer size against time t, and the resulting probability distribution]

Some TCP issues outstanding…

Page 52: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

52

Selective repeat dilemma

Example bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

window size le (frac12 of seq size)

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 53: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

53

Automatic Repeat Request (ARQ)

+ Self-clocking (Automatic)

+ Adaptive

+ Flexible

- Slow to start adaptconsider high BandwidthDelay product

Now lets move fromthe generic to thespecifichellip

TCP arguably the most successful protocol in the Internethellip

its an ARQ protocol

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
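Taken together, these four rules form a small state machine. A simplified sketch, assuming a window counted in MSS and ignoring the finer points of fast recovery; the class and method names are mine, not from any real implementation:

# Simplified Reno-style controller: slow start below Threshold, AIMD above it.
class TcpCongestionSketch:
    def __init__(self):
        self.cwnd = 1.0          # CongWin, in MSS
        self.ssthresh = 64.0     # Threshold, in MSS

    def on_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0                 # slow start: exponential growth
        else:
            self.cwnd += 1.0 / self.cwnd     # congestion avoidance: ~1 MSS per RTT

    def on_triple_dup_ack(self):
        self.ssthresh = self.cwnd / 2.0
        self.cwnd = self.ssthresh            # multiplicative decrease

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2.0
        self.cwnd = 1.0                      # back to slow start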

98

TCP sender congestion control

State: Slow Start (SS)
  Event: ACK receipt for previously unACKed data
  TCP sender action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
  Commentary: Resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
  Event: ACK receipt for previously unACKed data
  TCP sender action: CongWin = CongWin + MSS · (MSS/CongWin)
  Commentary: Additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
  Event: Loss event detected by triple duplicate ACK
  TCP sender action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
  Commentary: Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

State: SS or CA
  Event: Timeout
  TCP sender action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
  Commentary: Enter slow start

State: SS or CA
  Event: Duplicate ACK
  TCP sender action: Increment duplicate ACK count for segment being ACKed
  Commentary: CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

[Figure: window vs. time t – a timeout/fast retransmission event, with SSThresh set to half the window at that point]

Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND.
Slow start stays in operation until the window reaches half of the previous CWND, i.e. SSTHRESH.

100

TCP throughput

• What's the average throughput of TCP as a function of window size and RTT?
  – Ignore slow start
• Let W be the window size when loss occurs
• When the window is W, throughput is W/RTT
• Just after a loss, the window drops to W/2, and throughput to (W/2)/RTT
• Average throughput: 0.75 W/RTT
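A one-line check of the 0.75 factor, under the stated assumption that the window ramps linearly from W/2 back up to W between losses: the window averages (W/2 + W)/2 = 3W/4, so the average throughput is about 0.75·W/RTT.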

101

TCP Futures: TCP over "long, fat pipes"

• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate p: Throughput = 1.22 · MSS / (RTT · sqrt(p))
• ⇒ needs p = 2·10^-10. Ouch!
• New versions of TCP are needed for high-speed paths

Calculation on Simple Model (cwnd in units of MSS)

• Assume loss occurs whenever cwnd reaches W
  – Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … W, W/2, …
  – W/2 RTTs, then drop, then repeat
• Average throughput: 0.75·W·(MSS/RTT)
  – One packet dropped out of (W/2)·(3W/4)
  – Packet drop rate p = (8/3)·W^-2
• Throughput = (MSS/RTT) · sqrt(3/(2p))
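A quick numeric check of those figures, using only the values and formulas above (the script itself is just an illustration):

# Check the long-fat-pipe numbers: 1500-byte segments, 100 ms RTT, 10 Gbps target.
mss_bits = 1500 * 8
rtt = 0.100
target_bps = 10e9

W = target_bps * rtt / mss_bits     # average in-flight segments needed
p = 3.0 / (2.0 * W ** 2)            # invert Throughput = (MSS/RTT) * sqrt(3/(2p))

print(round(W))   # 83333 segments
print(p)          # ~2e-10: the loss rate the path would have to sustain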

102

HINT: KNOW THIS SLIDE

Three Congestion Control Challenges – or, Why AIMD?

• Single flow adjusting to bottleneck bandwidth
  – Without any a priori knowledge
  – Could be a Gbps link, could be a modem
• Single flow adjusting to variations in bandwidth
  – When bandwidth decreases, must lower sending rate
  – When bandwidth increases, must increase sending rate
• Multiple flows sharing the bandwidth
  – Must avoid overloading the network
  – And share bandwidth "fairly" among the flows

103

104

Problem 1: Single Flow, Fixed BW

• Want to get a first-order estimate of the available bandwidth
  – Assume bandwidth is fixed
  – Ignore presence of other flows
• Want to start slow, but rapidly increase rate until packet drop occurs ("slow-start")
• Adjustment:
  – cwnd initially set to 1 (MSS)
  – cwnd++ upon receipt of ACK

105

Problems with Slow-Start

• Slow-start can result in many losses
  – Roughly the size of cwnd ~ BW·RTT
• Example:
  – At some point, cwnd is enough to fill the "pipe"
  – After another RTT, cwnd is double its previous value
  – All the excess packets are dropped!
• Need a more gentle adjustment algorithm once we have a rough estimate of the bandwidth
  – Rest of the design discussion focuses on this

Problem 2: Single Flow, Varying BW

Want to track the available bandwidth:
• Oscillate around its current value
• If you never send more than your current rate, you won't know whether more bandwidth is available

Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease: cwnd ← cwnd ×/÷ a
• Additive increase or decrease: cwnd ← cwnd ± b

106

Four alternatives

• AIAD: gentle increase, gentle decrease
• AIMD: gentle increase, drastic decrease
• MIAD: drastic increase, gentle decrease
  – too many losses: eliminate
• MIMD: drastic increase and decrease

107

108

Problem 3: Multiple Flows

• Want steady state to be "fair"
• Many notions of fairness, but here just require two identical flows to end up with the same bandwidth
• This eliminates MIMD and AIAD
  – As we shall see…
• AIMD is the only remaining solution
  – Not really, but close enough…

109

Recall: Buffer and Window Dynamics

• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by factor 2

[Figure: A → router → B, C = 50 pkts/RTT; plot of rate x (pkts/RTT) and backlog in the router (pkts, congested if > 20) over time]

110

AIMD Sharing Dynamics

• No congestion → rate increases by one packet/RTT every RTT
• Congestion → decrease rate by factor 2

[Figure: hosts A and D send flows x1 and x2 through a shared link to B and E; plot of the two rates over time]

Rates equalize → fair share.

111

AIAD Sharing Dynamics

• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by 1

[Figure: hosts A and D send flows x1 and x2 through a shared link to B and E; plot of the two rates over time – the gap between the flows never closes]
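The contrast between the two plots is easy to reproduce. A small simulation sketch of two flows sharing a link of capacity C, under the same assumptions as the figures (increase by 1 packet/RTT while under capacity; on congestion either halve or subtract 1); the function and variable names are illustrative:

# Two flows sharing capacity C: AIMD converges to a fair share, AIAD does not.
def share(policy, x1=40.0, x2=10.0, C=50.0, rtts=500):
    for _ in range(rtts):
        if x1 + x2 > C:                  # congestion signal hits both flows
            if policy == "AIMD":
                x1, x2 = x1 / 2, x2 / 2  # multiplicative decrease
            else:                        # AIAD
                x1, x2 = x1 - 1, x2 - 1  # additive decrease
        else:
            x1, x2 = x1 + 1, x2 + 1      # additive increase for both
    return x1, x2

print(share("AIMD"))   # the two rates end up roughly equal
print(share("AIAD"))   # the initial 30-packet/RTT gap persists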

112

Simple Model of Congestion Control

• Two TCP connections
  – Rates x1 and x2
• Congestion when the sum > 1
• Efficiency: sum near 1
• Fairness: the x's converge

[Figure: 2-user example – axes "Bandwidth for User 1 (x1)" and "Bandwidth for User 2 (x2)"; the efficiency line separates underload from overload]

113

Example

• Total bandwidth 1

[Figure: axes "User 1: x1" and "User 2: x2", each from 0 to 1, showing the fairness line and the efficiency line, with sample points:]

• (0.2, 0.5): inefficient, x1 + x2 = 0.7
• (0.7, 0.5): congested, x1 + x2 = 1.2
• (0.7, 0.3): efficient, x1 + x2 = 1, but not fair
• (0.5, 0.5): efficient, x1 + x2 = 1, and fair
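A tiny helper that reproduces those classifications under the same model (capacity 1; "fair" meaning x1 = x2); the function name is illustrative:

# Classify an allocation (x1, x2) against the efficiency line x1 + x2 = 1.
def classify(x1, x2, capacity=1.0, eps=1e-9):
    total = x1 + x2
    if total > capacity + eps:
        return "congested"
    if total < capacity - eps:
        return "inefficient"
    return "efficient and fair" if abs(x1 - x2) < eps else "efficient but not fair"

for point in [(0.2, 0.5), (0.7, 0.5), (0.7, 0.3), (0.5, 0.5)]:
    print(point, classify(*point))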

114

AIAD

• Increase: x ← x + aI
• Decrease: x ← x − aD
• Does not converge to fairness

[Figure: bandwidth for User 1 (x1) vs. User 2 (x2), with fairness and efficiency lines; the trajectory moves from (x1h, x2h) to (x1h − aD, x2h − aD) to (x1h − aD + aI, x2h − aD + aI), always parallel to the fairness line, so the gap between the users never shrinks]

115

MIMD

• Increase: x ← x · bI
• Decrease: x ← x · bD
• Does not converge to fairness

[Figure: bandwidth for User 1 (x1) vs. User 2 (x2), with fairness and efficiency lines; the trajectory moves from (x1h, x2h) to (bD·x1h, bD·x2h) to (bI·bD·x1h, bI·bD·x2h), always along a ray through the origin, so the ratio between the users never changes]

116

AIMD

• Increase: x ← x + aI
• Decrease: x ← x · bD
• Converges to fairness

[Figure: bandwidth for User 1 (x1) vs. User 2 (x2), with fairness and efficiency lines; the trajectory moves from (x1h, x2h) to (bD·x1h, bD·x2h) to (bD·x1h + aI, bD·x2h + aI) – each multiplicative decrease shrinks the gap between the users, each additive increase preserves it, so the point works its way towards the fair, efficient operating point]

117

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally

[Figure: bandwidth for Connection 1 vs. bandwidth for Connection 2, each axis up to R, with the equal-bandwidth-share line; the trajectory alternates "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]

118

Fairness (more)

Fairness and UDP:
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections:
• nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2
• Recall: multiple browser sessions (and the potential for synchronized loss)
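The R/2 figure follows from per-connection (rather than per-host) fairness: with the 9 existing connections plus 11 new ones there are 20 in total, so the new application's aggregate share is 11/20 of R, just over half.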

119

Synchronized Flows vs. Many TCP Flows

Synchronized flows:
• Aggregate window has the same dynamics
• Therefore buffer occupancy has the same dynamics
• Rule-of-thumb still holds

Many flows (independent, desynchronized):
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases

[Figure: buffer size over time t and its probability distribution, for each case]

Some TCP issues outstanding…

Page 54: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

54

TCP Overview RFCs 793 1122 1323 2018 2581 hellip

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T C Psend buffer

TC Preceive buffer

socketdoor

segm en t

app licationwrites data

applicationreads data

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 55: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

55

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model (cwnd in units of MSS)

• Assume loss occurs whenever cwnd reaches W
  – Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … W, W/2, …
  – W/2 RTTs, then drop, then repeat
• Average throughput: 0.75 W (MSS/RTT)
  – One packet dropped out of (W/2)·(3W/4)
  – Packet drop rate p = (8/3)·W^-2
• Throughput = (MSS/RTT) · sqrt(3/(2p))
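Written out, the step from drop rate to throughput (a sketch under the slide's assumptions of one drop per cycle of W/2 RTTs and an average window of 3W/4; LaTeX):

\text{packets per cycle} \;\approx\; \frac{W}{2}\cdot\frac{3W}{4} \;=\; \frac{3W^{2}}{8}
\qquad\Rightarrow\qquad p = \frac{8}{3W^{2}}, \quad W = \sqrt{\frac{8}{3p}}

\text{throughput} \;=\; \frac{3}{4}\,W\,\frac{MSS}{RTT}
\;=\; \frac{3}{4}\sqrt{\frac{8}{3p}}\,\frac{MSS}{RTT}
\;=\; \frac{MSS}{RTT}\sqrt{\frac{3}{2p}}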

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges – or Why AIMD?

• Single flow adjusting to bottleneck bandwidth
  – Without any a priori knowledge
  – Could be a Gbps link, could be a modem

• Single flow adjusting to variations in bandwidth
  – When bandwidth decreases, must lower sending rate
  – When bandwidth increases, must increase sending rate

• Multiple flows sharing the bandwidth
  – Must avoid overloading the network
  – And share bandwidth "fairly" among the flows

103

104

Problem 1 Single Flow Fixed BW

• Want to get a first-order estimate of the available bandwidth
  – Assume bandwidth is fixed
  – Ignore presence of other flows

• Want to start slow but rapidly increase rate until packet drop occurs ("slow-start")

• Adjustment:
  – cwnd initially set to 1 (MSS)
  – cwnd++ upon receipt of ACK

105

Problems with Slow-Start
• Slow-start can result in many losses
  – Roughly the size of cwnd ~ BW·RTT

• Example:
  – At some point cwnd is enough to fill the "pipe"
  – After another RTT, cwnd is double its previous value
  – All the excess packets are dropped

• Need a more gentle adjustment algorithm once we have a rough estimate of bandwidth
  – Rest of design discussion focuses on this
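A small sketch of that overshoot (the capacity value is an assumption; the point is just that the final doubling sends more than the pipe can carry):

# How far the final slow-start doubling overshoots a fixed pipe of C packets.
C = 100                    # assumed pipe capacity, in packets (~ BW*RTT)
cwnd, rtts = 1, 0
while cwnd <= C:           # keep doubling while the pipe can still absorb cwnd
    cwnd *= 2
    rtts += 1
print(f"after {rtts} RTTs cwnd = {cwnd}; {cwnd - C} packets in excess of the pipe")
# with C = 100: cwnd jumps 64 -> 128, so ~28 packets arrive with nowhere to go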

Problem 2 Single Flow Varying BW

Want to track available bandwidth
• Oscillate around its current value
• If you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT)
• Multiplicative increase or decrease:
  cwnd → cwnd · a
• Additive increase or decrease:
  cwnd → cwnd ± b

106

Four alternatives

• AIAD: gentle increase, gentle decrease

• AIMD: gentle increase, drastic decrease

• MIAD: drastic increase, gentle decrease
  – too many losses: eliminate

• MIMD: drastic increase and decrease
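The four alternatives above, written as per-RTT update rules in a small Python sketch (the additive step a and multiplicative factor b are illustrative constants, not values from the slides):

# The four adjustment alternatives as per-RTT updates on cwnd.
a, b = 1.0, 2.0            # illustrative additive step and multiplicative factor

increase = {
    "AI": lambda cwnd: cwnd + a,             # additive increase
    "MI": lambda cwnd: cwnd * b,             # multiplicative increase
}
decrease = {
    "AD": lambda cwnd: max(cwnd - a, 1.0),   # additive decrease (floor at 1 MSS)
    "MD": lambda cwnd: max(cwnd / b, 1.0),   # multiplicative decrease
}

# e.g. AIMD = increase["AI"] after a loss-free RTT, decrease["MD"] on loss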

107

108

Problem 3 Multiple Flows

• Want steady state to be "fair"

• Many notions of fairness, but here just require two identical flows to end up with the same bandwidth

• This eliminates MIMD and AIAD
  – As we shall see…

• AIMD is the only remaining solution
  – Not really, but close enough…

109

Recall Buffer and Window Dynamics

• No congestion: x increases by one packet/RTT every RTT
• Congestion: decrease x by factor 2

[Plot: single flow A→B through a router with capacity C = 50 pkts/RTT; rate x (pkts/RTT) and backlog in router (pkts, congested if > 20) plotted over time.]

110

AIMD Sharing Dynamics

[Plot: two flows, x1 (A→B) and x2 (D→E), sharing one bottleneck; rates over time. No congestion: rate increases by one packet/RTT every RTT. Congestion: decrease rate by factor 2. Rates equalize — fair share.]
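In the spirit of that plot, a tiny simulation of two AIMD flows sharing one bottleneck (the capacity and starting rates are assumed values):

# Two AIMD flows sharing one bottleneck: +1 pkt/RTT each when uncongested,
# halve both on congestion. Capacity and initial rates are assumed values.
C = 50.0                 # bottleneck capacity, pkts/RTT
x1, x2 = 40.0, 5.0       # deliberately unequal starting rates

for rtt in range(500):
    if x1 + x2 > C:      # congestion: both flows see loss
        x1, x2 = x1 / 2, x2 / 2
    else:                # no congestion: additive increase
        x1, x2 = x1 + 1, x2 + 1

print(f"x1 = {x1:.1f}, x2 = {x2:.1f}")   # rates end up (nearly) equal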

111

AIAD Sharing Dynamics

[Plot: two flows, x1 (A→B) and x2 (D→E), sharing one bottleneck; rates over time. No congestion: x increases by one packet/RTT every RTT. Congestion: decrease x by 1. The gap between the two rates persists — no convergence to a fair share.]

112

Simple Model of Congestion Control

• Two TCP connections
  – Rates x1 and x2
• Congestion when sum > 1
• Efficiency: sum near 1
• Fairness: x's converge

[Plot: 2-user example — Bandwidth for User 2 (x2) vs Bandwidth for User 1 (x1); the efficiency line separates underload from overload.]

113

Example

[Plot: User 2 (x2) vs User 1 (x1), both axes 0 to 1, with fairness line and efficiency line.]

• Total bandwidth 1
• (0.2, 0.5): Inefficient, x1 + x2 = 0.7
• (0.7, 0.5): Congested, x1 + x2 = 1.2
• (0.7, 0.3): Efficient, x1 + x2 = 1, but not fair
• (0.5, 0.5): Efficient, x1 + x2 = 1, and fair

114

AIAD

[Plot: Bandwidth for User 2 (x2) vs Bandwidth for User 1 (x1), with fairness line and efficiency line; the trajectory steps from (x1h, x2h) to (x1h − aD, x2h − aD) on decrease, then to (x1h − aD + aI, x2h − aD + aI) on increase — always parallel to the fairness line.]

• Increase: x → x + aI
• Decrease: x → x − aD
• Does not converge to fairness

115

MIMD

[Plot: Bandwidth for User 2 (x2) vs Bandwidth for User 1 (x1), with fairness line and efficiency line; the trajectory moves from (x1h, x2h) to (bD·x1h, bD·x2h) on decrease and to (bI·bD·x1h, bI·bD·x2h) on increase — always along the same line through the origin.]

• Increase: x → bI · x
• Decrease: x → bD · x
• Does not converge to fairness

116

AIMD

[Plot: Bandwidth for User 2 (x2) vs Bandwidth for User 1 (x1), with fairness line and efficiency line; the trajectory moves from (x1h, x2h) to (bD·x1h, bD·x2h) on decrease and to (bD·x1h + aI, bD·x2h + aI) on increase — each cycle lands closer to the fairness line.]

• Increase: x → x + aI
• Decrease: x → bD · x
• Converges to fairness
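A short way to see the convergence claim is to track the gap between the two rates under each rule (LaTeX; take x1 > x2 without loss of generality):

\text{increase: } (x_1 + a_I) - (x_2 + a_I) = x_1 - x_2
\qquad
\text{decrease: } b_D x_1 - b_D x_2 = b_D\,(x_1 - x_2), \quad 0 < b_D < 1

The gap is unchanged by every additive increase and shrunk by bD on every decrease, so it tends to zero while the increases keep pushing the sum back up to the efficiency line. Under AIAD the gap never changes, and under MIMD the ratio x1/x2 never changes — which is why neither converges to fairness.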

117

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• Additive increase gives slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally

[Plot: Bandwidth for Connection 2 vs Bandwidth for Connection 1, both up to the link rate R, with the equal-bandwidth-share line; repeated cycles of congestion avoidance (additive increase) and loss (decrease window by factor of 2) pull the trajectory toward equal share.]

118

Fairness (more)

Fairness and UDP
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections
• nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2
• Recall: multiple browser sessions (and the potential for synchronized loss)
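The arithmetic behind that example, assuming every connection gets an equal share of R:

# Link of rate R already carrying 9 TCP connections; a new app opens n of its own.
# Each connection gets an equal share, so the app gets n/(9+n) of R.
R = 1.0
for n in (1, 11):
    share = n / (9 + n) * R
    print(f"app with {n:>2} connection(s) gets {share:.2f} R")
# -> 0.10 R and 0.55 R (~R/2)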

119

Synchronized Flows vs Many TCP Flows

Synchronized flows:
• Aggregate window has same dynamics
• Therefore buffer occupancy has same dynamics
• Rule-of-thumb still holds

Independent, desynchronized flows:
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases

[Plots: buffer size over time t, and the probability distribution of buffer occupancy.]

Some TCP issues outstanding…

Page 56: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

56

TCP seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles out-

of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoes

back lsquoCrsquo

timesimple telnet scenario

This has led to a world of hurthellip

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 57: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

57

TCP out of order attackbull ARQ with SACK means

recipient needs copies of all packets

bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte

bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers

bull Critical buffershellip

bull Send a legitimate request

GET indexhtml

this gets through an intrusion-detection system

then send a new segment replacing bytes 4-13 withldquopassword-filerdquo

A dumb exampleNeither of these attacks would work on a modern system

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

• Increase: x → bI·x
• Decrease: x → bD·x
• Does not converge to fairness

[Figure: trajectory in the (x1, x2) plane with the fairness and efficiency lines. From (x1h, x2h) a decrease moves to (bD·x1h, bD·x2h) and the following increase to (bI·bD·x1h, bI·bD·x2h); every step stays on the ray through the origin, so the ratio x1/x2 never changes.]

116

AIMD

• Increase: x → x + aI
• Decrease: x → bD·x
• Converges to fairness

[Figure: trajectory in the (x1, x2) plane with the fairness and efficiency lines. From (x1h, x2h) a multiplicative decrease moves to (bD·x1h, bD·x2h) and the following additive increase to (bD·x1h + aI, bD·x2h + aI); the trajectory works its way in toward the fair, efficient point.]
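The geometry can be stated in one line. Write d = x1 − x2 for the gap between the two users; then, per step:

    additive increase:        d → (x1 + aI) − (x2 + aI) = d        (gap unchanged)
    multiplicative decrease:  d → bD·x1 − bD·x2 = bD·d             (gap shrinks, since bD < 1)

Every loss event therefore multiplies the gap by bD while the increases push the total back up toward the efficiency line, so the rates converge. Under AIAD both steps leave d unchanged, and under MIMD both steps leave the ratio x1/x2 unchanged, which is why neither converges to fairness.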

117

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease cuts throughput proportionally

[Figure: bandwidth for Connection 1 against bandwidth for Connection 2, both axes running to the link rate R, with the equal-bandwidth-share line. Alternating congestion avoidance (additive increase) and loss (window cut by a factor of 2) pulls the two connections toward the equal share.]

118

Fairness (more)

Fairness and UDP
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections
• Nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2
• Recall: multiple browser sessions (and the potential for synchronized loss)
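The R/2 figure is straightforward per-connection arithmetic: adding 11 new connections to the 9 existing ones gives 20 in total, each receiving R/20, so the new app's 11 connections collect 11R/20 ≈ R/2, while each original connection drops from R/9 to R/20.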

119

Some TCP issues outstanding…

Synchronized flows
• Aggregate window has the same dynamics
• Therefore buffer occupancy has the same dynamics
• Rule-of-thumb still holds

Many TCP flows
• Independent, desynchronized
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases

[Figures: buffer size over time t, and its probability distribution, for the two cases.]
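The rule-of-thumb referred to here is the usual buffer-sizing guideline: a congested link needs roughly one bandwidth-delay product of buffering, B ≈ RTT × C, when the flows through it are synchronized, and only about B ≈ RTT × C / √N once N independent, desynchronized flows make the aggregate Gaussian.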

Page 58: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

58

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissionsbull too long slow reaction to

segment loss

Q how to estimate RTTbull SampleRTT measured time from

segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 59: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

59

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 60: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

60

Some RTT estimates are never good

Associating the ACK with (a) original transmission versus (b) retransmission

KarnPartridge Algorithm ndash Ignore retransmission in measurements

(and increase timeout this makes retransmissions decreasingly aggressive)

61

Example RTT estimation: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr

[Figure: SampleRTT and EstimatedRTT (milliseconds) plotted against time (seconds); the samples swing roughly between 100 and 350 ms while the estimate tracks them more smoothly.]

62

TCP Round Trip Time and Timeout

Setting the timeout
• EstimatedRTT plus a "safety margin"
  – large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:

    DevRTT = (1-β)·DevRTT + β·|SampleRTT - EstimatedRTT|
    (typically β = 0.25)

Then set the timeout interval:

    TimeoutInterval = EstimatedRTT + 4·DevRTT
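As a concrete illustration, here is a minimal Python sketch of the estimator above. The gains alpha = 0.125 (for EstimatedRTT, from the earlier slide) and beta = 0.25 are the usual textbook defaults; the class name, the first-sample initialisation and the update order are assumptions made for this example, not part of the slides.

class RttEstimator:
    """EWMA RTT estimation used to pick TimeoutInterval."""
    def __init__(self, alpha=0.125, beta=0.25):
        self.alpha = alpha          # gain for EstimatedRTT
        self.beta = beta            # gain for DevRTT
        self.estimated_rtt = None   # seconds
        self.dev_rtt = 0.0

    def on_sample(self, sample_rtt):
        if self.estimated_rtt is None:
            # first measurement initialises the estimate directly (an assumption of this sketch)
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
            self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * abs(sample_rtt - self.estimated_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        # TimeoutInterval = EstimatedRTT + 4 * DevRTT
        return self.estimated_rtt + 4 * self.dev_rtt

# Example: feed in a few SampleRTT values (seconds) and watch the timeout adapt
est = RttEstimator()
for s in (0.10, 0.12, 0.30, 0.11):
    print(round(est.on_sample(s), 3))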

63

TCP reliable data transfer
• TCP creates an rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative ACKs
• TCP uses a single retransmission timer

• Retransmissions are triggered by:
  – timeout events
  – duplicate ACKs

• Initially, consider a simplified TCP sender:
  – ignore duplicate ACKs
  – ignore flow control, congestion control

64

TCP sender events

Data received from app:
• create segment with seq #
• seq # is the byte-stream number of the first data byte in the segment
• start timer if not already running (think of the timer as being for the oldest unACKed segment)
• expiration interval: TimeOutInterval

Timeout:
• retransmit the segment that caused the timeout
• restart timer

ACK received:
• if it acknowledges previously unACKed segments:
  – update what is known to be ACKed
  – start timer if there are still outstanding segments

65

TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event) {

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
  }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ACKed byte
Example:
• SendBase-1 = 71; y = 73, so the receiver wants 73+ ; y > SendBase, so new data is ACKed

66

TCP retransmission scenarios

[Timeline diagrams:
 (a) Lost ACK scenario – Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X); A's Seq=92 timer expires, A retransmits Seq=92, 8 bytes data; B replies ACK=100 again.
 (b) Premature timeout – Host A sends Seq=92, 8 bytes data and then Seq=100, 20 bytes data; B returns ACK=100 and ACK=120, but A's Seq=92 timer expires first, so A retransmits Seq=92, 8 bytes data and B replies ACK=120; SendBase advances from 100 to 120.]

67

TCP retransmission scenarios (more)

[Timeline diagram – cumulative ACK scenario: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B's ACK=100 is lost (X), but ACK=120 arrives before the timeout, so SendBase moves to 120 and nothing is retransmitted.]

Implicit ACK (i.e. not Go-Back-N): ACK=120 implicitly ACKs 100 too.

68

TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver -> TCP receiver action

• Arrival of in-order segment with expected seq #; all data up to the expected seq # already ACKed
  -> Delayed ACK: wait up to 500 ms for the next segment; if no next segment, send ACK

• Arrival of in-order segment with expected seq #; one other segment has an ACK pending
  -> Immediately send a single cumulative ACK, ACKing both in-order segments

• Arrival of out-of-order segment with higher-than-expected seq #; gap detected
  -> Immediately send a duplicate ACK, indicating the seq # of the next expected byte

• Arrival of a segment that partially or completely fills the gap
  -> Immediately send an ACK, provided the segment starts at the lower end of the gap
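The table maps directly onto a small decision procedure. Below is a hedged Python sketch of a receiver's ACK-generation choice under these rules; the parameter names (rcv_base, ack_pending, gap_buffered) are invented for the example, and a real receiver also handles old duplicates, window updates and SACK.

def ack_action(seg_seq, rcv_base, ack_pending, gap_buffered):
    """Which ACK to generate for an arriving segment (rules summarised above).
    seg_seq      -- first byte number of the arriving segment
    rcv_base     -- next expected byte (everything below is already ACKed)
    ack_pending  -- an earlier in-order segment is still waiting on the delayed-ACK timer
    gap_buffered -- out-of-order data is already buffered above rcv_base
    """
    if seg_seq == rcv_base and not gap_buffered:
        if not ack_pending:
            return "start delayed-ACK timer (send ACK after <=500 ms if nothing else arrives)"
        return "send one cumulative ACK immediately (covers both in-order segments)"
    if seg_seq > rcv_base:
        return "send duplicate ACK immediately, re-ACKing the next expected byte"
    # seg_seq == rcv_base with buffered data above: segment starts at the lower end of the gap
    return "send ACK immediately (cumulative ACK now covers the filled-in data)"

# Example: an in-order segment arrives with nothing pending
print(ack_action(seg_seq=100, rcv_base=100, ack_pending=False, gap_buffered=False))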

69

Fast Retransmit
• Time-out period is often relatively long:
  – long delay before resending a lost packet
• Detect lost segments via duplicate ACKs:
  – sender often sends many segments back-to-back
  – if a segment is lost, there will likely be many duplicate ACKs
• If the sender receives 3 duplicate ACKs, it supposes that the segment after the ACKed data was lost:
  – fast retransmit: resend the segment before the timer expires

70

[Timeline diagram (Figure 3.37): Host A sends several segments back-to-back; the 2nd segment is lost (X); Host B's duplicate ACKs lead Host A to resend the 2nd segment before its timeout expires. "Resending a segment after triple duplicate ACK."]

71

Fast retransmit algorithm

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {                                   /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3)
      resend segment with sequence number y   /* fast retransmit */
  }

72

Silly Window Syndrome

The receiver advertises how much it can accept; if a transmitter has something to send – it will send it.
This means small segments (well below the MSS) may persist – indefinitely.

Solution:
Wait to fill each segment – but don't wait indefinitely.

Nagle's Algorithm
If we wait too long, interactive traffic is difficult.
If we don't wait, we get silly window syndrome.
Solution: use a timer; when the timer expires – send the (unfilled) segment.
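As a rough illustration of the sender-side trade-off, here is a hedged Python sketch of the usual Nagle decision rule (send only a full segment, or a small one when nothing is in flight); the function and parameter names are made up for this example and the timer fallback described above is not shown.

def nagle_should_send(bytes_buffered, mss, unacked_data_outstanding):
    """Send now, or keep coalescing small writes?"""
    if bytes_buffered >= mss:
        return True            # a full-sized segment is always worth sending
    if not unacked_data_outstanding:
        return True            # idle connection: send the small segment immediately
    return False               # small segment + data in flight: wait for an ACK, keep coalescing

# Example: 50 bytes buffered, MSS 1460, data already in flight -> hold back
print(nagle_should_send(50, 1460, True))   # False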

73

Flow Control ≠ Congestion Control

• Flow control involves preventing senders from overrunning the capacity of the receivers.

• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded.

74

Flow Control – (bad old days)

In-line flow control
• XON/XOFF (^S/^Q)
• data-link dedicated symbols, as in Ethernet (more in the Advanced Topic on Datacenters)

Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from a memory interface saying slow-down/stop…

75

TCP Flow Control

• receive side of a TCP connection has a receive buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
• the app process may be slow at reading from the buffer

flow control: the sender won't overflow the receiver's buffer by transmitting too much, too fast

76

TCP Flow control: how it works

(Suppose the TCP receiver discards out-of-order segments.)

• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]

• Receiver advertises the spare room by including the value of RcvWindow in segments

• Sender limits unACKed data to RcvWindow
  – guarantees the receive buffer doesn't overflow
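A minimal sketch of the receiver-side arithmetic above; the variable names mirror the slide, and the example numbers are invented.

def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    # spare room = RcvBuffer - [LastByteRcvd - LastByteRead]
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

# Example: 64 KiB buffer, 20000 bytes received so far, the app has read 12000 of them
print(rcv_window(65536, 20000, 12000))   # 57536 bytes may still be sent unACKed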

77

TCP Connection Management

Recall: TCP sender and receiver establish a "connection" before exchanging data segments.

• initialize TCP variables:
  – seq #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    Socket clientSocket = new Socket("hostname", port number);
• server: contacted by client
    Socket connectionSocket = welcomeSocket.accept();

Three way handshake:

Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
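For comparison with the Java calls above, a minimal Python sketch of the same two roles; the host name and port number are invented for the example, and the kernel performs the three-way handshake inside connect()/accept().

import socket

# --- server: contacted by the client ---
welcome = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
welcome.bind(("", 12345))            # illustrative port
welcome.listen()
# conn, addr = welcome.accept()      # would block; returns a new connection socket

# --- client: connection initiator ---
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# client.connect(("server.example.org", 12345))   # SYN, SYNACK, ACK are exchanged in here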

78

TCP Connection Management (cont)

Closing a connection:

client closes socket: clientSocket.close();

Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK; closes connection, sends FIN

[Timeline diagram: client sends FIN (close); server ACKs and then sends its own FIN (close); the client ACKs, waits in timed wait, then moves to closed.]

79

TCP Connection Management (cont)

Step 3: client receives FIN, replies with ACK
  – enters "timed wait" – will respond with ACK to received FINs

Step 4: server receives ACK; connection closed.

Note: with a small modification, this can handle simultaneous FINs.

[Timeline diagram: client FIN -> server ACK; server FIN -> client ACK; both sides pass through closing, the client through timed wait, then both are closed.]

80

TCP Connection Management (cont)

[State diagrams: TCP client lifecycle and TCP server lifecycle.]

81

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control!
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem!

82

Causes/costs of congestion: scenario 1

• two senders, two receivers
• one router, infinite buffers
• no retransmission

• large delays when congested
• maximum achievable throughput

[Diagram: Hosts A and B send λin (original data) into a router with unlimited shared output-link buffers; λout is the delivered throughput.]

83

Causes/costs of congestion: scenario 2

• one router, finite buffers
• sender retransmission of lost packet

[Diagram: Host A offers λin (original data) and λ'in (original data plus retransmitted data) to a router with finite shared output-link buffers; Host B receives λout.]

84

Causes/costs of congestion: scenario 2

• always: λin = λout (goodput)
• "perfect" retransmission, only when loss: λ'in > λout
• retransmission of a delayed (not lost) packet makes λ'in larger (than the perfect case) for the same λout

"costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: link carries multiple copies of a packet

[Graphs (a), (b), (c): λout versus λin with both axes marked at R/2; in (c) the delivered goodput levels off below R/2, with marks at R/3 and R/4.]

85

Causes/costs of congestion: scenario 3

• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λin and λ'in increase?

[Diagram: Host A's λin (original data) and λ'in (original data plus retransmitted data) share finite output-link buffers along a multihop path with traffic from the other hosts; Host B receives λout.]

86

Causes/costs of congestion: scenario 3

Another "cost" of congestion: when a packet is dropped, any upstream transmission capacity used for that packet was wasted!

[Graph: λout collapses as the offered load from Hosts A and B grows.]

Congestion Collapse example: Cocktail party effect

87

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
• no explicit feedback from the network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate the sender should send at

88

TCP congestion control: additive increase, multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs:
• additive increase: increase CongWin by 1 MSS every RTT – for each received ACK, W <- W + 1/W – until loss is detected
• multiplicative decrease: cut CongWin in half after loss (W <- W/2)

[Figure: congestion window (8, 16, 24 Kbytes) versus time – the saw-tooth behaviour of probing for bandwidth.]
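As a toy illustration of the two rules (window measured in MSS; the starting value and the loop are invented for this sketch):

cwnd = 4.0   # congestion window, in units of MSS

def on_ack():
    global cwnd
    cwnd += 1.0 / cwnd          # additive increase: +1/W per ACK, about +1 MSS per RTT

def on_loss():
    global cwnd
    cwnd = max(1.0, cwnd / 2)   # multiplicative decrease: halve the window

for _ in range(8):              # roughly two RTTs' worth of ACKs at cwnd ~ 4
    on_ack()
print(round(cwnd, 2))           # a little under 6 MSS
on_loss()
print(round(cwnd, 2))           # halved after the loss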

89

(SLOW START IS NOT SHOWN)

90

TCP Congestion Control: details

• sender limits transmission: LastByteSent - LastByteAcked <= CongWin
• Roughly:

    rate = CongWin / RTT   bytes/sec

• CongWin is a dynamic function of perceived network congestion

How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event

three mechanisms:
  – AIMD
  – slow start
  – conservative after timeout events

91

AIMD Starts Too Slowly

[Figure: window versus time t under pure AIMD – a long, slow linear ramp.]

It could take a long time to get started!

Need to start with a small CWND to avoid overloading the network.

92

TCP Slow Start

• When a connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to a respectable rate

When the connection begins, increase the rate exponentially fast until the first loss event.
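Quick sanity check of the example numbers (pure arithmetic):

mss_bits = 500 * 8        # MSS = 500 bytes
rtt = 0.2                 # RTT = 200 ms
print(mss_bits / rtt)     # 20000.0 bits/s, i.e. the 20 kbps initial rate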

93

TCP Slow Start (more)

• When the connection begins, increase the rate exponentially until the first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: the initial rate is slow, but it ramps up exponentially fast

[Timeline diagram: Host A sends one segment, then two, then four in successive RTTs as Host B's ACKs arrive.]

94

Slow Start and the TCP Sawtooth

[Figure: window versus time t – an exponential "slow start" ramp up to the first loss, followed by the AIMD saw-tooth.]

Why is it called slow-start? Because TCP originally had no congestion control mechanism. The source would just start by sending a whole window's worth of data!

95

Refinement: inferring loss

• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after a timeout event:
  – CongWin is instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Philosophy: 3 dup ACKs indicate the network is capable of delivering some segments; a timeout indicates a "more alarming" congestion scenario.

96

Refinement

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable: Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event

97

Summary TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.

• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.

• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.

• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.

98

TCP sender congestion control

State: Slow Start (SS)
Event: ACK receipt for previously unACKed data
Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
Commentary: resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unACKed data
Action: CongWin = CongWin + MSS·(MSS/CongWin)
Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: loss event detected by triple duplicate ACK
Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

State: SS or CA
Event: timeout
Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: enter slow start

State: SS or CA
Event: duplicate ACK
Action: increment duplicate ACK count for the segment being ACKed
Commentary: CongWin and Threshold not changed
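The table translates almost line-for-line into code. A hedged Python sketch follows (window and threshold in units of MSS; the initial ssthresh of 64 MSS is an arbitrary choice for the example, and fast recovery's window-inflation steps are omitted for brevity):

class TcpCongestionControl:
    """Toy model of the table above: slow start / congestion avoidance,
    triple-duplicate-ACK and timeout reactions."""
    def __init__(self):
        self.cwnd = 1.0
        self.ssthresh = 64.0
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self):
        self.dup_acks = 0
        if self.state == "SS":
            self.cwnd += 1.0                      # +1 MSS per ACK: doubles cwnd every RTT
            if self.cwnd >= self.ssthresh:
                self.state = "CA"
        else:
            self.cwnd += 1.0 / self.cwnd          # congestion avoidance: +1 MSS per RTT

    def on_dup_ack(self):
        self.dup_acks += 1
        if self.dup_acks == 3:                    # triple duplicate ACK
            self.ssthresh = max(self.cwnd / 2, 1.0)
            self.cwnd = self.ssthresh             # multiplicative decrease
            self.state = "CA"

    def on_timeout(self):
        self.ssthresh = max(self.cwnd / 2, 1.0)
        self.cwnd = 1.0                           # back to one segment
        self.state = "SS"
        self.dup_acks = 0

cc = TcpCongestionControl()
for _ in range(10):
    cc.on_new_ack()
print(cc.state, round(cc.cwnd, 2))                # still slow start, window has grown to 11 MSS
cc.on_timeout()
print(cc.state, cc.cwnd, cc.ssthresh)             # back to SS with cwnd = 1, ssthresh halved to 5.5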

99

Repeating Slow Start After Timeout

[Figure: window versus time t – after a timeout or fast retransmission, SSThresh is set to half the previous CWND; slow start runs from CWND = 1 until the window reaches SSThresh, then growth is linear.]

Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND. Slow start operates until the window reaches half of the previous CWND, i.e. SSTHRESH.

100

TCP throughput

• What's the average throughput of TCP as a function of window size and RTT?
  – Ignore slow start
• Let W be the window size when loss occurs.
• When the window is W, throughput is W/RTT.
• Just after loss, the window drops to W/2, throughput to (W/2)/RTT.
• Average throughput: 0.75·W/RTT
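A quick numerical check of the 0.75·W/RTT claim, averaging the saw-tooth ramp from W/2 back up to W (the window and RTT values here are invented):

W, rtt = 20, 0.1                          # 20-segment window at loss, 100 ms RTT
ramp = range(W // 2, W + 1)               # window grows linearly from W/2 up to W
avg = sum(w / rtt for w in ramp) / len(ramp)
print(avg, 0.75 * W / rtt)                # both come out at 150 segments per second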

101

TCP Futures: TCP over "long, fat pipes"

• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate p: Throughput = (MSS/RTT)·sqrt(3/(2p)) (derived below)
• -> L = 2·10^-10   Ouch!
• New versions of TCP for high-speed needed!

102

Calculation on Simple Model (cwnd in units of MSS)

• Assume loss occurs whenever cwnd reaches W
  – recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … W, W/2, …
  – W/2 RTTs, then drop, then repeat
• Average throughput: 0.75·W·(MSS/RTT)
  – one packet dropped out of (W/2)·(3W/4)
  – packet drop rate p = (8/3)·W^-2
• Throughput = (MSS/RTT)·sqrt(3/(2p))

(HINT: KNOW THIS SLIDE)
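The numbers on the slide follow directly from the model (pure arithmetic; 1500-byte segments, 100 ms RTT and the 10 Gbps target are the slide's assumptions):

mss_bits = 1500 * 8                 # 1500-byte segments
rtt = 0.1                           # 100 ms
target = 10e9                       # 10 Gbps goal

W = target * rtt / mss_bits         # in-flight segments needed
p = 3 / (2 * W ** 2)                # invert Throughput = (MSS/RTT) * sqrt(3/(2p))
print(round(W), p)                  # 83333 and ~2.2e-10, the slide's "L = 2*10^-10"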

Three Congestion Control Challenges – or Why AIMD?

• Single flow adjusting to bottleneck bandwidth
  – without any a priori knowledge
  – could be a Gbps link, could be a modem
• Single flow adjusting to variations in bandwidth
  – when bandwidth decreases, must lower sending rate
  – when bandwidth increases, must increase sending rate
• Multiple flows sharing the bandwidth
  – must avoid overloading the network
  – and share bandwidth "fairly" among the flows

103

104

Problem 1: Single Flow, Fixed BW

• Want to get a first-order estimate of the available bandwidth
  – assume bandwidth is fixed
  – ignore presence of other flows
• Want to start slow, but rapidly increase the rate until packet drop occurs ("slow-start")
• Adjustment:
  – cwnd initially set to 1 (MSS)
  – cwnd++ upon receipt of ACK

105

Problems with Slow-Start

• Slow-start can result in many losses
  – roughly the size of cwnd ~ BW·RTT
• Example:
  – at some point, cwnd is enough to fill the "pipe"
  – after another RTT, cwnd is double its previous value
  – all the excess packets are dropped!
• Need a more gentle adjustment algorithm once we have a rough estimate of the bandwidth
  – rest of the design discussion focuses on this

Problem 2: Single Flow, Varying BW

Want to track the available bandwidth:
• oscillate around its current value
• if you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• multiplicative increase or decrease: cwnd -> cwnd · or ÷ a
• additive increase or decrease: cwnd -> cwnd +/- b

106

Four alternatives

• AIAD: gentle increase, gentle decrease

• AIMD: gentle increase, drastic decrease

• MIAD: drastic increase, gentle decrease
  – too many losses: eliminate

• MIMD: drastic increase and decrease

107

108

Problem 3: Multiple Flows

• Want the steady state to be "fair"
• Many notions of fairness, but here we just require two identical flows to end up with the same bandwidth
• This eliminates MIMD and AIAD
  – as we shall see…
• AIMD is the only remaining solution!
  – not really, but close enough…

109

Recall: Buffer and Window Dynamics

• No congestion -> x increases by one packet/RTT every RTT
• Congestion -> decrease x by factor 2

[Figure: a single flow A -> B through a router of capacity C = 50 pkts/RTT; the graph plots the rate x (pkts/RTT) and the backlog in the router (pkts, congested if > 20) against time – the familiar saw-tooth.]

110

AIMD Sharing Dynamics

No congestion -> rate increases by one packet/RTT every RTT; congestion -> decrease rate by factor 2.

[Figure: two flows, A -> B (x1) and D -> E (x2), share a bottleneck; the two rates oscillate but their difference shrinks each cycle – rates equalize to the fair share.]

111

AIAD Sharing Dynamics

No congestion -> x increases by one packet/RTT every RTT; congestion -> decrease x by 1.

[Figure: the same two flows, A -> B (x1) and D -> E (x2); the gap between the two rates never closes – AIAD does not equalize the shares.]

112

Simple Model of Congestion Control

• Two TCP connections
  – rates x1 and x2
• Congestion when the sum > 1
• Efficiency: sum near 1
• Fairness: the x's converge

[2-user example: bandwidth for user 1 (x1) plotted against bandwidth for user 2 (x2); the efficiency line separates the underloaded from the overloaded region.]

113

Example

[Phase plot: user 1's rate x1 against user 2's rate x2, with the fairness line x1 = x2 and the efficiency line x1 + x2 = 1.]

• Total bandwidth 1:
  – (0.2, 0.5): inefficient, x1 + x2 = 0.7
  – (0.7, 0.5): congested, x1 + x2 = 1.2
  – (0.7, 0.3): efficient, x1 + x2 = 1, but not fair
  – (0.5, 0.5): efficient, x1 + x2 = 1, and fair

114

AIAD

• Increase: x + aI
• Decrease: x - aD
• Does not converge to fairness

[Phase plot of bandwidth for user 1 (x1) against bandwidth for user 2 (x2): from (x1h, x2h), a decrease moves to (x1h - aD, x2h - aD) and an increase to (x1h - aD + aI, x2h - aD + aI); the trajectory stays parallel to the fairness line, so the initial unfairness persists.]

115

MIMD

• Increase: x·bI
• Decrease: x·bD
• Does not converge to fairness

[Phase plot of bandwidth for user 1 (x1) against bandwidth for user 2 (x2): from (x1h, x2h), a decrease moves to (bD·x1h, bD·x2h) and a subsequent increase to (bI·bD·x1h, bI·bD·x2h); the operating point moves along a ray through the origin, so the ratio x1/x2 – and hence the unfairness – never changes.]

AIMD

• Increase: x + aI
• Decrease: x·bD
• Converges to fairness

[Phase plot of bandwidth for user 1 (x1) against bandwidth for user 2 (x2): from (x1h, x2h), a multiplicative decrease moves to (bD·x1h, bD·x2h) and an additive increase to (bD·x1h + aI, bD·x2h + aI); decreases shrink both rates proportionally while increases add equally, so each cycle moves the operating point closer to the fairness line.]
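A small simulation makes the convergence claim concrete. The starting rates, step sizes and normalised capacity below are illustrative only; both flows see the same congestion signal each round.

def aimd(x1=0.1, x2=0.6, capacity=1.0, rounds=200, a=0.01):
    """Additive increase for both flows; both halve their rate on congestion."""
    for _ in range(rounds):
        x1, x2 = x1 + a, x2 + a
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2
    return round(x1, 3), round(x2, 3)

def aiad(x1=0.1, x2=0.6, capacity=1.0, rounds=200, a=0.01):
    """Additive increase and additive decrease."""
    for _ in range(rounds):
        x1, x2 = x1 + a, x2 + a
        if x1 + x2 > capacity:
            x1, x2 = x1 - a, x2 - a
    return round(x1, 3), round(x2, 3)

print(aimd())   # the two rates end up nearly equal: AIMD converges towards the fair share
print(aiad())   # the initial 0.5 gap between the rates never closes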

117

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease reduces throughput proportionally

[Phase-plot animation: bandwidth for connection 1 against bandwidth for connection 2, with the full-capacity line R and the equal-bandwidth-share line; repeated cycles of congestion avoidance (additive increase) and loss (decrease window by a factor of 2) drive the operating point in towards the equal share.]

118

Fairness (more)

Fairness and UDP
• Multimedia apps may not use TCP
  – do not want their rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at a constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections
• nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2!
• Recall: multiple browser sessions (and the potential for synchronized loss)

119

Synchronized Flows vs. Many TCP Flows

Synchronized flows:
• aggregate window has the same dynamics
• therefore buffer occupancy has the same dynamics
• rule-of-thumb still holds

Independent, desynchronized flows:
• the central limit theorem says the aggregate becomes Gaussian
• variance (buffer size) decreases as N increases

Some TCP issues outstanding…

[Figures: buffer size versus time t for the synchronized case, and the probability distribution of the aggregate for the desynchronized case.]

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 61: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

61

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 62: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

62

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

63

TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causes/costs of congestion: scenario 2

• one router, finite buffers
• sender retransmission of lost packet

[Figure: Host A offers λin (original data) and λ'in (original plus retransmitted data) into a router with finite shared output link buffers; Host B receives λout]

84

Causes/costs of congestion: scenario 2

• always: λin = λout (goodput)
• "perfect" retransmission, only when loss: λ'in > λout
• retransmission of delayed (not lost) packet makes λ'in larger (than the perfect case) for the same λout

"costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: link carries multiple copies of a packet

[Figure: three graphs (a), (b), (c) of λout versus λ'in, each capped at the link capacity R/2; with retransmissions the achieved goodput falls below R/2 (the R/3 and R/4 marks on the graphs)]

85

Causes/costs of congestion: scenario 3

• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λin and λ'in increase?

[Figure: Host A offers λin (original data) and λ'in (original plus retransmitted data) through routers with finite shared output link buffers towards Host B, which receives λout]

86

Causes/costs of congestion: scenario 3

Another "cost" of congestion: when a packet is dropped, any upstream transmission capacity used for that packet was wasted!

[Figure: Host A to Host B, with λout collapsing as the offered load grows]

Congestion Collapse example: Cocktail party effect

87

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at

88

TCP congestion control: additive increase, multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs:
• additive increase: increase CongWin by 1 MSS every RTT – by (W ← W + 1/W) for each received ACK – until loss is detected
• multiplicative decrease: cut CongWin in half after loss (W ← W/2)

[Figure: congestion window (8, 16, 24 Kbytes) versus time – the saw-tooth behaviour of probing for bandwidth]

89

SLOW START IS NOT SHOWN
[Figure: the saw-tooth again, without the initial slow-start phase]
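A minimal sketch of the two AIMD update rules in Python, with the window w kept in units of MSS; an illustration of the formulas above, not a full TCP implementation:

```python
# Per-ACK additive increase and per-loss multiplicative decrease (w in MSS).
def on_ack(w):
    return w + 1.0 / w        # additive increase: about 1 MSS per RTT

def on_loss(w):
    return max(w / 2.0, 1.0)  # multiplicative decrease: halve the window
```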

90

TCP Congestion Control: details

• sender limits transmission: LastByteSent − LastByteAcked ≤ CongWin
• Roughly:
  rate = CongWin / RTT  Bytes/sec
• CongWin is a dynamic function of perceived network congestion

How does sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after loss event

three mechanisms:
  – AIMD
  – slow start
  – conservative after timeout events

91

AIMD Starts Too Slowly!

[Figure: window versus time t, growing only linearly from a small CWND]

It could take a long time to get started!
Need to start with a small CWND to avoid overloading the network.

92

TCP Slow Start

• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps (checked in the sketch below)
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to a respectable rate

When connection begins, increase rate exponentially fast until first loss event.
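The example rate follows directly from sending one MSS per RTT:

```python
# Checking the slide's initial-rate example: one 500-byte segment per 200 ms RTT.
mss_bits = 500 * 8          # 4000 bits per segment
rtt_s    = 0.2              # 200 msec
print(mss_bits / rtt_s)     # 20000.0 bits/sec = 20 kbps
```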

93

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow, but ramps up exponentially fast

[Figure: Host A sends one segment, then two, then four, one RTT apart, as ACKs come back from Host B]

94

Slow Start and the TCP Sawtooth

[Figure: window versus time t – exponential "slow start" up to the first loss, then the sawtooth]

Why is it called slow-start? Because TCP originally had no congestion control mechanism. The source would just start by sending a whole window's worth of data!

95

Refinement: inferring loss

• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after a timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Philosophy: 3 dup ACKs indicate the network is capable of delivering some segments; a timeout indicates a "more alarming" congestion scenario.

96

Refinement

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event

97

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
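Those four rules fit in a few lines; a sketch with the window in MSS units and hypothetical event hooks (fast recovery and the duplicate-ACK counter are omitted, and the initial threshold is arbitrary):

```python
# Sketch of the summary rules above; cwnd and ssthresh in units of MSS.
class TcpCongestionControl:
    def __init__(self):
        self.cwnd, self.ssthresh = 1.0, 64.0   # 64 is an arbitrary initial threshold

    def on_new_ack(self):
        if self.cwnd < self.ssthresh:          # slow start: exponential growth
            self.cwnd += 1.0
        else:                                   # congestion avoidance: linear growth
            self.cwnd += 1.0 / self.cwnd

    def on_triple_dup_ack(self):
        self.ssthresh = self.cwnd / 2.0
        self.cwnd = self.ssthresh               # multiplicative decrease

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2.0
        self.cwnd = 1.0                         # back to slow start
```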

98

TCP sender congestion control

State: Slow Start (SS)
  Event: ACK receipt for previously unACKed data
  Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
  Commentary: resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
  Event: ACK receipt for previously unACKed data
  Action: CongWin = CongWin + MSS · (MSS/CongWin)
  Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
  Event: loss event detected by triple duplicate ACK
  Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
  Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

State: SS or CA
  Event: timeout
  Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
  Commentary: enter slow start

State: SS or CA
  Event: duplicate ACK
  Action: increment duplicate ACK count for segment being ACKed
  Commentary: CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

[Figure: window versus time t – after a timeout (contrast fast retransmission), CWND drops to 1 MSS and SSThresh is set to half the previous CWND]

Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND.
Slow start operates until it reaches half of the previous CWND, i.e. SSTHRESH.

100

TCP throughput

• What's the average throughput of TCP as a function of window size and RTT?
  – Ignore slow start
• Let W be the window size when loss occurs.
• When the window is W, throughput is W/RTT.
• Just after a loss, the window drops to W/2, throughput to (W/2)/RTT.
• Average throughput: 0.75 W/RTT
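The 0.75 factor is just the average of the sawtooth: between losses the window ramps roughly linearly from W/2 back up to W, so the mean window over a cycle is

    (W/2 + W) / 2 = 3W/4

and hence the average throughput is about 0.75·W/RTT (a back-of-envelope argument, ignoring slow start as above).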

101

TCP Futures: TCP over "long, fat pipes"

• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate p: see the calculation below
• → needs loss rate L = 2·10^-10  Ouch!
• New versions of TCP for high-speed needed!

Calculation on Simple Model (cwnd in units of MSS)
• Assume loss occurs whenever cwnd reaches W
  – Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … W, W/2, …
  – W/2 RTTs, then drop, then repeat
• Average throughput: 0.75·W·(MSS/RTT)
  – One packet dropped out of (W/2)·(3W/4)
  – Packet drop rate p = (8/3)·W^-2
• Throughput = (MSS/RTT)·sqrt(3/(2p))
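A quick numeric check of these figures (a sketch; throughput is measured in bits/sec, the window in segments, and p is obtained by rearranging the throughput formula just above):

```python
import math

# "Long fat pipe" example: 1500-byte segments, 100 ms RTT, 10 Gbps target.
mss_bytes, rtt, target_bps = 1500, 0.100, 10e9

window_segments = target_bps * rtt / (mss_bytes * 8)          # ≈ 83,333 in-flight segments
p = (3 / 2) * (mss_bytes * 8 / (target_bps * rtt)) ** 2        # from throughput = (MSS/RTT)·sqrt(3/(2p))
print(round(window_segments), p)                               # 83333, ≈ 2.2e-10
```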

102

HINT: KNOW THIS SLIDE

Three Congestion Control Challenges – or Why AIMD?

• Single flow adjusting to bottleneck bandwidth
  – Without any a priori knowledge
  – Could be a Gbps link, could be a modem
• Single flow adjusting to variations in bandwidth
  – When bandwidth decreases, must lower sending rate
  – When bandwidth increases, must increase sending rate
• Multiple flows sharing the bandwidth
  – Must avoid overloading the network
  – And share bandwidth "fairly" among the flows

103

104

Problem 1: Single Flow, Fixed BW

• Want to get a first-order estimate of the available bandwidth
  – Assume bandwidth is fixed
  – Ignore presence of other flows
• Want to start slow, but rapidly increase rate until packet drop occurs ("slow-start")
• Adjustment:
  – cwnd initially set to 1 (MSS)
  – cwnd++ upon receipt of ACK

105

Problems with Slow-Start

• Slow-start can result in many losses
  – Roughly the size of cwnd ~ BW·RTT
• Example:
  – At some point, cwnd is enough to fill the "pipe"
  – After another RTT, cwnd is double its previous value
  – All the excess packets are dropped!
• Need a more gentle adjustment algorithm once we have a rough estimate of the bandwidth
  – Rest of the design discussion focuses on this

Problem 2: Single Flow, Varying BW

Want to track available bandwidth:
• Oscillate around its current value
• If you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease: cwnd → cwnd ×/÷ a
• Additive increase or decrease: cwnd → cwnd ± b

106

Four alternatives

• AIAD: gentle increase, gentle decrease
• AIMD: gentle increase, drastic decrease
• MIAD: drastic increase, gentle decrease
  – too many losses: eliminate
• MIMD: drastic increase and decrease

107

108

Problem 3: Multiple Flows

• Want the steady state to be "fair"
• Many notions of fairness, but here we just require two identical flows to end up with the same bandwidth
• This eliminates MIMD and AIAD
  – As we shall see…
• AIMD is the only remaining solution
  – Not really, but close enough…

109

Recall: Buffer and Window Dynamics

• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by factor 2

[Figure: a single flow A→B over a link of capacity C = 50 pkts/RTT; plotted over time are the rate x (pkts/RTT) and the backlog in the router (pkts), which counts as congested if > 20 – the rate sawtooths around the capacity]
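A tiny simulation of these dynamics, as a sketch: rate x in pkts/RTT against a capacity C, with "congestion" whenever the router backlog exceeds 20 packets (clearing the queue on a drop is a crude assumption made for brevity):

```python
# Single AIMD flow probing a C = 50 pkts/RTT link.
C, x, backlog = 50, 1, 0
for rtt in range(200):
    backlog = max(0, backlog + x - C)   # whatever exceeds capacity queues up
    if backlog > 20:                    # congestion signalled
        x = max(1, x // 2)              # multiplicative decrease
        backlog = 0                     # drops clear the queue (crude assumption)
    else:
        x += 1                          # additive increase: +1 pkt/RTT per RTT
# x now sawtooths between roughly C/2 and a little above C
```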

110

AIMD Sharing Dynamics

No congestion → rate increases by one packet/RTT every RTT; congestion → decrease rate by factor 2.

[Figure: two flows, x1 (A→B) and x2 (D→E), sharing a bottleneck; their rates over time converge – rates equalize to a fair share]

111

AIAD Sharing Dynamics

No congestion → x increases by one packet/RTT every RTT; congestion → decrease x by 1.

[Figure: the same two flows, x1 (A→B) and x2 (D→E); with AIAD their rates oscillate but the gap between them never closes – no convergence to a fair share]
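A sketch comparing the two sharing dynamics, assuming both flows see congestion at the same instant (synchronized losses) on a C = 50 pkts/RTT bottleneck:

```python
# Two flows through one bottleneck; congestion whenever x1 + x2 > C.
def share(increase, decrease, x1=40.0, x2=10.0, C=50.0, rounds=500):
    for _ in range(rounds):
        if x1 + x2 > C:                      # congestion: both flows back off
            x1, x2 = decrease(x1), decrease(x2)
        else:                                # otherwise both probe upwards
            x1, x2 = increase(x1), increase(x2)
    return x1, x2

print(share(lambda x: x + 1, lambda x: x / 2))   # AIMD: rates end up nearly equal
print(share(lambda x: x + 1, lambda x: x - 1))   # AIAD: the initial 30-pkt gap persists
```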

112

Simple Model of Congestion Control

• Two TCP connections
  – Rates x1 and x2
• Congestion when the sum > 1
• Efficiency: sum near 1
• Fairness: x's converge

[Figure: 2-user example – bandwidth for user 1 (x1) against bandwidth for user 2 (x2); the efficiency line x1 + x2 = 1 separates underload from overload]

113

Example

• Total bandwidth 1

[Figure: the (x1, x2) plane with the fairness line x1 = x2 and the efficiency line x1 + x2 = 1, both axes running to 1]

• (0.2, 0.5): inefficient, x1 + x2 = 0.7
• (0.7, 0.5): congested, x1 + x2 = 1.2
• (0.7, 0.3): efficient, x1 + x2 = 1, but not fair
• (0.5, 0.5): efficient, x1 + x2 = 1, and fair

114

AIAD

• Increase: x → x + aI
• Decrease: x → x − aD
• Does not converge to fairness

[Figure: in the (x1, x2) plane, starting at (x1h, x2h), AIAD steps to (x1h − aD, x2h − aD) and back to (x1h − aD + aI, x2h − aD + aI): the trajectory moves parallel to the fairness line, so it never approaches it]

115

MIMD

• Increase: x → bI·x
• Decrease: x → bD·x
• Does not converge to fairness

[Figure: in the (x1, x2) plane, starting at (x1h, x2h), MIMD moves to (bD·x1h, bD·x2h) and back to (bI·bD·x1h, bI·bD·x2h): the trajectory stays on the ray through the origin, so the ratio x1/x2 never changes]

116

AIMD

• Increase: x → x + aI
• Decrease: x → bD·x
• Converges to fairness

[Figure: in the (x1, x2) plane, starting at (x1h, x2h), AIMD moves to (bD·x1h, bD·x2h) and back up to (bD·x1h + aI, bD·x2h + aI): each multiplicative decrease shrinks the difference and each additive increase adds equally, so the trajectory works its way towards the fairness line]

117

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally

[Figure: bandwidth for connection 1 against bandwidth for connection 2, both axes up to R, with the equal-bandwidth-share line; repeated cycles of congestion-avoidance additive increase followed by loss-triggered halving of the window move the operating point towards the equal share]

118

Fairness (more)

Fairness and UDP:
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections:
• nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections (arithmetic below)
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2!
• Recall: multiple browser sessions (and the potential for synchronized loss)
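The arithmetic behind the parallel-connections example, assuming the link is shared equally per TCP connection:

```python
# Link of rate R currently carrying 9 connections.
R = 1.0
existing = 9
print(R * 1 / (existing + 1))     # one new connection: R/10
print(R * 11 / (existing + 11))   # eleven new connections: 11R/20 ≈ R/2
```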

119

Synchronized Flows vs. Many TCP Flows

Synchronized flows:
• Aggregate window has the same dynamics
• Therefore buffer occupancy has the same dynamics
• Rule-of-thumb still holds

Independent (desynchronized) flows:
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases

Some TCP issues outstanding…

[Figure: buffer size over time t, and the resulting probability distribution of buffer occupancy]

Page 64: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

64

TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer Ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to be

ackedndash start timer if there are

outstanding segments

65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing Dynamics

• No congestion → rate increases by one packet/RTT every RTT
• Congestion → decrease rate by factor 2

[Figure: two AIMD flows, x1 (A→B) and x2 (D→E), sharing one bottleneck; their rates (pkts/RTT) plotted over time]

Rates equalize → fair share

111

AIAD Sharing Dynamics

• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by 1

[Figure: two AIAD flows, x1 (A→B) and x2 (D→E), sharing one bottleneck; their rates (pkts/RTT) plotted over time – the gap between the two rates persists]
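A small side-by-side simulation of the two sharing figures above, assuming both flows back off whenever their combined rate exceeds a 50 pkt/RTT bottleneck (starting rates and capacity are illustrative):

    # Two flows sharing one bottleneck: AIMD equalizes the rates, AIAD does not.
    def share(policy, x1=40.0, x2=10.0, capacity=50.0, rtts=500):
        for _ in range(rtts):
            if x1 + x2 > capacity:                       # congestion: both back off
                if policy == "AIMD":
                    x1, x2 = x1 / 2, x2 / 2
                else:                                    # AIAD
                    x1, x2 = max(x1 - 1, 0.0), max(x2 - 1, 0.0)
            else:                                        # additive increase for both
                x1, x2 = x1 + 1, x2 + 1
        return round(x1, 1), round(x2, 1)

    print("AIMD:", share("AIMD"))   # rates end up roughly equal -> fair share
    print("AIAD:", share("AIAD"))   # the initial 30 pkt/RTT gap persists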

112

Simple Model of Congestion Control

• Two TCP connections
  – Rates x1 and x2
• Congestion when the sum > 1
• Efficiency: sum near 1
• Fairness: x's converge

[Figure: 2-user example – bandwidth for user 1 (x1) against bandwidth for user 2 (x2), with the efficiency line x1 + x2 = 1 separating the underloaded region from the overloaded region]

113

Example

• Total bandwidth 1

[Figure: user 1's rate x1 against user 2's rate x2, with the fairness line x1 = x2 and the efficiency line x1 + x2 = 1]

  – (0.2, 0.5): inefficient, x1 + x2 = 0.7
  – (0.7, 0.5): congested, x1 + x2 = 1.2
  – (0.7, 0.3): efficient (x1 + x2 = 1) but not fair
  – (0.5, 0.5): efficient (x1 + x2 = 1) and fair

114

AIAD

• Increase: x → x + aI
• Decrease: x → x - aD
• Does not converge to fairness

[Figure: phase plane of (x1, x2) with fairness and efficiency lines; from (x1h, x2h) a decrease moves to (x1h - aD, x2h - aD) and the next increase to (x1h - aD + aI, x2h - aD + aI)]

115

MIMD

• Increase: x → bI · x
• Decrease: x → bD · x
• Does not converge to fairness

[Figure: phase plane of (x1, x2) with fairness and efficiency lines; from (x1h, x2h) a decrease moves to (bD·x1h, bD·x2h) and the next increase to (bI·bD·x1h, bI·bD·x2h)]

116

AIMD

• Increase: x → x + aI
• Decrease: x → bD · x
• Converges to fairness

[Figure: phase plane of (x1, x2) with fairness and efficiency lines; from (x1h, x2h) a decrease moves to (bD·x1h, bD·x2h) and the next increase to (bD·x1h + aI, bD·x2h + aI), stepping the trajectory towards the fairness line]

117

Why is AIMD fair?
(a pretty animation…)

Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally

[Figure: bandwidth for connection 1 against bandwidth for connection 2, both limited to R, with the equal-bandwidth-share line; the trajectory alternates between congestion avoidance (additive increase) and loss (window decreased by a factor of 2), closing in on the equal-share line]
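The geometry behind the animation, written out (this step is implied rather than stated on the slides): additive steps move both flows by the same amount, so the trajectory travels parallel to the equal-share line, while each multiplicative decrease shrinks the gap between the two rates.

    \text{AIAD: } (x_1 - x_2) \text{ is unchanged by both } +a_I \text{ and } -a_D
    \qquad
    \text{MIMD: } x_1/x_2 \text{ is unchanged by both } \times b_I \text{ and } \times b_D

    \text{AIMD: increase keeps } (x_1 - x_2) \text{ fixed; each decrease maps it to } b_D\,(x_1 - x_2)
    \;\Longrightarrow\; (x_1 - x_2) \to 0

So the difference between the two rates decays geometrically across congestion events, while additive increase keeps pushing the total back up towards the link's capacity.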

118

Fairness (more)

Fairness and UDP
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections
• Nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2!
• Recall: multiple browser sessions (and the potential for synchronized loss)
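The arithmetic behind that example, assuming the link's rate is split evenly over all open TCP connections:

    \frac{11}{9 + 11}\,R = 0.55\,R \approx \frac{R}{2}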

119

Synchronized Flows vs. Many TCP Flows

Synchronized flows:
• Aggregate window has the same dynamics
• Therefore buffer occupancy has the same dynamics
• Rule-of-thumb still holds

Independent, desynchronized flows:
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases

Some TCP issues outstanding…

[Figure: probability distribution of the aggregate, buffer occupancy over time, and the resulting buffer size over time]


65

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 66: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

66

TCP retransmission scenariosHost A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92 ti

meo

utSendBase

= 100

SendBase= 120

SendBase= 120

Sendbase= 100

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 67: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

67

TCP retransmission scenarios (more)Host A

Seq=92 8 bytes data

ACK=100

loss

timeo

utHost B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Implicit ACK(eg not Go-Back-N)

ACK=120 implicitly ACKrsquos 100 too

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont.)

Closing a connection:
client closes socket: clientSocket.close();

Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

(diagram: client sends FIN; server replies with ACK, then sends its own FIN on close; client ACKs, enters timed wait, then closed)

79

TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK
  – Enters "timed wait" – will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.

Note: with small modification, can handle simultaneous FINs.

(diagram: closing client sends FIN; closing server ACKs and sends its own FIN; client ACKs, waits in timed wait, then both sides are closed)

80

TCP Connection Management (cont.)

TCP client lifecycle
TCP server lifecycle

81

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control!
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem!

82

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission

• large delays when congested
• maximum achievable throughput

(figure: Host A sends λin original data into a router with unlimited shared output link buffers; Host B receives λout)

83

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet

(figure: Host A offers λin original data and λ'in original data plus retransmitted data into finite shared output link buffers; Host B receives λout)

84

Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• "perfect" retransmission only when loss: λ'in > λout
• retransmission of delayed (not lost) packet makes λ'in larger (than the perfect case) for the same λout

"costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: link carries multiple copies of a packet

(figure: three plots (a), (b), (c) of λout versus λin, each limited to R/2 — (a) no loss, (b) retransmission only on loss, (c) retransmissions of delayed packets, with y-axis marks at R/3 and R/4)

85

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λin and λ'in increase?

(figure: Host A sends λin original data and λ'in original data plus retransmitted data through finite shared output link buffers; Host B receives λout)

86

Causes/costs of congestion: scenario 3

Another "cost" of congestion: when a packet is dropped, any upstream transmission capacity used for that packet was wasted!

(figure: Host A, Host B; λout versus offered load, showing throughput collapse)
Congestion Collapse example: Cocktail party effect

87

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at

88

TCP congestion control: additive increase, multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT — for each received ACK, until loss detected (W ← W + 1/W)
• multiplicative decrease: cut CongWin in half after loss (W ← W/2)

(figure: congestion window over time, stepping through 8 Kbytes, 16 Kbytes, 24 Kbytes)
(figure: congestion window size over time — saw tooth behavior, probing for bandwidth)
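A minimal sketch of the two window updates above, with the window kept in units of MSS. The class is purely illustrative, not a particular TCP implementation.

class AIMDWindow:
    def __init__(self):
        self.cwnd = 1.0

    def on_ack(self):
        # additive increase: +1/W per ACK, i.e. roughly +1 MSS per RTT
        self.cwnd += 1.0 / self.cwnd

    def on_loss(self):
        # multiplicative decrease: cut the window in half (never below 1 MSS)
        self.cwnd = max(1.0, self.cwnd / 2.0)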

89 (SLOW START IS NOT SHOWN)

90

TCP Congestion Control: details
• sender limits transmission: LastByteSent - LastByteAcked ≤ CongWin
• Roughly: rate = CongWin/RTT bytes/sec
• CongWin is a dynamic function of perceived network congestion

How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event

three mechanisms:
  – AIMD
  – slow start
  – conservative after timeout events

91

AIMD Starts Too Slowly!

(figure: Window versus t, growing only linearly from a small value)

It could take a long time to get started!
Need to start with a small CWND to avoid overloading the network.

92

TCP Slow Start
• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to a respectable rate

When connection begins, increase rate exponentially fast until first loss event

93

TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast

(figure: Host A sends one segment, then two segments, then four segments, one RTT apart; Host B ACKs each)

94

Slow Start and the TCP Sawtooth

(figure: Window versus t — exponential "slow start" up to the first loss, then the sawtooth)

Why is it called slow-start? Because TCP originally had no congestion control mechanism. The source would just start by sending a whole window's worth of data!

95

Refinement: inferring loss
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Philosophy: 3 dup ACKs indicate the network is capable of delivering some segments; a timeout indicates a "more alarming" congestion scenario.

96

Refinement
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event

97

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.

• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.

• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.

• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
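The four rules above map directly onto a small state machine. The sketch below is illustrative (window in MSS units; the initial threshold of 64 is an assumed value), not any particular stack's implementation.

class TCPCongestionControl:
    def __init__(self):
        self.cwnd = 1.0
        self.ssthresh = 64.0            # assumed initial threshold

    def on_ack(self):
        if self.cwnd < self.ssthresh:   # slow start: exponential growth
            self.cwnd += 1.0            # +1 per ACK => doubling per RTT
        else:                           # congestion avoidance: linear growth
            self.cwnd += 1.0 / self.cwnd

    def on_triple_dup_ack(self):
        self.ssthresh = self.cwnd / 2.0
        self.cwnd = self.ssthresh       # multiplicative decrease

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2.0
        self.cwnd = 1.0                 # back to slow start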

98

TCP sender congestion control

State: Slow Start (SS)
Event: ACK receipt for previously unACKed data
TCP Sender Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
Commentary: Resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unACKed data
TCP Sender Action: CongWin = CongWin + MSS * (MSS/CongWin)
Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: Loss event detected by triple duplicate ACK
TCP Sender Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

State: SS or CA
Event: Timeout
TCP Sender Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: Enter slow start

State: SS or CA
Event: Duplicate ACK
TCP Sender Action: Increment duplicate ACK count for segment being ACKed
Commentary: CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

(figure: Window versus t — after a timeout/fast retransmission, CWND drops back to 1 MSS and SSThresh is set to half the previous CWND)

Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND.
Slow start operates until it reaches half of the previous CWND, i.e., SSTHRESH.

100

TCP throughput

• What's the average throughput of TCP as a function of window size and RTT?
  – Ignore slow start
• Let W be the window size when loss occurs.
• When the window is W, throughput is W/RTT
• Just after loss, the window drops to W/2, throughput to W/(2·RTT)
• Average throughput: 0.75 W/RTT
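A quick numeric check of the 0.75·W/RTT figure, averaging the sawtooth between W/2 and W. The W, RTT and MSS values are assumed examples.

W = 20          # segments in flight at the moment of loss (example)
RTT = 0.1       # seconds (example)
MSS = 1500 * 8  # bits per segment (example)

# The window ramps linearly from W/2 to W, so the mean window is ~3W/4.
windows = [W / 2 + i for i in range(W // 2 + 1)]      # W/2, W/2+1, ..., W
avg_window = sum(windows) / len(windows)
print(avg_window, 0.75 * W)                            # 15.0 vs 15.0
print(avg_window * MSS / RTT)                          # ~1.8 Mbps average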

101

TCP Futures: TCP over "long, fat pipes"

• Example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate p: (MSS/RTT) · sqrt(3/(2p))
• → L = 2·10^-10  Ouch!
• New versions of TCP for high-speed needed!

Calculation on Simple Model (cwnd in units of MSS)
• Assume loss occurs whenever cwnd reaches W
  – Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … W, W/2, …
  – W/2 RTTs, then drop, then repeat
• Average throughput: 0.75 W (MSS/RTT)
  – One packet dropped out of (W/2)·(3W/4)
  – Packet drop rate p = (8/3)·W^-2
• Throughput = (MSS/RTT) · sqrt(3/(2p))
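Plugging the example numbers into the two formulas above; the values come from the slide, only the arithmetic is added here.

from math import sqrt

MSS = 1500 * 8          # bits
RTT = 0.1               # seconds
target = 10e9           # 10 Gbps

# Window needed: throughput = W * MSS / RTT  =>  W = target * RTT / MSS
W = target * RTT / MSS
print(round(W))                          # ~83333 in-flight segments

# Loss rate needed: throughput = (MSS/RTT) * sqrt(3/(2p))  =>  solve for p
p = 1.5 * (MSS / (RTT * target)) ** 2
print(p)                                 # ~2.16e-10, i.e. about 2 * 10^-10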

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges – or – Why AIMD?

• Single flow adjusting to bottleneck bandwidth
  – Without any a priori knowledge
  – Could be a Gbps link, could be a modem
• Single flow adjusting to variations in bandwidth
  – When bandwidth decreases, must lower sending rate
  – When bandwidth increases, must increase sending rate
• Multiple flows sharing the bandwidth
  – Must avoid overloading the network
  – And share bandwidth "fairly" among the flows

103

104

Problem #1: Single Flow, Fixed BW

• Want to get a first-order estimate of the available bandwidth
  – Assume bandwidth is fixed
  – Ignore presence of other flows
• Want to start slow, but rapidly increase rate until packet drop occurs ("slow-start")
• Adjustment:
  – cwnd initially set to 1 (MSS)
  – cwnd++ upon receipt of ACK

105

Problems with Slow-Start
• Slow-start can result in many losses
  – Roughly the size of cwnd ~ BW·RTT
• Example:
  – At some point, cwnd is enough to fill the "pipe"
  – After another RTT, cwnd is double its previous value
  – All the excess packets are dropped!
• Need a more gentle adjustment algorithm once we have a rough estimate of bandwidth
  – Rest of design discussion focuses on this

Problem #2: Single Flow, Varying BW

Want to track available bandwidth:
• Oscillate around its current value
• If you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease: cwnd → cwnd · a  (or  cwnd / a)
• Additive increase or decrease: cwnd → cwnd ± b

106

Four alternatives

• AIAD: gentle increase, gentle decrease

• AIMD: gentle increase, drastic decrease

• MIAD: drastic increase, gentle decrease
  – too many losses: eliminate

• MIMD: drastic increase and decrease

107

108

Problem #3: Multiple Flows

• Want steady state to be "fair"

• Many notions of fairness, but here we just require two identical flows to end up with the same bandwidth

• This eliminates MIMD and AIAD
  – As we shall see…

• AIMD is the only remaining solution!
  – Not really, but close enough…

109

Recall: Buffer and Window Dynamics

(figure: A → router → B, link capacity C = 50 pkts/RTT; plot over time of the rate x (pkts/RTT) and the backlog in the router (pkts, congested if > 20))

• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by factor 2

110

AIMD Sharing Dynamics

(figure: two flows, A→B with rate x1 and D→E with rate x2, sharing a bottleneck; plot of the two rates over time)

• No congestion → rate increases by one packet/RTT every RTT
• Congestion → decrease rate by factor 2

Rates equalize → fair share

111

AIAD Sharing Dynamics

(figure: two flows, A→B with rate x1 and D→E with rate x2, sharing a bottleneck; plot of the two rates over time)

• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by 1

112

Simple Model of Congestion Control

• Two TCP connections
  – Rates x1 and x2
• Congestion when sum > 1
• Efficiency: sum near 1
• Fairness: x's converge

(figure: 2-user example — bandwidth for User 1 (x1) versus bandwidth for User 2 (x2); the efficiency line separates underload from overload)

113

Example

(figure: User 1 (x1) versus User 2 (x2), with the fairness line x1 = x2 and the efficiency line x1 + x2 = 1)

• Total bandwidth 1
• Inefficient: x1 + x2 = 0.7 — e.g., (0.2, 0.5)
• Congested: x1 + x2 = 1.2 — e.g., (0.7, 0.5)
• Efficient: x1 + x2 = 1, but not fair — e.g., (0.7, 0.3)
• Efficient: x1 + x2 = 1, and fair — (0.5, 0.5)

114

AIAD

• Increase: x + aI
• Decrease: x - aD
• Does not converge to fairness

(figure: bandwidth for User 1 (x1) versus User 2 (x2), with fairness and efficiency lines; starting from (x1h, x2h), a decrease moves to (x1h - aD, x2h - aD) and an increase to (x1h - aD + aI, x2h - aD + aI), so the distance from the fairness line never changes)

115

MIMD

• Increase: x · bI
• Decrease: x · bD
• Does not converge to fairness

(figure: bandwidth for User 1 (x1) versus User 2 (x2), with fairness and efficiency lines; from (x1h, x2h), a decrease moves to (bD·x1h, bD·x2h) and an increase to (bI·bD·x1h, bI·bD·x2h), so the trajectory stays on the same line through the origin)

116

AIMD

• Increase: x + aI
• Decrease: x · bD
• Converges to fairness

(figure: bandwidth for User 1 (x1) versus User 2 (x2), with fairness and efficiency lines; from (x1h, x2h), a decrease moves to (bD·x1h, bD·x2h) and an increase to (bD·x1h + aI, bD·x2h + aI), so each cycle moves the operating point closer to the fairness line)
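A small simulation of the two-flow dynamics sketched above, illustrating why AIMD converges toward the fair share while AIAD does not. The link capacity, starting rates and parameters are assumed example values.

def simulate(increase, decrease, x1=1.0, x2=30.0, C=50.0, rounds=200):
    for _ in range(rounds):
        if x1 + x2 > C:                  # congestion: both flows back off
            x1, x2 = decrease(x1), decrease(x2)
        else:                            # no congestion: both flows probe upward
            x1, x2 = increase(x1), increase(x2)
    return x1, x2

# AIMD: +1 per RTT, halve on congestion -> rates converge toward C/2 each
print(simulate(lambda x: x + 1, lambda x: x / 2))
# AIAD: +1 per RTT, -1 on congestion -> the initial imbalance persists
print(simulate(lambda x: x + 1, lambda x: max(x - 1, 0)))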

117

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally

(figure: bandwidth for Connection 1 versus Connection 2, each axis up to R, with the equal-bandwidth-share line; repeated congestion avoidance (additive increase) followed by loss (decrease window by factor of 2) drives the trajectory toward the equal share)

118

Fairness (more)

Fairness and UDP
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient, yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections
• nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections;
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2!
• Recall: multiple browser sessions (and the potential for synchronized loss)

119

Synchronized Flows vs. Many TCP Flows

Synchronized flows:
• Aggregate window has the same dynamics
• Therefore buffer occupancy has the same dynamics
• Rule-of-thumb still holds

Independent, desynchronized flows:
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases

Some TCP issues outstanding…

(figure: probability distribution of buffer occupancy; buffer size over time)

Page 68: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

68

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 69: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

69

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

70

Host A

timeo

ut

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACK

71

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

Page 72: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

72

Silly Window SyndromeMSS advertises the amount a receiver can accept

If a transmitter has something to send ndash it will

This means small MSS values may persist- indefinitely

Solution

Wait to fill each segment but donrsquot wait indefinitely

NAGLErsquos Algorithm

If we wait too long interactive traffic is difficult

If we donrsquot want we get silly window syndrome

Solution Use a timer when the timer expires ndash send the (unfilled) segment

73

Flow Control ne Congestion Control

bull Flow control involves preventing senders from overrunning the capacity of the receivers

bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded

74

Flow Control ndash (bad old days)

In-Line flow control

bull XONXOFF (^s^q)

bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)

Dedicated wires

bull RTSCTS handshaking

bull Read (or Write) Readysignals from memory interface saying slow-downstophellip

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2: Single Flow, Varying BW

Want to track available bandwidth:
• Oscillate around its current value
• If you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease:  cwnd ← cwnd × a  or  cwnd ← cwnd / a
• Additive increase or decrease:  cwnd ← cwnd + b  or  cwnd ← cwnd − b

106

Four alternatives

• AIAD: gentle increase, gentle decrease
• AIMD: gentle increase, drastic decrease
• MIAD: drastic increase, gentle decrease
  – too many losses: eliminate
• MIMD: drastic increase and decrease

107

108

Problem 3: Multiple Flows

• Want steady state to be "fair"
• Many notions of fairness, but here we just require two identical flows to end up with the same bandwidth
• This eliminates MIMD and AIAD
  – As we shall see…
• AIMD is the only remaining solution
  – Not really, but close enough…

109

Recall Buffer and Window Dynamics

• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by a factor of 2

[Chart: sender A to receiver B over a link of capacity C = 50 pkts/RTT; the rate x (pkts/RTT) and the backlog in the router (pkts, congested if > 20) trace out the sawtooth over time]

110

AIMD Sharing Dynamics

[Chart: two flows, A→B at rate x1 and D→E at rate x2, sharing a bottleneck link; both rates plotted over time]

• No congestion → rate increases by one packet/RTT every RTT
• Congestion → decrease rate by a factor of 2

Rates equalize → fair share
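
A minimal sketch of the dynamics in the chart above (the capacity of 50 pkts/RTT and the unequal starting rates are assumed values); both flows follow the same AIMD rule and their rates converge:

C = 50                                    # bottleneck capacity, pkts/RTT
x1, x2 = 40.0, 5.0                        # deliberately unequal starting rates
for _ in range(500):                      # one iteration per RTT
    if x1 + x2 > C:                       # congestion: both halve
        x1, x2 = x1 / 2, x2 / 2
    else:                                 # no congestion: both add 1 pkt/RTT
        x1, x2 = x1 + 1, x2 + 1
print(round(x1, 1), round(x2, 1))         # the two rates end up roughly equal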

111

AIAD Sharing Dynamics

[Chart: two flows, A→B at rate x1 and D→E at rate x2, sharing a bottleneck link; both rates plotted over time]

• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by 1

112

Simple Model of Congestion Control

• Two TCP connections, rates x1 and x2
• Congestion when the sum > 1
• Efficiency: sum near 1
• Fairness: x's converge

[Diagram: 2-user example with "Bandwidth for User 1 (x1)" and "Bandwidth for User 2 (x2)" as the axes; the efficiency line x1 + x2 = 1 separates underload from overload]

113

Example

[Diagram: x1 and x2 axes from 0 to 1, with the fairness line x1 = x2 and the efficiency line x1 + x2 = 1]

• Total bandwidth 1
• (0.2, 0.5): inefficient, x1 + x2 = 0.7
• (0.7, 0.5): congested, x1 + x2 = 1.2
• (0.7, 0.3): efficient (x1 + x2 = 1), but not fair
• (0.5, 0.5): efficient (x1 + x2 = 1) and fair

114

AIAD

• Increase: x ← x + aI
• Decrease: x ← x − aD
• Does not converge to fairness

[Diagram: starting from (x1h, x2h), a decrease moves to (x1h − aD, x2h − aD) and the next increase to (x1h − aD + aI, x2h − aD + aI); every step moves parallel to the fairness line, so the difference x1 − x2 never changes]

115

MIMD

• Increase: x ← bI·x
• Decrease: x ← bD·x
• Does not converge to fairness

[Diagram: starting from (x1h, x2h), a decrease moves to (bD·x1h, bD·x2h) and the next increase to (bI·bD·x1h, bI·bD·x2h); every step moves along the ray through the origin, so the ratio x1/x2 never changes]

116

AIMD

• Increase: x ← x + aI
• Decrease: x ← bD·x
• Converges to fairness

[Diagram: starting from (x1h, x2h), a multiplicative decrease moves toward the origin to (bD·x1h, bD·x2h), and the next additive increase moves parallel to the fairness line to (bD·x1h + aI, bD·x2h + aI); the decreases shrink the gap between x1 and x2 while the increases preserve it, so the trajectory homes in on the fair, efficient point]
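
The three policies can be compared directly on the two-user model from the earlier slides. The constants aI, aD, bI, bD and the starting rates below are assumed purely for illustration; only AIMD finishes with the two rates roughly equal:

def simulate(increase, decrease, x1=0.7, x2=0.1, steps=2000):
    # Two users share capacity 1.0; on overload both decrease, otherwise both increase.
    for _ in range(steps):
        if x1 + x2 > 1.0:
            x1, x2 = decrease(x1), decrease(x2)
        else:
            x1, x2 = increase(x1), increase(x2)
    return round(x1, 2), round(x2, 2)

aI, aD, bI, bD = 0.01, 0.01, 1.05, 0.5    # assumed constants

print("AIAD:", simulate(lambda x: x + aI, lambda x: x - aD))   # the 0.6 gap persists
print("MIMD:", simulate(lambda x: x * bI, lambda x: x * bD))   # the 7:1 ratio persists
print("AIMD:", simulate(lambda x: x + aI, lambda x: x * bD))   # rates converge to roughly equal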

117

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally

[Diagram: bandwidth of Connection 1 vs. bandwidth of Connection 2, each bounded by R; the trajectory alternates between congestion-avoidance additive increase and loss events that halve the window, converging on the equal-bandwidth-share line]

118

Fairness (more)

Fairness and UDP
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP
  – pump audio/video at a constant rate, tolerate packet loss
• (Ancient, yet ongoing) research area: TCP-friendly congestion control

Fairness and parallel TCP connections
• Nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R already supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets 11 of the 20 connections ≈ R/2
• Recall: multiple browser sessions (and the potential for synchronized loss)

119

Some TCP issues outstanding…

Synchronized flows:
• Aggregate window has the same dynamics
• Therefore buffer occupancy has the same dynamics
• Rule-of-thumb still holds

Many TCP flows (independent, desynchronized):
• Central limit theorem says the aggregate becomes Gaussian
• Variance (and hence the needed buffer size) decreases as N increases

[Diagram: probability distribution of buffer occupancy and buffer size over time for the two cases]

75

TCP Flow Controlbull receive side of TCP

connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

app process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer by

transmitting too much too fast

flow control

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model (cwnd in units of MSS)
• Assume loss occurs whenever cwnd reaches W
  – Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, …, W, W/2, …
  – W/2 RTTs, then drop, then repeat
• Average throughput: 0.75·W·(MSS/RTT)
  – One packet dropped out of (W/2)·(3W/4)
  – Packet drop rate p = (8/3)·W⁻²
• Throughput = (MSS/RTT)·√(3/(2p)) (a worked example follows)
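Putting these last two slides together, a short script (Python; only the numbers from the example are used, and the 1.22·MSS/(RTT·√L) form is just the constant-folded version of the formula above) reproduces the quoted window size and loss rate:

```python
MSS = 1500 * 8      # bits per segment
RTT = 0.1           # seconds
target = 10e9       # 10 Gbps

# Window needed to carry the target rate: throughput = W * MSS / RTT
W = target * RTT / MSS
print(f"W = {W:,.0f} in-flight segments")            # ~83,333

# Loss rate allowed by Throughput = 1.22 * MSS / (RTT * sqrt(L))
L = (1.22 * MSS / (RTT * target)) ** 2
print(f"L = {L:.1e}")                                # ~2e-10

# The simple-model estimate p = (8/3) / W^2 is the same order of magnitude
print(f"p = {(8 / 3) / W ** 2:.1e}")                 # ~3.8e-10
```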

102

HINT: KNOW THIS SLIDE

Three Congestion Control Challenges – or Why AIMD?
• Single flow, adjusting to bottleneck bandwidth
  – without any a priori knowledge
  – could be a Gbps link, could be a modem
• Single flow, adjusting to variations in bandwidth
  – when bandwidth decreases, must lower sending rate
  – when bandwidth increases, must increase sending rate
• Multiple flows, sharing the bandwidth
  – must avoid overloading network
  – and share bandwidth "fairly" among the flows

103

104

Problem 1: Single Flow, Fixed BW
• Want to get a first-order estimate of the available bandwidth
  – assume bandwidth is fixed
  – ignore presence of other flows
• Want to start slow but rapidly increase rate until packet drop occurs ("slow-start")
• Adjustment:
  – cwnd initially set to 1 (MSS)
  – cwnd++ upon receipt of ACK

105

Problems with Slow-Start
• Slow-start can result in many losses
  – roughly the size of cwnd ~ BW·RTT
• Example:
  – at some point, cwnd is enough to fill the "pipe"
  – after another RTT, cwnd is double its previous value
  – all the excess packets are dropped
• Need a more gentle adjustment algorithm once we have a rough estimate of the bandwidth
  – rest of design discussion focuses on this

Problem 2: Single Flow, Varying BW
Want to track available bandwidth:
• Oscillate around its current value
• If you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease: cwnd → cwnd ×/÷ a
• Additive increase or decrease: cwnd → cwnd +/− b

106

Four alternatives (each sketched in code below)
• AIAD: gentle increase, gentle decrease
• AIMD: gentle increase, drastic decrease
• MIAD: drastic increase, gentle decrease
  – too many losses: eliminate
• MIMD: drastic increase and decrease
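A minimal sketch of the four per-RTT update rules (Python; the step sizes a_i, a_d and the factors b_i, b_d are illustrative placeholders, not values from the slides):

```python
# Per-RTT rate adjustments for the four alternatives.
# a_i / a_d are additive steps (pkts/RTT); b_i / b_d are multiplicative factors.
a_i, a_d = 1.0, 1.0
b_i, b_d = 2.0, 0.5

def aiad(x, congested): return x - a_d if congested else x + a_i
def aimd(x, congested): return x * b_d if congested else x + a_i
def miad(x, congested): return x - a_d if congested else x * b_i
def mimd(x, congested): return x * b_d if congested else x * b_i
```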

107

108

Problem 3: Multiple Flows
• Want steady state to be "fair"
• Many notions of fairness, but here just require two identical flows to end up with the same bandwidth
• This eliminates MIMD and AIAD
  – as we shall see…
• AIMD is the only remaining solution
  – not really, but close enough…

109

Recall: Buffer and Window Dynamics
• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by factor 2
[Figure: single flow A→B through a router with capacity C = 50 pkts/RTT; plots of the rate x (pkts/RTT) and of the backlog in the router (pkts, congested if > 20) over time]

110

AIMD Sharing Dynamics
[Figure: two flows, x1 (A→B) and x2 (D→E), sharing one bottleneck; plot of the two rates over time]
• No congestion → rate increases by one packet/RTT every RTT
• Congestion → decrease rate by factor 2
Rates equalize → fair share

111

AIAD Sharing Dynamics
[Figure: same two-flow setup; plot of the two rates over time – the gap between x1 and x2 does not close; compare the simulation below]
• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by 1
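A tiny simulation in the spirit of the two plots above (Python; the 50 pkt/RTT capacity and the starting rates are assumptions, and "congested" simply means the combined rate exceeds the capacity):

```python
def share(update, x1=40.0, x2=10.0, capacity=50.0, rounds=200):
    """Iterate two flows' rates; `update` maps (rate, congested) -> new rate."""
    for _ in range(rounds):
        congested = x1 + x2 > capacity
        x1, x2 = update(x1, congested), update(x2, congested)
    return x1, x2

aimd = lambda x, c: x / 2 if c else x + 1    # additive increase, multiplicative decrease
aiad = lambda x, c: x - 1 if c else x + 1    # additive increase, additive decrease

print("AIMD:", share(aimd))   # rates end up roughly equal: fair share
print("AIAD:", share(aiad))   # the initial 30 pkt/RTT gap persists
```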

112

Simple Model of Congestion Control
• Two TCP connections, rates x1 and x2
• Congestion when the sum > 1
• Efficiency: sum near 1
• Fairness: x's converge
[Figure: 2-user example – plane of bandwidth for User 1 (x1) vs bandwidth for User 2 (x2); the efficiency line x1 + x2 = 1 separates underload from overload]

113

Example
[Figure: x1–x2 plane with the fairness line (x1 = x2) and the efficiency line (x1 + x2 = 1); total bandwidth 1]
• Inefficient: x1 + x2 = 0.7, e.g. (0.2, 0.5)
• Congested: x1 + x2 = 1.2, e.g. (0.7, 0.5)
• Efficient, but not fair: x1 + x2 = 1, e.g. (0.7, 0.3)
• Efficient and fair: x1 + x2 = 1, e.g. (0.5, 0.5) – see the check below
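A small check of the four example points against those two criteria (Python; the capacity of 1 and the points are taken straight from the example):

```python
def classify(x1, x2, capacity=1.0, tol=1e-9):
    total = x1 + x2
    if total > capacity + tol:
        return "congested"
    efficient = abs(total - capacity) <= tol
    fair = abs(x1 - x2) <= tol
    if efficient and fair:
        return "efficient and fair"
    if efficient:
        return "efficient, not fair"
    return "inefficient"

for point in [(0.2, 0.5), (0.7, 0.5), (0.7, 0.3), (0.5, 0.5)]:
    print(point, "->", classify(*point))
```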

114

AIAD
[Figure: x1–x2 plane with fairness and efficiency lines; trajectory from (x1h, x2h) to (x1h − aD, x2h − aD), then to (x1h − aD + aI, x2h − aD + aI)]
• Increase: x → x + aI
• Decrease: x → x − aD
• Does not converge to fairness

115

MIMD
[Figure: x1–x2 plane with fairness and efficiency lines; trajectory from (x1h, x2h) to (bD·x1h, bD·x2h), then to (bI·bD·x1h, bI·bD·x2h)]
• Increase: x → bI·x
• Decrease: x → bD·x
• Does not converge to fairness

116

AIMD
[Figure: x1–x2 plane with fairness and efficiency lines; trajectory from (x1h, x2h) to (bD·x1h, bD·x2h), then to (bD·x1h + aI, bD·x2h + aI)]
• Increase: x → x + aI
• Decrease: x → bD·x
• Converges to fairness

117

Why is AIMD fair? (a pretty animation…)
Two competing sessions:
• Additive increase gives slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally
[Figure: bandwidth for Connection 1 vs bandwidth for Connection 2, each axis up to the full rate R, with the equal-bandwidth-share line; the trajectory alternates congestion-avoidance additive increase with losses that decrease the window by a factor of 2, moving toward the equal-share line]

118

Fairness (more)
Fairness and UDP
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient, yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections
• Nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2 (11 of the now 20 connections ⇒ 11R/20 ≈ R/2)
• Recall: multiple browser sessions (and the potential for synchronized loss)

119

Synchronized Flows vs. Many TCP Flows
Synchronized flows:
• Aggregate window has same dynamics
• Therefore buffer occupancy has same dynamics
• Rule-of-thumb still holds
Independent, desynchronized flows:
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases (illustrated below)
[Figure: probability distribution of the aggregate, and the required buffer size, over time t]

Some TCP issues outstanding…
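A rough illustration of the desynchronized case (Python; the window model is a toy – each flow's window is drawn independently and uniformly from its sawtooth range [W/2, W] – chosen only to show the aggregate's relative spread shrinking with N):

```python
import random
import statistics

def aggregate_spread(n_flows, w_max=100, trials=2000):
    """Relative spread (stdev/mean) of the summed windows of n independent,
    desynchronized AIMD flows, each somewhere in its sawtooth [w_max/2, w_max]."""
    totals = [sum(random.uniform(w_max / 2, w_max) for _ in range(n_flows))
              for _ in range(trials)]
    return statistics.stdev(totals) / statistics.mean(totals)

for n in (1, 10, 100):
    print(n, "flows -> relative spread", round(aggregate_spread(n), 3))
# The spread falls roughly as 1/sqrt(N): buffers can be smaller when flows
# are desynchronized, matching the central-limit-theorem argument above.
```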

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning – we covered this earlier)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt1.0: reliable transfer over a reliable channel
  • Rdt2.0: channel with bit errors
  • rdt2.0: FSM specification
  • rdt2.0: operation with no errors
  • rdt2.0: error scenario
  • rdt2.0 has a fatal flaw
  • rdt2.1: sender handles garbled ACK/NAKs
  • rdt2.1: receiver handles garbled ACK/NAKs
  • rdt2.1: discussion
  • rdt2.2: a NAK-free protocol
  • rdt2.2: sender, receiver fragments
  • rdt3.0: channels with errors and loss
  • rdt3.0 sender
  • rdt3.0 in action
  • rdt3.0 in action (2)
  • Performance of rdt3.0
  • rdt3.0: stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview: RFCs 793, 1122, 1323, 2018, 2581, …
  • TCP segment structure
  • TCP seq. #'s and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ≠ Congestion Control
  • Flow Control – (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control: additive increase, multiplicative decrease
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures: TCP over "long fat pipes"
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges – or Why AIMD?
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair? (a pretty animation…)
  • Fairness (more)
  • Some TCP issues outstanding…
Page 76: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

76

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer

doesnrsquot overflow

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 77: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

77

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator Socket clientSocket = new

Socket(hostnameport number)

bull server contacted by client Socket connectionSocket =

welcomeSocketaccept()

Three way handshakeStep 1 client host sends TCP SYN

segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 78: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

78

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2: Single Flow, Varying BW

Want to track the available bandwidth:
• Oscillate around its current value
• If you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease:
  cwnd → cwnd × a
• Additive increase or decrease:
  cwnd → cwnd ± b

106

Four alternatives

• AIAD: gentle increase, gentle decrease

• AIMD: gentle increase, drastic decrease

• MIAD: drastic increase, gentle decrease
  – too many losses: eliminate

• MIMD: drastic increase and drastic decrease

107

108

Problem 3: Multiple Flows

• Want steady state to be "fair"
• Many notions of fairness, but here we just require two identical flows to end up with the same bandwidth
• This eliminates MIMD and AIAD
  – As we shall see…
• AIMD is the only remaining solution
  – Not really, but close enough…

109

Recall: Buffer and Window Dynamics

• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by factor 2

[Figure: a single flow from A to B over a link of capacity C = 50 pkts/RTT; the plot shows the sending rate x (pkts/RTT) and the backlog in the router (pkts; congested if > 20) oscillating over time in the familiar sawtooth.]

110
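A small simulation sketch of this dynamic (assumed toy parameters matching the figure: capacity 50 pkts/RTT, congestion declared when the backlog exceeds 20 packets):

def single_flow_sawtooth(rounds=200, capacity=50, congestion_backlog=20):
    """Increase the rate by 1 pkt/RTT per RTT; halve it when the router backlog
    exceeds the congestion threshold. Returns (rate, backlog) per RTT."""
    rate, backlog, trace = 1.0, 0.0, []
    for _ in range(rounds):
        backlog = max(0.0, backlog + rate - capacity)   # queue grows only above capacity
        if backlog > congestion_backlog:
            rate /= 2                                    # multiplicative decrease
        else:
            rate += 1                                    # additive increase
        trace.append((rate, backlog))
    return trace

# Peek at the last few RTTs: the rate oscillates around the link capacity.
print(single_flow_sawtooth()[-5:])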

AIMD Sharing Dynamics

• No congestion → rate increases by one packet/RTT every RTT
• Congestion → decrease rate by factor 2

[Figure: two flows, x1 (A→B) and x2 (D→E), sharing the bottleneck; under AIMD their rates equalize – fair share.]

111

AIAD Sharing Dynamics

• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by 1

[Figure: the same two flows, x1 (A→B) and x2 (D→E), under AIAD; the rates oscillate but the gap between them never closes.]

112
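A sketch comparing the two sharing dynamics (assumed toy parameters: two flows share a bottleneck of 60 pkts/RTT and both see congestion at the same instants):

def share(rounds=500, capacity=60, aimd=True):
    """Two synchronized flows on one bottleneck. Both increase additively by 1 pkt/RTT;
    on congestion AIMD halves each rate, AIAD subtracts 1 from each rate."""
    x1, x2 = 40.0, 10.0                # deliberately unequal starting rates
    for _ in range(rounds):
        if x1 + x2 > capacity:          # congestion signal seen by both flows
            if aimd:
                x1, x2 = x1 / 2, x2 / 2
            else:
                x1, x2 = max(x1 - 1, 0), max(x2 - 1, 0)
        else:
            x1, x2 = x1 + 1, x2 + 1
        # (x1 - x2) is halved by every AIMD decrease but untouched by AIAD
    return x1, x2

print("AIMD:", share(aimd=True))        # rates end up (roughly) equal
print("AIAD:", share(aimd=False))       # the initial 30 pkt/RTT gap persists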

Simple Model of Congestion Control

• Two TCP connections
  – Rates x1 and x2
• Congestion when the sum > 1
• Efficiency: sum near 1
• Fairness: the x's converge

[Figure: 2-user example – the plane of bandwidth for user 1 (x1) against bandwidth for user 2 (x2), with the efficiency line separating the underloaded region from the overloaded region.]

113
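In symbols (a restatement of the model above, with the fairness line drawn on the following slides):

\text{efficiency line: } x_1 + x_2 = 1 \qquad \text{fairness line: } x_1 = x_2

\text{underload: } x_1 + x_2 < 1 \qquad \text{overload (congestion): } x_1 + x_2 > 1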

Example

• Total bandwidth 1

[Figure: the (x1, x2) plane with the fairness line and the efficiency line, both axes running from 0 to 1, annotated with four operating points:]
  – (0.2, 0.5): inefficient, x1 + x2 = 0.7
  – (0.7, 0.5): congested, x1 + x2 = 1.2
  – (0.7, 0.3): efficient (x1 + x2 = 1), not fair
  – (0.5, 0.5): efficient (x1 + x2 = 1), fair

114

AIAD

• Increase: x → x + aI
• Decrease: x → x − aD
• Does not converge to fairness

[Figure: in the (x1, x2) plane, starting from (x1h, x2h), a decrease moves the point to (x1h − aD, x2h − aD) and the next increase to (x1h − aD + aI, x2h − aD + aI); the trajectory stays parallel to the fairness line and never approaches it.]

115

MIMD

• Increase: x → bI·x
• Decrease: x → bD·x
• Does not converge to fairness

[Figure: starting from (x1h, x2h), a decrease moves the point to (bD·x1h, bD·x2h) and the next increase to (bI·bD·x1h, bI·bD·x2h); the trajectory moves along the line through the origin, so the ratio x1/x2 never changes.]

116

AIMD

• Increase: x → x + aI
• Decrease: x → bD·x
• Converges to fairness

[Figure: starting from (x1h, x2h), a multiplicative decrease moves the point to (bD·x1h, bD·x2h) and an additive increase then moves it to (bD·x1h + aI, bD·x2h + aI); each decrease shrinks the gap to the fairness line, so the trajectory spirals in towards the fair, efficient point.]

117
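Why the AIMD trajectory converges while AIAD and MIMD do not (a sketch of the standard argument, using the per-RTT update rules above):

\text{AIAD: } (x_1 \pm a) - (x_2 \pm a) = x_1 - x_2 \quad \text{(difference preserved)}

\text{MIMD: } \frac{b\,x_1}{b\,x_2} = \frac{x_1}{x_2} \quad \text{(ratio preserved)}

\text{AIMD: increase keeps } x_1 - x_2 \text{ fixed, but each decrease maps } x_1 - x_2 \mapsto b_D\,(x_1 - x_2),\; b_D < 1

So over repeated congestion epochs |x1 − x2| shrinks geometrically towards 0, while the additive increase keeps returning the pair to the efficiency line.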

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• Additive increase gives slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally

[Figure: bandwidth for connection 1 against bandwidth for connection 2, both axes up to the link rate R; the trajectory alternates "congestion avoidance: additive increase" and "loss: decrease window by factor of 2", converging on the equal-bandwidth-share line.]

118

Fairness (more)

Fairness and UDP
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections
• Nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 existing connections (worked below)
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets ~R/2
• Recall: multiple browser sessions (and the potential for synchronized loss)

119
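The arithmetic behind that example, assuming the link is shared equally per TCP connection:

\text{1 new connection: } \frac{1}{9+1}\,R = \frac{R}{10}
\qquad
\text{11 new connections: } \frac{11}{9+11}\,R = 0.55\,R \approx \frac{R}{2}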

Some TCP issues outstanding…

Synchronized flows:
• Aggregate window has the same dynamics
• Therefore buffer occupancy has the same dynamics
• Rule-of-thumb still holds

Many TCP flows (independent, desynchronized):
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases

[Figure: buffer size over time t for each case, with the probability distribution of the aggregate occupancy.]

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 79: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

79

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 80: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

80

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 81: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

81

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too much data too

fast for network to handlerdquobull different from flow controlbull manifestations

ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)

bull a top-10 problem

82

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

Page 82:

82

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission

• large delays when congested
• maximum achievable throughput

[Figure: Host A sends λin (original data) through a router with unlimited shared output-link buffers; Host B receives λout]

83

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmits lost packets

[Figure: Host A sends λin (original data) and λ'in (original data plus retransmitted data) through a router with finite shared output-link buffers; Host B receives λout]

84

Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• "perfect" retransmission, only when loss: λ'in > λout
• retransmission of a delayed (not lost) packet makes λ'in larger (than the perfect case) for the same λout

"Costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: the link carries multiple copies of a packet

[Figure: three plots of λout versus λin (panels a, b, c), axes up to R/2; in the ideal case goodput reaches R/2, while with retransmissions the achieved goodput peaks lower, around R/3 and R/4 in the illustrated cases]

85

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λin and λ'in increase?

[Figure: Host A sends λin (original data) and λ'in (original data plus retransmitted data) across routers with finite shared output-link buffers; Host B receives λout]

86

Causes/costs of congestion: scenario 3

Another "cost" of congestion: when a packet is dropped, any upstream transmission capacity used for that packet was wasted.

Congestion collapse example: the cocktail party effect.

[Figure: Host A, Host B; λout]

87

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
• no explicit feedback from the network
• congestion inferred from end-system observed loss and delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
– a single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
– an explicit rate the sender should send at

88

TCP congestion control: additive increase, multiplicative decrease

Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT (for each received ACK, W ← W + 1/W) until loss is detected
• multiplicative decrease: cut CongWin in half after loss (W ← W/2)
(a per-ACK sketch of these two rules follows below)

[Figure: congestion window (8, 16, 24 Kbytes) versus time; the saw-tooth behaviour of probing for bandwidth]
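A minimal sketch (not the lecture's own code) of the two rules above, with the congestion window kept in units of MSS:

```python
# Congestion window in units of MSS; only the two AIMD update rules are modelled.
def on_ack(cwnd):
    """Additive increase: W <- W + 1/W per ACK, i.e. about +1 MSS per RTT."""
    return cwnd + 1.0 / cwnd

def on_loss(cwnd):
    """Multiplicative decrease: W <- W/2 after loss (kept at >= 1 MSS)."""
    return max(cwnd / 2.0, 1.0)

cwnd = 10.0
for _ in range(10):            # roughly one window's worth of ACKs = one RTT
    cwnd = on_ack(cwnd)
print(cwnd)                    # ~11 MSS after a loss-free RTT
print(on_loss(cwnd))           # ~5.5 MSS after a loss
```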

89
(Slow start is not shown in the figure above.)

90

TCP Congestion Control: details
• sender limits transmission: LastByteSent − LastByteAcked ≤ CongWin
• roughly: rate = CongWin / RTT bytes/sec
• CongWin is a dynamic function of perceived network congestion

How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• the TCP sender reduces its rate (CongWin) after a loss event

Three mechanisms:
– AIMD
– slow start
– conservative after timeout events

91

AIMD Starts Too Slowly

[Figure: window versus time t under additive increase alone]

• It could take a long time to get started
• Yet we need to start with a small CWND to avoid overloading the network

92

TCP Slow Start
• When the connection begins, CongWin = 1 MSS
– Example: MSS = 500 bytes & RTT = 200 msec
– initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
– desirable to quickly ramp up to a respectable rate

When the connection begins, increase the rate exponentially fast until the first loss event (a short worked calculation follows below)
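A short worked calculation using the numbers on this slide (MSS = 500 bytes, RTT = 200 ms); the 10 Mbps target used for the ramp-up question is an arbitrary illustrative choice:

```python
MSS_bits = 500 * 8          # 4000 bits per segment
RTT = 0.2                   # seconds

initial_rate = MSS_bits / RTT
print(initial_rate)         # 20000.0 bits/s = 20 kbps, as on the slide

# Doubling CongWin every RTT: how long until the rate reaches ~10 Mbps?
rate, rtts = initial_rate, 0
while rate < 10e6:
    rate, rtts = rate * 2, rtts + 1
print(rtts, rtts * RTT)     # 9 RTTs, i.e. about 1.8 seconds here
```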

93

TCP Slow Start (more)
• When the connection begins, increase the rate exponentially until the first loss event
– double CongWin every RTT
– done by incrementing CongWin for every ACK received
• Summary: the initial rate is slow, but it ramps up exponentially fast

[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart]

94

Slow Start and the TCP Sawtooth

[Figure: window versus time t; an exponential "slow start" phase, a loss, then the sawtooth]

Why is it called slow start? Because TCP originally had no congestion control mechanism: the source would just start by sending a whole window's worth of data.

95

Refinement: inferring loss
• After 3 duplicate ACKs:
– CongWin is cut in half
– the window then grows linearly
• But after a timeout event:
– CongWin is instead set to 1 MSS
– the window then grows exponentially
– up to a threshold, then grows linearly

Philosophy: 3 duplicate ACKs indicate the network is capable of delivering some segments; a timeout indicates a "more alarming" congestion scenario.

96

Refinement
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before the timeout.

Implementation:
• variable: Threshold
• at a loss event, Threshold is set to 1/2 of CongWin just before the loss event

97

Summary: TCP Congestion Control
• When CongWin is below Threshold, the sender is in the slow-start phase; the window grows exponentially.
• When CongWin is above Threshold, the sender is in the congestion-avoidance phase; the window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.

98

TCP sender congestion control (State / Event / TCP Sender Action / Commentary)

• Slow Start (SS) / ACK receipt for previously unACKed data / CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" / results in a doubling of CongWin every RTT
• Congestion Avoidance (CA) / ACK receipt for previously unACKed data / CongWin = CongWin + MSS·(MSS/CongWin) / additive increase, resulting in an increase of CongWin by 1 MSS every RTT
• SS or CA / loss event detected by triple duplicate ACK / Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" / fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
• SS or CA / timeout / Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" / enter slow start
• SS or CA / duplicate ACK / increment the duplicate-ACK count for the segment being ACKed / CongWin and Threshold not changed

(a state-machine sketch of these actions follows below)
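A hedged Python sketch of the sender actions in the table above, with cwnd and ssthresh in units of MSS; the initial ssthresh of 64 MSS is an illustrative assumption, and real stacks add fast-recovery and byte-counting details omitted here:

```python
class TcpCongestionControl:
    """Sketch of the table above; cwnd and ssthresh are in units of MSS."""

    def __init__(self):
        self.cwnd = 1.0
        self.ssthresh = 64.0      # illustrative initial Threshold (assumption)
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unACKed data."""
        self.dup_acks = 0
        if self.cwnd < self.ssthresh:      # slow start: +1 MSS per ACK (doubles per RTT)
            self.cwnd += 1.0
        else:                              # congestion avoidance: +1 MSS per RTT
            self.cwnd += 1.0 / self.cwnd

    def on_dup_ack(self):
        """Duplicate ACK; the third one triggers the multiplicative decrease."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = max(self.cwnd / 2.0, 1.0)
            self.cwnd = self.ssthresh      # continue in congestion avoidance

    def on_timeout(self):
        """Timeout: the 'more alarming' signal; restart from 1 MSS in slow start."""
        self.ssthresh = max(self.cwnd / 2.0, 1.0)
        self.cwnd = 1.0
        self.dup_acks = 0
```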

99

Repeating Slow Start After Timeout

[Figure: window versus time t; after a timeout/fast retransmission, SSThresh is set to half the window at that point; slow start runs until the window reaches SSThresh, then growth becomes linear]

Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND: slow start operates until the window reaches half of the previous CWND, i.e. SSTHRESH.

100

TCP throughput
• What is the average throughput of TCP as a function of window size and RTT?
– ignore slow start
• Let W be the window size when loss occurs
• When the window is W, the throughput is W/RTT
• Just after a loss, the window drops to W/2, and the throughput to (W/2)/RTT
• Average throughput: 0.75·W/RTT (a quick numeric check follows below)
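A quick numeric check of the 0.75·W/RTT rule; the window of 20 segments of 1500 bytes and the 100 ms RTT are illustrative values, not from the slide:

```python
W_bytes = 20 * 1500                      # window when loss occurs (illustrative)
RTT = 0.1                                # seconds (illustrative)
avg_throughput = 0.75 * W_bytes / RTT    # bytes per second
print(avg_throughput * 8 / 1e6)          # ~1.8 Mbps
```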

101

TCP Futures: TCP over "long, fat pipes"
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires a window size of W = 83,333 in-flight segments
• Throughput in terms of loss rate p: throughput = (MSS/RTT)·sqrt(3/(2p))
• That requires a loss rate of L = 2·10^-10. Ouch!
• New versions of TCP are needed for high-speed networks
(a numeric check of these figures follows below)

Calculation on Simple Model (cwnd in units of MSS)
• Assume loss occurs whenever cwnd reaches W
– recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, …, W, W/2, …
– W/2 RTTs, then a drop, then repeat
• Average throughput: 0.75·W·(MSS/RTT)
– one packet dropped out of (W/2)·(3W/4)
– packet drop rate p = (8/3)·W^-2
• Throughput = (MSS/RTT)·sqrt(3/(2p))
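A numeric check of the long-fat-pipe figures, using the relation throughput = (MSS/RTT)·sqrt(3/(2p)) from the calculation above:

```python
from math import sqrt

MSS = 1500 * 8              # bits per segment
RTT = 0.1                   # seconds
target = 10e9               # 10 Gbps

W = target * RTT / MSS      # in-flight segments needed
p = 3 / (2 * W ** 2)        # loss rate the formula allows at that throughput
print(round(W))             # ~83333 segments
print(p)                    # ~2.2e-10 (the slide's L = 2e-10)
print((MSS / RTT) * sqrt(3 / (2 * p)))   # sanity check: back to ~1e10 bits/s
```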

102

HINT: KNOW THIS SLIDE

Three Congestion Control Challenges – or Why AIMD?
• Single flow adjusting to the bottleneck bandwidth
– without any a priori knowledge
– could be a Gbps link, could be a modem
• Single flow adjusting to variations in bandwidth
– when bandwidth decreases, must lower the sending rate
– when bandwidth increases, must increase the sending rate
• Multiple flows sharing the bandwidth
– must avoid overloading the network
– and share bandwidth "fairly" among the flows

103

104

Problem 1: Single Flow, Fixed BW
• Want to get a first-order estimate of the available bandwidth
– assume the bandwidth is fixed
– ignore the presence of other flows
• Want to start slow, but rapidly increase the rate until a packet drop occurs ("slow start")
• Adjustment:
– cwnd initially set to 1 (MSS)
– cwnd++ upon receipt of an ACK

105

Problems with Slow-Start
• Slow start can result in many losses
– roughly the size of cwnd ≈ BW·RTT
• Example:
– at some point, cwnd is big enough to fill the "pipe"
– after another RTT, cwnd is double its previous value
– all the excess packets are dropped!
• Need a more gentle adjustment algorithm once we have a rough estimate of the bandwidth
– the rest of the design discussion focuses on this

Problem 2: Single Flow, Varying BW

Want to track the available bandwidth:
• oscillate around its current value
• if you never send more than your current rate, you won't know whether more bandwidth is available

Possible variations (in terms of change per RTT):
• multiplicative increase or decrease: cwnd ← cwnd · a
• additive increase or decrease: cwnd ← cwnd ± b

106

Four alternatives
• AIAD: gentle increase, gentle decrease
• AIMD: gentle increase, drastic decrease
• MIAD: drastic increase, gentle decrease
– too many losses: eliminate
• MIMD: drastic increase and decrease

107

108

Problem 3: Multiple Flows
• Want the steady state to be "fair"
• There are many notions of fairness, but here we just require that two identical flows end up with the same bandwidth
• This eliminates MIMD and AIAD
– as we shall see…
• AIMD is the only remaining solution
– not really, but close enough…

109

Recall: Buffer and Window Dynamics
• No congestion: x increases by one packet/RTT every RTT
• Congestion: decrease x by a factor of 2

[Figure: a single flow A→B over a link with C = 50 packets/RTT; plot of the rate x (packets/RTT) and the backlog in the router (packets, congested if > 20) over time]

110

AIMD Sharing Dynamics
• No congestion: each rate increases by one packet/RTT every RTT
• Congestion: decrease each rate by a factor of 2

[Figure: two flows, x1 (A→B) and x2 (D→E), sharing the same bottleneck; under AIMD the rates equalize at the fair share]

111

AIAD Sharing Dynamics
• No congestion: x increases by one packet/RTT every RTT
• Congestion: decrease x by 1

[Figure: the same two flows, x1 and x2, under AIAD; the gap between the two rates persists]
(a small two-flow simulation sketch follows below)
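A small simulation sketch of the two-flow sharing dynamics pictured on the last two slides, using the slides' numbers (C = 50 packets/RTT, congestion once the backlog exceeds 20 packets) but a deliberately simplified congestion test (offered load above C plus the backlog allowance) and illustrative starting rates:

```python
def share(rule, c=50, backlog_limit=20, rtts=500):
    x1, x2 = 40.0, 5.0                     # deliberately unequal starting rates (pkts/RTT)
    for _ in range(rtts):
        congested = (x1 + x2) > c + backlog_limit
        if congested:
            if rule == "AIMD":
                x1, x2 = x1 / 2, x2 / 2    # decrease each rate by a factor of 2
            else:                          # AIAD
                x1, x2 = max(x1 - 1, 0.0), max(x2 - 1, 0.0)
        else:
            x1, x2 = x1 + 1, x2 + 1        # +1 packet/RTT every RTT
    return x1, x2

print("AIMD:", share("AIMD"))   # the two rates end up (nearly) equal
print("AIAD:", share("AIAD"))   # the initial 35 packet/RTT gap persists
```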

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 83: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

83

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 84: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

84

Causescosts of congestion scenario 2 bull always (goodput)

bull ldquoperfectrdquo retransmission only when loss

bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same

lin

lout=

lin

loutgt

linlout

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt

R2

R2lin

l out

b

R2

R2lin

l out

a

R2

R2lin

l out

c

R4

R3

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 85: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

85

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion control

State: Slow Start (SS)
  Event: ACK receipt for previously unacked data
  Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
  Commentary: Resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
  Event: ACK receipt for previously unacked data
  Action: CongWin = CongWin + MSS × (MSS/CongWin)
  Commentary: Additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
  Event: Loss event detected by triple duplicate ACK
  Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
  Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS

State: SS or CA
  Event: Timeout
  Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
  Commentary: Enter slow start

State: SS or CA
  Event: Duplicate ACK
  Action: Increment duplicate ACK count for segment being ACKed
  Commentary: CongWin and Threshold not changed
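The table above can be read as a small state machine. A hedged sketch of it in Python (class, method and attribute names are illustrative, the initial Threshold is an assumption, and real implementations add fast-recovery and timer details omitted here):

MSS = 1460  # assumed maximum segment size, in bytes

class CongestionControl:
    def __init__(self):
        self.cong_win = 1 * MSS
        self.threshold = 64 * MSS          # assumed initial value
        self.state = "slow start"

    def on_new_ack(self):                  # ACK for previously unacked data
        if self.state == "slow start":
            self.cong_win += MSS           # doubles CongWin every RTT
            if self.cong_win > self.threshold:
                self.state = "congestion avoidance"
        else:
            self.cong_win += MSS * (MSS / self.cong_win)   # +1 MSS per RTT

    def on_triple_duplicate_ack(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.threshold     # multiplicative decrease
        self.state = "congestion avoidance"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = 1 * MSS
        self.state = "slow start"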

99

Repeating Slow Start After Timeout

[Figure: Window vs. t – timeout/fast retransmission; SSThresh set to half the previous CWND]

Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND.
Slow start operates until the window reaches half of the previous CWND, i.e. SSTHRESH.

100

TCP throughput
• What's the average throughput of TCP as a function of window size and RTT?
  – Ignore slow start
• Let W be the window size when loss occurs
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to (W/2)/RTT
• Average throughput: 0.75 W/RTT
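One way to see where the 0.75 comes from (under the same assumptions): between losses the window grows linearly from W/2 back up to W, so the average window is (W/2 + W)/2 = 3W/4, giving an average throughput of roughly 0.75 W/RTT.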

101

TCP Futures: TCP over "long, fat pipes"
• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate p:
  – needs loss rate L = 2·10^-10 — Ouch!
• New versions of TCP for high-speed

Calculation on Simple Model (cwnd in units of MSS)
• Assume loss occurs whenever cwnd reaches W
  – Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … W, W/2, …
  – W/2 RTTs, then drop, then repeat
• Average throughput: 0.75 W (MSS/RTT)
  – One packet dropped out of every (W/2)(3W/4)
  – Packet drop rate p = (8/3) W^-2
• Throughput = (MSS/RTT) · sqrt(3/(2p))
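A quick numerical check of the figures on those two slides (a Python sketch using the slide's 1500-byte segments, 100 ms RTT and 10 Gbps target; variable names are illustrative):

MSS = 1500 * 8          # segment size, in bits
RTT = 0.1               # seconds
target = 10e9           # desired throughput, bits/sec

W = target * RTT / MSS                   # in-flight segments needed
p = 1.5 * (MSS / (RTT * target)) ** 2    # invert Throughput = (MSS/RTT) * sqrt(3/(2p))

print(round(W))   # ~83333 segments
print(p)          # ~2.2e-10, i.e. roughly one loss per five billion packets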

102

HINT: KNOW THIS SLIDE

Three Congestion Control Challenges – or Why AIMD?
• Single flow adjusting to bottleneck bandwidth
  – Without any a priori knowledge
  – Could be a Gbps link, could be a modem
• Single flow adjusting to variations in bandwidth
  – When bandwidth decreases, must lower sending rate
  – When bandwidth increases, must increase sending rate
• Multiple flows sharing the bandwidth
  – Must avoid overloading network
  – And share bandwidth "fairly" among the flows

103

104

Problem 1: Single Flow, Fixed BW
• Want to get a first-order estimate of the available bandwidth
  – Assume bandwidth is fixed
  – Ignore presence of other flows
• Want to start slow, but rapidly increase rate until packet drop occurs ("slow-start")
• Adjustment:
  – cwnd initially set to 1 (MSS)
  – cwnd++ upon receipt of ACK

105

Problems with Slow-Start
• Slow-start can result in many losses
  – Roughly the size of cwnd ~ BW × RTT
• Example:
  – At some point, cwnd is enough to fill the "pipe"
  – After another RTT, cwnd is double its previous value
  – All the excess packets are dropped
• Need a more gentle adjustment algorithm once we have a rough estimate of the bandwidth
  – Rest of design discussion focuses on this

Problem 2: Single Flow, Varying BW
Want to track available bandwidth:
• Oscillate around its current value
• If you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease: cwnd ← cwnd ×/÷ a
• Additive increase or decrease: cwnd ← cwnd +/− b

106

Four alternatives
• AIAD: gentle increase, gentle decrease
• AIMD: gentle increase, drastic decrease
• MIAD: drastic increase, gentle decrease
  – too many losses: eliminate
• MIMD: drastic increase and decrease
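The four families written out as per-RTT update rules (a sketch; the constants aI, aD, bI, bD are illustrative, not values from the slides):

def aiad(w, congested, aI=1.0, aD=1.0):
    return w - aD if congested else w + aI

def aimd(w, congested, aI=1.0, bD=0.5):
    return w * bD if congested else w + aI

def miad(w, congested, bI=2.0, aD=1.0):
    return w - aD if congested else w * bI

def mimd(w, congested, bI=2.0, bD=0.5):
    return w * bD if congested else w * bI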

107

108

Problem 3: Multiple Flows
• Want steady state to be "fair"
• Many notions of fairness, but here just require two identical flows to end up with the same bandwidth
• This eliminates MIMD and AIAD
  – As we shall see…
• AIMD is the only remaining solution
  – Not really, but close enough…

109

Recall: Buffer and Window Dynamics
• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by factor 2

[Figure: flow A → B over a link of capacity C = 50 pkts/RTT; plot of rate x (pkts/RTT) and backlog in router (pkts, congested if > 20) over time]

110

AIMD Sharing Dynamics
• No congestion → rate increases by one packet/RTT every RTT
• Congestion → decrease rate by factor 2

[Figure: two flows, x1 (A → B) and x2 (D → E), sharing a bottleneck; the rates equalize at the fair share]

111

AIAD Sharing Dynamics
• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by 1

[Figure: two flows, x1 (A → B) and x2 (D → E), sharing a bottleneck; the initial difference between the rates persists]
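A small simulation in the spirit of the two sharing plots above (a sketch: it reuses the 50 pkts/RTT capacity from the earlier plot, but the starting rates and round count are illustrative assumptions):

def share(update, x1=40.0, x2=10.0, capacity=50.0, rounds=200):
    # Two flows through one bottleneck; both react to the same congestion signal.
    for _ in range(rounds):
        congested = x1 + x2 > capacity
        x1, x2 = update(x1, congested), update(x2, congested)
    return x1, x2

aimd = lambda x, congested: x / 2 if congested else x + 1   # multiplicative decrease
aiad = lambda x, congested: x - 1 if congested else x + 1   # additive decrease

print(share(aimd))   # the two rates end up close together: heading for a fair share
print(share(aiad))   # the initial 30 pkt/RTT gap between the flows never closes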

112

Simple Model of Congestion Control
• Two TCP connections
  – Rates x1 and x2
• Congestion when sum > 1
• Efficiency: sum near 1
• Fairness: x's converge

[Figure: 2-user example – bandwidth for user 1 (x1) vs. bandwidth for user 2 (x2), with the efficiency line separating underload from overload]

113

Example
• Total bandwidth 1

[Figure: the (x1, x2) plane with the fairness line and the efficiency line, both axes running 0 to 1]

• (0.2, 0.5): inefficient, x1 + x2 = 0.7
• (0.7, 0.5): congested, x1 + x2 = 1.2
• (0.7, 0.3): efficient (x1 + x2 = 1), not fair
• (0.5, 0.5): efficient (x1 + x2 = 1), fair

114

AIAD
• Increase: x ← x + aI
• Decrease: x ← x − aD
• Does not converge to fairness

[Figure: phase plot of (x1, x2) with fairness and efficiency lines, tracing (x1h, x2h) → (x1h − aD, x2h − aD) → (x1h − aD + aI, x2h − aD + aI)]

115

MIMD
• Increase: x ← x × bI
• Decrease: x ← x × bD
• Does not converge to fairness

[Figure: phase plot of (x1, x2) with fairness and efficiency lines, tracing (x1h, x2h) → (bD·x1h, bD·x2h) → (bI·bD·x1h, bI·bD·x2h)]

116

AIMD
• Increase: x ← x + aI
• Decrease: x ← x × bD
• Converges to fairness

[Figure: phase plot of (x1, x2) with fairness and efficiency lines, tracing (x1h, x2h) → (bD·x1h, bD·x2h) → (bD·x1h + aI, bD·x2h + aI)]

117

Why is AIMD fair? (a pretty animation…)
Two competing sessions:
• Additive increase gives slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally

[Figure: bandwidth for Connection 1 vs. bandwidth for Connection 2, both 0 to R, with the equal-bandwidth-share line; repeated cycles of congestion avoidance (additive increase) and loss (decrease window by factor of 2) move the operating point towards equal shares]

118

Fairness (more)
Fairness and UDP:
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections:
• nothing prevents app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2
• Recall: multiple browser sessions (and the potential for synchronized loss)
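The arithmetic behind that example, assuming the link's rate R is split equally over all open connections: a new application that opens k connections alongside the 9 existing ones receives k/(9 + k) of R, so k = 1 gives R/10 while k = 11 gives 11R/20 ≈ R/2.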

119

Some TCP issues outstanding…

Synchronized flows:
• Aggregate window has same dynamics
• Therefore buffer occupancy has same dynamics
• Rule-of-thumb still holds

Many TCP flows, independent and desynchronized:
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases

[Figure: buffer size over time and its probability distribution for the two cases]

Page 86: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

86

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Host A

Host B

lo

u

t

Congestion Collapse example Cocktail party effect

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 87: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

87

Approaches towards congestion control

End-end congestion controlbull no explicit feedback from

networkbull congestion inferred from end-

system observed loss delaybull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad approaches towards congestion control

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 88: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

88

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for

each received ACK until loss detected (W W + 1W)

multiplicative decrease cut CongWin in half after loss

(W W2)

time

cong

estio

n w

indo

w si

zeSaw toothbehavior probing

for bandwidth

89SLOW START IS NOT SHOWN

>

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
Page 89: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

89

[Figure; note: slow start is not shown]

90

TCP Congestion Control: details
• sender limits transmission: LastByteSent - LastByteAcked ≤ CongWin
• Roughly: rate = CongWin / RTT bytes/sec
• CongWin is dynamic, a function of perceived network congestion

How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event

Three mechanisms:
– AIMD
– slow start
– conservative after timeout events
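As a rough sanity check of that rate relation (the CongWin, MSS and RTT values below are illustrative assumptions, not numbers from the slides):

    \[
      \text{rate} \approx \frac{\text{CongWin}}{\text{RTT}}
      = \frac{10 \times 1500\ \text{bytes}}{0.1\ \text{s}}
      = 150{,}000\ \text{bytes/s} \approx 1.2\ \text{Mb/s}.
    \]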

91

AIMD Starts Too Slowly

[Figure: window vs. time under pure AIMD]

It could take a long time to get started!
Need to start with a small CWND to avoid overloading the network.

92

TCP Slow Start
• When the connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to a respectable rate

When the connection begins, increase the rate exponentially fast until the first loss event.
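A quick check of the numbers on that slide:

    \[
      \text{initial rate} \approx \frac{\text{MSS}}{\text{RTT}}
      = \frac{500 \times 8\ \text{bits}}{0.2\ \text{s}}
      = 20\ \text{kbps}.
    \]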

93

TCP Slow Start (more)
• When the connection begins, increase the rate exponentially until the first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: the initial rate is slow, but it ramps up exponentially fast

[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart]
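A minimal Python sketch (my own illustration, not course code) of why the per-ACK increment doubles CongWin every RTT:

    # CongWin counted in MSS units: each ACK grows CongWin by 1 MSS, and one
    # full window of ACKs arrives per RTT, so CongWin doubles every RTT.
    def slow_start(rtts, congwin=1):
        history = []
        for _ in range(rtts):
            acks_this_rtt = congwin       # one ACK per segment sent in this RTT
            congwin += acks_this_rtt      # +1 MSS per ACK
            history.append(congwin)
        return history

    print(slow_start(4))                  # [2, 4, 8, 16]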

94

Slow Start and the TCP Sawtooth

[Figure: window vs. time; exponential "slow start" up to the first loss, then the sawtooth]

Why is it called slow-start? Because TCP originally had no congestion control mechanism. The source would just start by sending a whole window's worth of data.

95

Refinement: inferring loss
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after a timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Philosophy: 3 dup ACKs indicate the network is capable of delivering some segments; a timeout indicates a "more alarming" congestion scenario.

96

Refinement
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event

97

Summary: TCP Congestion Control
• When CongWin is below Threshold, the sender is in the slow-start phase; the window grows exponentially.
• When CongWin is above Threshold, the sender is in the congestion-avoidance phase; the window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.

98

TCP sender congestion control

State | Event | TCP Sender Action | Commentary
Slow Start (SS) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Results in a doubling of CongWin every RTT
Congestion Avoidance (CA) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS × (MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed
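The table maps directly onto an event-driven update of (CongWin, Threshold, state). Below is a minimal Python sketch of those transitions, in MSS units; the initial Threshold of 64 MSS is an assumption of mine, and fast recovery is not modelled beyond what the table states:

    class TcpSenderCc:
        def __init__(self):
            self.state = "SS"            # Slow Start
            self.congwin = 1.0           # in MSS
            self.threshold = 64.0        # in MSS (assumed initial value)

        def on_new_ack(self):            # ACK for previously unACKed data
            if self.state == "SS":
                self.congwin += 1.0                  # doubles CongWin every RTT
                if self.congwin > self.threshold:
                    self.state = "CA"
            else:                                    # Congestion Avoidance
                self.congwin += 1.0 / self.congwin   # ~ +1 MSS per RTT

        def on_triple_dup_ack(self):
            self.threshold = self.congwin / 2
            self.congwin = max(self.threshold, 1.0)  # never below 1 MSS
            self.state = "CA"

        def on_timeout(self):
            self.threshold = self.congwin / 2
            self.congwin = 1.0
            self.state = "SS"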

99

Repeating Slow Start After Timeout

[Figure: window vs. time, annotated "Timeout/Fast Retransmission" and "SSThresh set to here"]

Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND.
Slow start is in operation until the window reaches half of the previous CWND, i.e. SSTHRESH.

100

TCP throughput
• What's the average throughput of TCP as a function of window size and RTT?
  – Ignore slow start
• Let W be the window size when loss occurs.
• When the window is W, throughput is W/RTT.
• Just after loss, the window drops to W/2, throughput to W/(2·RTT).
• Average throughput: 0.75 W/RTT
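The 0.75 factor is just the average of a window that ramps linearly from W/2 back up to W:

    \[
      \overline{\text{throughput}}
      \approx \frac{1}{\text{RTT}} \cdot \frac{\frac{W}{2} + W}{2}
      = \frac{3W}{4\,\text{RTT}}
      = 0.75\,\frac{W}{\text{RTT}}.
    \]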

101

TCP Futures: TCP over "long fat pipes"
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate p: Throughput = (MSS/RTT) · sqrt(3/(2p))
• L = 2·10^-10. Ouch!
• New versions of TCP for high-speed

Calculation on Simple Model (cwnd in units of MSS)
• Assume loss occurs whenever cwnd reaches W
  – Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … W, W/2, …
  – W/2 RTTs, then drop, then repeat
• Average throughput: 0.75 W (MSS/RTT)
  – One packet dropped out of (W/2)·(3W/4)
  – Packet drop rate p = (8/3)·W^-2
• Throughput = (MSS/RTT) · sqrt(3/(2p))
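A quick Python check of the 10 Gbps example against that formula (solving it for the window and for the tolerable loss rate):

    mss_bits = 1500 * 8                 # segment size in bits
    rtt = 0.1                           # seconds
    target = 10e9                       # desired average throughput: 10 Gbps

    w_avg = target * rtt / mss_bits                 # average in-flight segments
    p = 1.5 * (mss_bits / (rtt * target)) ** 2      # invert (MSS/RTT)*sqrt(3/(2p))

    print(round(w_avg))    # 83333
    print(p)               # ~2e-10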

102

HINT: KNOW THIS SLIDE

Three Congestion Control Challenges – or Why AIMD?
• Single flow adjusting to bottleneck bandwidth
  – without any a priori knowledge
  – could be a Gbps link, could be a modem
• Single flow adjusting to variations in bandwidth
  – when bandwidth decreases, must lower sending rate
  – when bandwidth increases, must increase sending rate
• Multiple flows sharing the bandwidth
  – must avoid overloading the network
  – and share bandwidth "fairly" among the flows

103

104

Problem 1: Single Flow, Fixed BW
• Want to get a first-order estimate of the available bandwidth
  – assume bandwidth is fixed
  – ignore presence of other flows
• Want to start slow, but rapidly increase rate until packet drop occurs ("slow-start")
• Adjustment:
  – cwnd initially set to 1 (MSS)
  – cwnd++ upon receipt of ACK

105

Problems with Slow-Start
• Slow-start can result in many losses
  – roughly the size of cwnd ~ BW·RTT
• Example:
  – at some point, cwnd is enough to fill the "pipe"
  – after another RTT, cwnd is double its previous value
  – all the excess packets are dropped!
• Need a more gentle adjustment algorithm once we have a rough estimate of the bandwidth
  – rest of the design discussion focuses on this

Problem 2: Single Flow, Varying BW

Want to track the available bandwidth:
• oscillate around its current value
• if you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• multiplicative increase or decrease: cwnd → cwnd × a or cwnd ÷ a
• additive increase or decrease: cwnd → cwnd + b or cwnd − b

106

Four alternatives
• AIAD: gentle increase, gentle decrease
• AIMD: gentle increase, drastic decrease
• MIAD: drastic increase, gentle decrease
  – too many losses: eliminate
• MIMD: drastic increase and decrease

107

108

Problem 3: Multiple Flows
• Want the steady state to be "fair"
• Many notions of fairness, but here we just require two identical flows to end up with the same bandwidth
• This eliminates MIMD and AIAD
  – as we shall see…
• AIMD is the only remaining solution!
  – not really, but close enough…

109

Recall: Buffer and Window Dynamics

• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by factor 2

[Figure: single flow A → B over a link of capacity C = 50 pkts/RTT; rate x (pkts/RTT) and backlog in the router (pkts, congested if > 20) plotted against time, showing the AIMD sawtooth]
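A crude Python sketch of the dynamics in that figure (my own model; the 50 pkts/RTT capacity and the 20-packet congestion threshold are taken from the slide, the queue-drain behaviour is a simplification):

    C, QUEUE_LIMIT = 50, 20              # link capacity and congestion threshold
    x, backlog = 1.0, 0.0                # sending rate (pkts/RTT) and router backlog

    for rtt in range(60):
        backlog = max(0.0, backlog + x - C)   # excess over capacity queues up
        if backlog > QUEUE_LIMIT:             # congested per the slide's threshold
            x /= 2                            # multiplicative decrease
            backlog = 0.0                     # assume drops drain the queue
        else:
            x += 1                            # additive increase: +1 pkt/RTT per RTT
        print(rtt, round(x, 1), round(backlog, 1))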

110

AIMD Sharing Dynamics

[Figure: two flows, A → B (rate x1) and D → E (rate x2), sharing one link; both rates plotted against time]

• No congestion → rate increases by one packet/RTT every RTT
• Congestion → decrease rate by factor 2

Rates equalize → fair share

111

AIAD Sharing Dynamics

[Figure: same two-flow topology, A → B (rate x1) and D → E (rate x2); both rates plotted against time]

• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by 1

112

Simple Model of Congestion Control
• Two TCP connections
  – rates x1 and x2
• Congestion when the sum > 1
• Efficiency: sum near 1
• Fairness: x's converge

[Figure: 2-user example; bandwidth for user 1 (x1) vs. bandwidth for user 2 (x2); the efficiency line x1 + x2 = 1 separates the underloaded region from the overloaded region]

113

Example
• Total bandwidth 1

[Diagram: x1 on the horizontal axis, x2 on the vertical axis, both from 0 to 1; the fairness line x1 = x2 and the efficiency line x1 + x2 = 1]

  (0.2, 0.5): inefficient, x1 + x2 = 0.7
  (0.7, 0.5): congested, x1 + x2 = 1.2
  (0.7, 0.3): efficient (x1 + x2 = 1), but not fair
  (0.5, 0.5): efficient (x1 + x2 = 1) and fair

114

AIAD
• Increase: x + a_I
• Decrease: x − a_D
• Does not converge to fairness

[Diagram: bandwidth for user 1 (x1) vs. bandwidth for user 2 (x2), with the fairness and efficiency lines; starting at (x1h, x2h), a decrease moves to (x1h − a_D, x2h − a_D) and an increase to (x1h − a_D + a_I, x2h − a_D + a_I), so the trajectory stays parallel to the fairness line]

115

MIMD
• Increase: x · b_I
• Decrease: x · b_D
• Does not converge to fairness

[Diagram: as before; starting at (x1h, x2h), a decrease moves to (b_D·x1h, b_D·x2h) and an increase to (b_I·b_D·x1h, b_I·b_D·x2h), so the trajectory stays on a line through the origin and the ratio x1/x2 never changes]

116

AIMD
• Increase: x + a_I
• Decrease: x · b_D
• Converges to fairness

[Diagram: as before; starting at (x1h, x2h), a multiplicative decrease moves to (b_D·x1h, b_D·x2h), toward the origin, and an additive increase then moves to (b_D·x1h + a_I, b_D·x2h + a_I), parallel to the fairness line; each cycle brings the operating point closer to the fairness line]
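A small Python sketch (my own, with illustrative constants) of why the multiplicative decrease matters: under AIMD the two rates converge, under AIAD their difference persists.

    def share(inc_mult, dec_mult, inc_add, dec_add, x1=0.8, x2=0.2, steps=200):
        """Two flows on a unit-capacity link under generic per-RTT increase/decrease rules."""
        for _ in range(steps):
            if x1 + x2 > 1.0:                    # congestion: both flows decrease
                x1 = x1 * dec_mult - dec_add
                x2 = x2 * dec_mult - dec_add
            else:                                # otherwise both flows increase
                x1 = x1 * inc_mult + inc_add
                x2 = x2 * inc_mult + inc_add
        return round(x1, 3), round(x2, 3)

    print(share(1.0, 0.5, 0.01, 0.0))    # AIMD: rates end up (almost) equal
    print(share(1.0, 1.0, 0.01, 0.01))   # AIAD: the initial 0.6 gap never closes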

117

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally

[Figure: bandwidth for Connection 1 vs. bandwidth for Connection 2, both up to R, with the equal-bandwidth-share line; the trajectory alternates between congestion avoidance (additive increase) and loss (window decreased by a factor of 2), spiralling in toward the equal-share line]

118

Fairness (more)

Fairness and UDP
• Multimedia apps may not use TCP
  – do not want their rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at a constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections
• Nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2!
• Recall: multiple browser sessions (and the potential for synchronized loss)
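The R/2 figure is just per-connection fair sharing:

    \[
      \frac{11}{9+11}\,R = 0.55\,R \approx \frac{R}{2},
      \qquad\text{whereas}\qquad
      \frac{1}{9+1}\,R = \frac{R}{10}.
    \]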

119

Some TCP issues outstanding…

Synchronized Flows vs. Many TCP Flows
• Synchronized flows:
  – the aggregate window has the same dynamics
  – therefore buffer occupancy has the same dynamics
  – rule-of-thumb still holds
• Many independent, desynchronized flows:
  – the central limit theorem says the aggregate becomes Gaussian
  – variance (buffer size) decreases as N increases

[Figure: probability distribution of buffer occupancy, and buffer size over time]

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 90: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

90

TCP Congestion Control detailsbull sender limits transmission LastByteSent-LastByteAcked CongWinbull Roughly

bull CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

three mechanismsndash AIMDndash slow startndash conservative after timeout

events

rate = CongWin

RTT Bytessec

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 91: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

91

AIMD Starts Too Slowly

t

Window

It could take a long time to get started

Need to start with a small CWND to avoid overloading the network

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 92: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

92

TCP Slow Startbull When connection begins CongWin = 1 MSSndash Example MSS = 500 bytes amp

RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp up

to respectable rate

When connection begins increase rate exponentially fast until first loss event

93

TCP Slow Start (more)bull When connection begins increase rate exponentially until

first loss eventndash double CongWin every RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 93: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

93

TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received (see the sketch below)
• Summary: initial rate is slow, but ramps up exponentially fast

[Figure: timeline between Host A and Host B – one segment sent in the first RTT, two segments in the next, then four segments]
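To make the per-ACK rule concrete, here is a minimal Python sketch (my own illustration, not from the slides; the window is counted in units of MSS, and loss and the receiver window are ignored) showing why adding one MSS per ACK doubles CongWin every RTT:

MSS = 1   # count the window in MSS units for simplicity

def slow_start_rtts(initial_cwnd, rtts):
    # Assumption: no loss and no receiver-window limit, so every segment
    # sent in an RTT is ACKed, and each ACK adds one MSS to CongWin.
    cwnd = initial_cwnd
    for _ in range(rtts):
        acks = cwnd            # one ACK comes back per segment sent this RTT
        cwnd += acks * MSS     # +1 MSS per ACK, so the window doubles
    return cwnd

print([slow_start_rtts(1, r) for r in range(5)])   # [1, 2, 4, 8, 16]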

94

Slow Start and the TCP Sawtooth

[Figure: window vs. time – the exponential "slow start" ramp, then the sawtooth pattern with the window dropping at each loss]

Why is it called slow-start? Because TCP originally had no congestion control mechanism: the source would just start by sending a whole window's worth of data.

95

Refinement: inferring loss
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after a timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Philosophy: 3 dup ACKs indicate the network is capable of delivering some segments; a timeout indicates a "more alarming" congestion scenario.

96

Refinement
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable: Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event

97

Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.

98

TCP sender congestion control

State: Slow Start (SS)
  Event: ACK receipt for previously unacked data
  TCP sender action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
  Commentary: Results in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
  Event: ACK receipt for previously unacked data
  TCP sender action: CongWin = CongWin + MSS × (MSS / CongWin)
  Commentary: Additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
  Event: Loss event detected by triple duplicate ACK
  TCP sender action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
  Commentary: Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

State: SS or CA
  Event: Timeout
  TCP sender action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
  Commentary: Enter slow start

State: SS or CA
  Event: Duplicate ACK
  TCP sender action: Increment duplicate ACK count for the segment being acked
  Commentary: CongWin and Threshold not changed
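The table above maps directly onto a small set of event handlers. A minimal Python sketch (my own illustration, not the course's reference code; CongWin and Threshold are kept in bytes, and the initial Threshold of 64 KB is an assumed value):

class TcpCcSketch:
    def __init__(self, mss=1460, initial_threshold=64_000):
        self.mss = mss
        self.cong_win = mss                # start at 1 MSS
        self.threshold = initial_threshold
        self.state = "slow start"

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        if self.state == "slow start":
            self.cong_win += self.mss      # doubles CongWin every RTT
            if self.cong_win > self.threshold:
                self.state = "congestion avoidance"
        else:                              # congestion avoidance
            self.cong_win += self.mss * self.mss / self.cong_win  # ~1 MSS per RTT

    def on_triple_dup_ack(self):
        self.threshold = self.cong_win / 2
        self.cong_win = max(self.threshold, self.mss)   # multiplicative decrease, never below 1 MSS
        self.state = "congestion avoidance"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss           # back to 1 MSS
        self.state = "slow start"

cc = TcpCcSketch()
for _ in range(5):
    cc.on_new_ack()
print(cc.state, cc.cong_win)               # still in slow start, CongWin = 6 MSS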

99

Repeating Slow Start After Timeout

[Figure: window vs. time – after a Timeout/Fast Retransmission the window drops to 1 MSS, slow start runs again up to SSThresh (set to half the previous CWND), then growth is linear]

Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND. Slow start is in operation until the window reaches half of the previous CWND, i.e. SSTHRESH.
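The window trace described above is easy to sketch (my own illustration; the previous CWND of 32 MSS is an assumed example value):

prev_cwnd = 32                     # window, in MSS, just before the timeout
ssthresh = prev_cwnd // 2          # slow start runs until this value
cwnd, trace = 1, []
for _ in range(10):                # RTTs after the timeout
    trace.append(cwnd)
    cwnd = cwnd * 2 if cwnd < ssthresh else cwnd + 1   # exponential, then linear
print(trace)                       # [1, 2, 4, 8, 16, 17, 18, 19, 20, 21]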

100

TCP throughput
• What's the average throughput of TCP as a function of window size and RTT?
  – Ignore slow start
• Let W be the window size when loss occurs.
• When the window is W, throughput is W/RTT.
• Just after loss, the window drops to W/2, and throughput to (W/2)/RTT.
• Average throughput: 0.75 W/RTT

101

TCP Futures: TCP over "long, fat pipes"
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate p: Throughput = (MSS/RTT) × sqrt(3/(2p))
• Needed loss rate L = 2×10^-10 – ouch!
• New versions of TCP needed for high-speed operation
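A quick arithmetic check of these numbers (my own calculation, inverting the throughput formula above):

mss_bits = 1500 * 8               # segment size in bits
rtt = 0.100                       # seconds
target = 10e9                     # 10 Gbps

W = target * rtt / mss_bits       # segments that must be in flight
p = 1.5 * (mss_bits / (rtt * target)) ** 2   # from Throughput = (MSS/RTT) * sqrt(3/(2p))

print(round(W))                   # ~83333 in-flight segments
print(p)                          # ~2.2e-10 loss rate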

Calculation on Simple Model (cwnd in units of MSS)
• Assume loss occurs whenever cwnd reaches W
  – Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, … W, W/2, …
  – W/2 RTTs, then drop, then repeat
• Average throughput: 0.75 W (MSS/RTT)
  – One packet dropped out of (W/2)(3W/4)
  – Packet drop rate p = (8/3) W^-2
• Throughput = (MSS/RTT) × sqrt(3/(2p))
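A short sanity check (my own) of this sawtooth model, comparing one simulated cycle against the 0.75 W average and the (8/3) W^-2 drop rate:

W = 100                                    # window (in MSS) at which loss occurs
windows = list(range(W // 2, W + 1))       # one sawtooth cycle, one value per RTT

pkts_per_cycle = sum(windows)              # ~ (W/2)(3W/4) packets per cycle
avg_window = pkts_per_cycle / len(windows)

print(avg_window / W)                      # ~0.75, i.e. avg throughput ~ 0.75 W (MSS/RTT)
print(1 / pkts_per_cycle, (8 / 3) / W**2)  # simulated vs. modelled drop rate p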

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges – or, Why AIMD?
• Single flow adjusting to bottleneck bandwidth
  – Without any a priori knowledge
  – Could be a Gbps link, could be a modem
• Single flow adjusting to variations in bandwidth
  – When bandwidth decreases, must lower sending rate
  – When bandwidth increases, must increase sending rate
• Multiple flows sharing the bandwidth
  – Must avoid overloading the network
  – And share bandwidth "fairly" among the flows

103

104

Problem 1: Single Flow, Fixed BW
• Want to get a first-order estimate of the available bandwidth
  – Assume bandwidth is fixed
  – Ignore presence of other flows
• Want to start slow, but rapidly increase rate until packet drop occurs ("slow-start")
• Adjustment:
  – cwnd initially set to 1 (MSS)
  – cwnd++ upon receipt of ACK

105

Problems with Slow-Start
• Slow-start can result in many losses
  – Roughly the size of cwnd ~ BW×RTT
• Example:
  – At some point, cwnd is enough to fill the "pipe"
  – After another RTT, cwnd is double its previous value
  – All the excess packets are dropped
• Need a more gentle adjustment algorithm once we have a rough estimate of the bandwidth
  – Rest of the design discussion focuses on this

Problem 2: Single Flow, Varying BW
Want to track the available bandwidth:
• Oscillate around its current value
• If you never send more than your current rate, you won't know if more bandwidth is available

Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease: cwnd → cwnd × a  or  cwnd → cwnd / a
• Additive increase or decrease: cwnd → cwnd + b  or  cwnd → cwnd - b

106

Four alternatives
• AIAD: gentle increase, gentle decrease
• AIMD: gentle increase, drastic decrease
• MIAD: drastic increase, gentle decrease
  – too many losses: eliminate
• MIMD: drastic increase and decrease

107

108

Problem 3: Multiple Flows
• Want steady state to be "fair"
• Many notions of fairness, but here we just require two identical flows to end up with the same bandwidth
• This eliminates MIMD and AIAD
  – As we shall see…
• AIMD is the only remaining solution
  – Not really, but close enough…

109

Recall: Buffer and Window Dynamics
• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by factor 2

[Figure: sender A to receiver B over a link of C = 50 pkts/RTT; plot of the rate x (pkts/RTT) and the backlog in the router (pkts, congested if > 20) over time, showing the resulting sawtooth]

110

AIMD Sharing Dynamics
• No congestion → rate increases by one packet/RTT every RTT
• Congestion → decrease rate by factor 2

[Figure: two flows, x1 (A→B) and x2 (D→E), sharing one bottleneck; their rates oscillate but move together over time – rates equalize at the fair share. A toy simulation of this and the AIAD case follows the next figure.]

111

AIAD Sharing Dynamics
• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by 1

[Figure: the same two flows, x1 (A→B) and x2 (D→E), under AIAD; the gap between their rates persists, so they do not converge to a fair share]
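The two sharing-dynamics figures above can be reproduced with a toy simulation (my own sketch; the link capacity, starting rates and the synchronous congestion signal are simplifying assumptions):

def share(increase, decrease, x1=10.0, x2=40.0, C=50.0, rtts=500):
    # Each RTT both flows apply the increase rule; if their sum exceeds the
    # capacity C, both see congestion and apply the decrease rule.
    for _ in range(rtts):
        x1, x2 = increase(x1), increase(x2)
        if x1 + x2 > C:
            x1, x2 = decrease(x1), decrease(x2)
    return x1, x2

aimd = share(lambda x: x + 1, lambda x: x / 2)   # additive increase, multiplicative decrease
aiad = share(lambda x: x + 1, lambda x: x - 1)   # additive increase, additive decrease
print("AIMD:", aimd)   # rates roughly equal: converges to the fair share
print("AIAD:", aiad)   # the initial 30 pkt/RTT gap persists: does not converge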

112

Simple Model of Congestion Control
• Two TCP connections
  – Rates x1 and x2
• Congestion: when the sum > 1
• Efficiency: sum near 1
• Fairness: the x's converge

[Figure: 2-user example – bandwidth for User 1 (x1) vs. bandwidth for User 2 (x2); the efficiency line x1 + x2 = 1 separates underload from overload]

113

Example
• Total bandwidth 1

[Figure: the x1–x2 plane with the fairness line (x1 = x2) and the efficiency line (x1 + x2 = 1)]
  – (0.2, 0.5): inefficient, x1 + x2 = 0.7
  – (0.7, 0.5): congested, x1 + x2 = 1.2
  – (0.7, 0.3): efficient (x1 + x2 = 1), not fair
  – (0.5, 0.5): efficient (x1 + x2 = 1), fair
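A tiny helper (my own) that classifies the four example points against the efficiency line x1 + x2 = 1 and the fairness line x1 = x2:

def classify(x1, x2, eps=1e-9):
    total = x1 + x2
    if total > 1 + eps:
        return "congested"
    if total < 1 - eps:
        return "inefficient"
    return "efficient, " + ("fair" if abs(x1 - x2) < eps else "not fair")

for point in [(0.2, 0.5), (0.7, 0.5), (0.7, 0.3), (0.5, 0.5)]:
    print(point, classify(*point))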

114

AIAD
• Increase: x → x + aI
• Decrease: x → x - aD
• Does not converge to fairness

[Figure: the x1–x2 plane with fairness and efficiency lines; starting from (x1h, x2h), a decrease moves to (x1h - aD, x2h - aD) and the next increase to (x1h - aD + aI, x2h - aD + aI), so the trajectory moves parallel to the fairness line and never gets closer to it]

115

MIMD
• Increase: x → bI × x
• Decrease: x → bD × x
• Does not converge to fairness

[Figure: the x1–x2 plane with fairness and efficiency lines; from (x1h, x2h) a decrease moves to (bD·x1h, bD·x2h) and an increase to (bI·bD·x1h, bI·bD·x2h), so the trajectory stays on the same line through the origin and the ratio x1/x2 never changes]

AIMD
• Increase: x → x + aI
• Decrease: x → bD × x
• Converges to fairness

[Figure: the x1–x2 plane with fairness and efficiency lines; from (x1h, x2h) a multiplicative decrease moves to (bD·x1h, bD·x2h) and the following additive increase to (bD·x1h + aI, bD·x2h + aI); each decrease shrinks the difference between the rates, so the trajectory converges toward the fairness line]

117

Why is AIMD fair? (a pretty animation…)
Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally

[Figure: bandwidth for Connection 1 vs. bandwidth for Connection 2, both axes running up to R, with the equal-bandwidth-share line; repeated cycles of congestion-avoidance additive increase followed by a loss (window decreased by a factor of 2) move the operating point toward the equal-share line]

118

Fairness (more)
Fairness and UDP
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient, yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections
• Nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections (a quick check follows below)
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets ~R/2
• Recall: multiple browser sessions (and the potential for synchronized loss)
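A quick check (my own arithmetic) of the parallel-connection example, assuming the link's rate R is split equally per connection:

R = 1.0
existing = 9                       # connections already using the link
print(R / (existing + 1))          # new app with 1 connection: R/10
print(11 * R / (existing + 11))    # new app with 11 connections: 11R/20, roughly R/2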

119

Some TCP issues outstanding…

Synchronized flows vs. many TCP flows:
• Synchronized flows:
  – Aggregate window has the same dynamics
  – Therefore buffer occupancy has the same dynamics
  – Rule-of-thumb still holds
• Independent, desynchronized flows:
  – Central limit theorem says the aggregate becomes Gaussian
  – Variance (buffer size) decreases as N increases

[Figure: buffer occupancy over time and its probability distribution, for synchronized vs. many desynchronized flows]

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 94: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

94

Slow Start and the TCP Sawtooth

Loss

Exponentialldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally hadno congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 95: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

95

Refinement inferring lossbull After 3 dup ACKs

ndash CongWin is cut in halfndash window then grows linearly

bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 96: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

96

RefinementQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set

to 12 of CongWin just before loss event

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 97: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

97

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall: Buffer and Window Dynamics

• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by a factor of 2

[Figure: a single flow from A to B over a link of capacity C = 50 pkts/RTT; the plot shows the rate x (pkts/RTT) and the backlog in the router (pkts, congested if > 20) over time, tracing the familiar sawtooth.]
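
A rough simulation of the single-flow figure, using the slide's values (C = 50 pkts/RTT, congested once the backlog exceeds 20 pkts); the queue model is deliberately crude, and the backlog is simply cleared after the rate is halved:

    C, THRESH = 50, 20        # link capacity (pkts/RTT) and congestion threshold (pkts)
    x, backlog = 1.0, 0.0
    trace = []
    for rtt in range(200):
        backlog = max(0.0, backlog + x - C)   # queue builds only while x exceeds C
        if backlog > THRESH:                  # congestion detected
            x, backlog = x / 2, 0.0           # halve the rate; assume the queue drains
        else:
            x += 1                            # additive increase: +1 pkt/RTT per RTT
        trace.append(x)
    print(min(trace[60:]), max(trace))        # rate oscillates roughly between 28 and 56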

110

AIMD Sharing Dynamics

• No congestion → rate increases by one packet/RTT every RTT
• Congestion → decrease rate by a factor of 2

[Figure: two flows, x1 (A→B) and x2 (D→E), sharing the same bottleneck; over time the two rates equalize – the flows converge to the fair share.]

111

AIAD Sharing Dynamics

• No congestion → x increases by one packet/RTT every RTT
• Congestion → decrease x by 1

[Figure: the same two flows, x1 (A→B) and x2 (D→E); with additive decrease the rates oscillate but the gap between them never closes.]
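
A sketch illustrating both figures above with two flows sharing one bottleneck; congestion is declared whenever the combined rate exceeds an assumed capacity C = 50 pkts/RTT:

    # AIMD closes the gap between two flows; AIAD preserves it.
    def share(mode, x1=40.0, x2=10.0, C=50.0, rounds=500):
        for _ in range(rounds):
            if x1 + x2 > C:                       # both flows see the congestion
                if mode == "AIMD":
                    x1, x2 = x1 / 2, x2 / 2       # multiplicative decrease
                else:                             # AIAD
                    x1, x2 = max(x1 - 1, 0), max(x2 - 1, 0)
            else:
                x1, x2 = x1 + 1, x2 + 1           # additive increase for both
        return round(x1, 1), round(x2, 1)

    print("AIMD:", share("AIMD"))   # roughly equal rates -> fair share
    print("AIAD:", share("AIAD"))   # the initial 30 pkt/RTT gap never closes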

112

Simple Model of Congestion Control

• Two TCP connections
  – Rates x1 and x2
• Congestion when the sum > 1
• Efficiency: sum near 1
• Fairness: x's converge

[Figure: 2-user example – bandwidth for User 1 (x1) against bandwidth for User 2 (x2); the efficiency line x1 + x2 = 1 separates underload from overload.]

113

Example

• Total bandwidth 1

[Figure: the (x1, x2) plane for User 1 and User 2, with the fairness line x1 = x2 and the efficiency line x1 + x2 = 1; both axes run from 0 to 1.]

• (0.2, 0.5): inefficient – x1 + x2 = 0.7
• (0.7, 0.5): congested – x1 + x2 = 1.2
• (0.7, 0.3): efficient (x1 + x2 = 1), not fair
• (0.5, 0.5): efficient (x1 + x2 = 1), fair
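
The same classification as a tiny helper (the tolerance just guards against floating-point rounding):

    def classify(x1, x2, eps=1e-9):
        total = x1 + x2
        if total > 1 + eps:
            return "congested"
        if total < 1 - eps:
            return "inefficient"
        return "efficient, fair" if abs(x1 - x2) < eps else "efficient, not fair"

    for point in [(0.2, 0.5), (0.7, 0.5), (0.7, 0.3), (0.5, 0.5)]:
        print(point, "->", classify(*point))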

114

AIAD

• Increase: x → x + aI
• Decrease: x → x − aD
• Does not converge to fairness

[Figure: phase plot of (x1, x2) with the fairness and efficiency lines; from (x1h, x2h) a decrease moves to (x1h − aD, x2h − aD) and the next increase to (x1h − aD + aI, x2h − aD + aI), so every step is parallel to the fairness line.]

115

MIMD

• Increase: x → x·bI
• Decrease: x → x·bD
• Does not converge to fairness

[Figure: from (x1h, x2h) a decrease moves to (bD·x1h, bD·x2h) and the next increase to (bI·bD·x1h, bI·bD·x2h); every step stays on the line through the origin, so the ratio x1/x2 never changes.]

116

AIMD

• Increase: x → x + aI
• Decrease: x → x·bD
• Converges to fairness

[Figure: from (x1h, x2h) a decrease moves to (bD·x1h, bD·x2h) and the next increase to (bD·x1h + aI, bD·x2h + aI); the multiplicative decrease pulls the point towards the origin while the additive increase moves it parallel to the fairness line, so each cycle brings the trajectory closer to x1 = x2.]
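
Iterating the three update rules from the unfair point (0.7, 0.3) makes the difference concrete; the constants aI, aD, bI, bD below are illustrative choices, not values from the slides:

    aI, aD, bI, bD = 0.05, 0.05, 1.2, 0.5            # illustrative constants

    def step(rule, x1, x2):
        if x1 + x2 > 1:                               # congestion: decrease
            if rule in ("MIMD", "AIMD"):
                return x1 * bD, x2 * bD
            return x1 - aD, x2 - aD                   # AIAD
        if rule == "MIMD":                            # otherwise: increase
            return x1 * bI, x2 * bI
        return x1 + aI, x2 + aI                       # AIAD, AIMD

    for rule in ("AIAD", "MIMD", "AIMD"):
        x1, x2 = 0.7, 0.3
        for _ in range(200):
            x1, x2 = step(rule, x1, x2)
        print(rule, round(x1, 2), round(x2, 2))       # only AIMD ends with x1 ~ x2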

117

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease reduces throughput proportionally

[Figure: bandwidth of Connection 1 against bandwidth of Connection 2, both axes running to R, with the equal-bandwidth-share line; the trajectory alternates between congestion-avoidance additive increase and halving the window on loss, spiralling in towards the equal share.]

118

Fairness (more)

Fairness and UDP
• Multimedia apps may not use TCP
  – do not want their rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at a constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections
• Nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2!
• Recall: multiple browser sessions (and the potential for synchronized loss)
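
The arithmetic behind the example, assuming the link is shared equally per connection:

    # Fair share on a link of rate R that already carries 9 TCP connections.
    R, existing = 1.0, 9
    for mine in (1, 11):
        share = R * mine / (existing + mine)       # equal slice per connection
        print(mine, "connection(s):", round(share, 2), "of R")   # 0.1*R vs 0.55*R (~R/2)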

119

Synchronized Flows / Many TCP Flows

Synchronized flows:
• Aggregate window has the same dynamics
• Therefore buffer occupancy has the same dynamics
• Rule-of-thumb still holds

Many independent, desynchronized flows:
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases

[Figure: buffer size over time t, and the resulting probability distribution of buffer occupancy.]

Some TCP issues outstanding…

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 98: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

98

TCP sender congestion controlState Event TCP Sender Action Commentary

Slow Start (SS) ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 99: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

99

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of previous CWND Ie

SSTHRESH

TimeoutFast Retransmission

SSThreshSet to Here

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 100: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

100

TCP throughput

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start

bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2

throughput to W2RTT bull Average throughout 75 WRTT

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 101: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

101

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p

bull L = 210-10 Ouch bull New versions of TCP for high-speed

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 102: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

Calculation on Simple Model(cwnd in units of MSS)

bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit

bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat

bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2

bull Throughput = (MSSRTT) sqrt(32p)

102

HINT KNOW THIS SLIDE

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Some TCP issues outstanding…

Synchronized Flows
• Aggregate window has the same dynamics
• Therefore buffer occupancy has the same dynamics
• Rule-of-thumb still holds

Many TCP Flows
• Independent, desynchronized
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases

[Figure: buffer size over time t for the two cases, with the probability distribution of buffer occupancy]
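For context, a widely cited buffer-sizing consequence of the Gaussian argument above (not stated on the slide, so treat it as background): with one flow, or N synchronized flows, the rule-of-thumb buffer is B = RTT × C, while for N independent, desynchronized flows the buffer can shrink to roughly B ≈ (RTT × C) / √N.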

Page 103: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

Three Congestion Control Challenges ndash or Why AIMD

bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem

bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate

bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows

103

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 104: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

104

Problem 1 Single Flow Fixed BW

bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows

bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)

bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 105: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

105

Problems with Slow-Startbull Slow-start can result in many losses

ndash Roughly the size of cwnd ~ BWRTT

bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped

bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 106: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

Problem 2 Single Flow Varying BW

Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you

wonrsquot know if more bandwidth is available

Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease

cwnd cwnd a bull Additive increase or decrease

cwnd cwnd +- b

106

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 107: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

Four alternatives

bull AIAD gentle increase gentle decrease

bull AIMD gentle increase drastic decrease

bull MIAD drastic increase gentle decreasendash too many losses eliminate

bull MIMD drastic increase and decrease

107

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 108: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

108

Problem 3 Multiple Flows

bull Want steady state to be ldquofairrdquo

bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth

bull This eliminates MIMD and AIADndash As we shall seehellip

bull AIMD is the only remaining solutionndash Not really but close enoughhellip

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 109: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

109

Recall Buffer and Window Dynamics

bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2

A BC = 50 pktsRTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if gt 20

Rate (pktsRTT)

x

110

AIMD Sharing Dynamics

• No congestion: rate increases by one packet/RTT every RTT
• Congestion: decrease rate by a factor of 2
• Rates equalize: fair share

[Figure: two flows (A→D at rate x1, B→E at rate x2) sharing the same bottleneck; the two sawtooths drift together until both flows oscillate around the same rate.]
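
Why the rates meet is easy to see in a few lines (again a sketch of my own, with the same assumed capacity and threshold): additive increase raises both rates by the same amount and so leaves the gap between them unchanged, while every shared congestion event halves both rates and therefore halves the gap.

    # Sketch: two AIMD flows sharing one bottleneck and seeing the same congestion signal.
    C, THRESH = 50, 20                  # bottleneck capacity (pkts/RTT) and congestion threshold
    x1, x2, backlog = 40.0, 5.0, 0.0    # deliberately unequal starting rates
    for rtt in range(300):
        backlog = max(0.0, backlog + x1 + x2 - C)
        if backlog > THRESH:
            x1, x2 = x1 / 2, x2 / 2     # multiplicative decrease also halves |x1 - x2|
        else:
            x1, x2 = x1 + 1, x2 + 1     # additive increase keeps |x1 - x2| unchanged
    print(round(x1, 1), round(x2, 1))   # after many cycles the two rates are nearly equal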

111

AIAD Sharing Dynamics

• No congestion: x increases by one packet/RTT every RTT
• Congestion: decrease x by 1

[Figure: two flows (A→D at rate x1, B→E at rate x2) under AIAD; the two sawtooths stay the same distance apart, so the rates never equalize.]

112

Simple Model of Congestion Control

• Two TCP connections – rates x1 and x2
• Congestion: when x1 + x2 > 1
• Efficiency: x1 + x2 near 1
• Fairness: the x's converge

[Figure: 2-user example, bandwidth for User 1 (x1) against bandwidth for User 2 (x2); the efficiency line x1 + x2 = 1 separates underload from overload.]

113

Example

• Total bandwidth 1
• Inefficient: x1 + x2 = 0.7, e.g. (0.2, 0.5)
• Congested: x1 + x2 = 1.2, e.g. (0.7, 0.5)
• Efficient but not fair: x1 + x2 = 1, e.g. (0.7, 0.3)
• Efficient and fair: x1 + x2 = 1, e.g. (0.5, 0.5)

[Figure: the (x1, x2) plane for User 1 and User 2, axes 0 to 1, with the fairness line x1 = x2, the efficiency line x1 + x2 = 1, and the four example points above.]
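
A tiny classifier (an illustration, not part of the slides) makes the same judgement for any operating point, using the efficiency line x1 + x2 = 1 and the fairness line x1 = x2:

    # Classify an operating point against the efficiency line (x1 + x2 = 1)
    # and the fairness line (x1 = x2).
    def classify(x1, x2, tol=1e-9):
        total = x1 + x2
        if total < 1 - tol:
            return "inefficient (underload)"
        if total > 1 + tol:
            return "congested (overload)"
        return "efficient and fair" if abs(x1 - x2) <= tol else "efficient but not fair"

    for point in [(0.2, 0.5), (0.7, 0.5), (0.7, 0.3), (0.5, 0.5)]:
        print(point, "->", classify(*point))

The four sample points print exactly the four labels used on the slide.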

114

AIAD

• Increase: x + aI
• Decrease: x - aD
• Does not converge to fairness

[Figure: phase plane of bandwidth for User 1 (x1) against bandwidth for User 2 (x2), with the fairness and efficiency lines; from (x1h, x2h) a decrease moves to (x1h - aD, x2h - aD) and the next increase to (x1h - aD + aI, x2h - aD + aI), so the trajectory stays parallel to the fairness line.]

115

MIMD

• Increase: x·bI
• Decrease: x·bD
• Does not converge to fairness

[Figure: phase plane (x1 vs x2) with the fairness and efficiency lines; from (x1h, x2h) a decrease moves to (bD·x1h, bD·x2h) and the next increase to (bI·bD·x1h, bI·bD·x2h), so the trajectory stays on the line through the origin and keeps the same ratio x2/x1.]

116

AIMD

• Increase: x + aI
• Decrease: x·bD
• Converges to fairness (see the sketch below)

[Figure: phase plane (x1 vs x2) with the fairness and efficiency lines; from (x1h, x2h) a decrease moves to (bD·x1h, bD·x2h) and the next increase to (bD·x1h + aI, bD·x2h + aI), pulling the trajectory toward the fairness line.]
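
The three policies are easy to compare by iterating them directly in the (x1, x2) plane (a sketch of mine; the step sizes aI, aD, bI, bD below are illustrative values, and both users are assumed to see the same congestion signal): increase while x1 + x2 ≤ 1, decrease otherwise, and check whether the gap |x1 - x2| shrinks.

    # Iterate AIAD, MIMD and AIMD in the (x1, x2) phase plane and report the final gap.
    aI, aD, bI, bD = 0.01, 0.02, 1.05, 0.5      # illustrative parameter choices

    def run(increase, decrease, x1=0.1, x2=0.6, steps=2000):
        for _ in range(steps):
            if x1 + x2 <= 1.0:                  # under the efficiency line: increase
                x1, x2 = increase(x1), increase(x2)
            else:                               # over the efficiency line: decrease
                x1, x2 = decrease(x1), decrease(x2)
        return x1, x2

    for name, inc, dec in [
        ("AIAD", lambda x: x + aI, lambda x: x - aD),
        ("MIMD", lambda x: x * bI, lambda x: x * bD),
        ("AIMD", lambda x: x + aI, lambda x: x * bD),
    ]:
        x1, x2 = run(inc, dec)
        print(f"{name}: x1={x1:.3f}  x2={x2:.3f}  gap={abs(x1 - x2):.3f}")

Only AIMD drives the gap towards zero; AIAD keeps the initial gap and MIMD keeps the initial ratio, which is exactly what the three phase-plane pictures show.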

117

Why is AIMD fair? (a pretty animation…)

Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease reduces throughput proportionally

[Figure: phase plane of bandwidth for Connection 1 against bandwidth for Connection 2 (each axis 0 to R), with the equal-bandwidth-share line; each round of congestion-avoidance additive increase followed by a loss (window cut by a factor of 2) moves the operating point closer to the equal-share line.]

118

Fairness (more)

Fairness and UDP
• Multimedia apps may not use TCP
  – they do not want their rate throttled by congestion control
• Instead they use UDP
  – pump audio/video at a constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections
• Nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections (see the sketch below)
  – a new app asking for 1 TCP connection gets rate R/10
  – a new app asking for 11 TCP connections gets roughly R/2
• Recall: multiple browser sessions (and the potential for synchronized loss)
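
The arithmetic behind that example, assuming the link's rate R is split equally across all open TCP connections:

    # Equal per-connection sharing of a rate-R link already carrying 9 connections.
    R = 1.0                        # normalised link rate
    existing = 9
    for k in (1, 11):              # a new app opening k parallel connections
        share = k / (existing + k) * R
        print(f"{k:>2} connection(s): {share:.2f} R")
    # -> 0.10 R and 0.55 R: opening 11 parallel connections grabs roughly half the link.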

119

Synchronized Flows vs. Many TCP Flows

Synchronized flows:
• Aggregate window has the same dynamics
• Therefore buffer occupancy has the same dynamics
• Rule-of-thumb still holds

Many independent, desynchronized flows:
• Central limit theorem says the aggregate becomes Gaussian
• Variance (and hence the buffer size needed) decreases as N increases (see the sketch below)

[Figure: buffer occupancy over time and its probability distribution; the distribution narrows as the number of desynchronized flows grows.]
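
One way to put numbers on that trend (my addition; the slides state only the qualitative effect) is the buffer-sizing argument that N desynchronized flows need roughly the rule-of-thumb buffer divided by √N. The link speed and RTT below are illustrative assumptions.

    # Rule-of-thumb buffer (RTT x C) versus the reduced buffer suggested for N
    # independent, desynchronized flows (RTT x C / sqrt(N)).
    from math import sqrt

    rtt = 0.1                               # seconds (assumed)
    capacity = 10e9                         # 10 Gb/s link (assumed)
    rule_of_thumb = rtt * capacity          # bits of buffering, synchronized case

    for n in (1, 100, 10_000):
        reduced = rule_of_thumb / sqrt(n)
        print(f"N={n:>6}: {rule_of_thumb / 8e6:.0f} MB -> {reduced / 8e6:.2f} MB")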

Some TCP issues outstanding…

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 110: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

110

AIMD Sharing DynamicsA Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packetRTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 111: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

111

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packetRTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 112: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

112

Simple Model of Congestion Control

bull Two TCP connectionsndash Rates x1 and x2

bull Congestion when sumgt1

bull Efficiency sum near 1

bull Fairness xrsquos converge

Bandwidth for User 1 x1B

andw

idth

for U

ser 2

x2

Efficiencyline

2 user example

overload

underload

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 113: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

113

Example

User 1 x1

Use

r 2 x

2

fairnessline

efficiencyline

1

1

bull Total bandwidth 1

Inefficient x1+x2=07

(02 05)

Congested x1+x2=12

(07 05)

Efficient x1+x2=1Not fair

(07 03)

Efficient x1+x2=1Fair

(05 05)

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 114: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

114

AIAD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(x1h-aDx2h-aD)

(x1h-aD+aI)x2h-aD+aI))bull Increase x + aI

bull Decrease x - aD

bull Does not converge to fairness

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 115: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

115

MIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bdx1hbdx2h)

(bIbDx1hbIbDx2h)

bull Increase xbI

bull Decrease xbD

bull Does not converge to fairness

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 116: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

116

(bDx1h+aIbDx2h+aI)

AIMD

Bandwidth for User 1 x1

Ban

dwid

th fo

r Use

r 2 x

2

fairnessline

efficiencyline

(x1hx2h)

(bDx1hbDx2h)

bull Increase x+aD

bull Decrease xbD

bull Converges to fairness

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Multiplexingdemultiplexing (Transport-layer style)
  • How transport-layer demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum (time travel warning ndash we covered this earlie
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable data transfer getting started
  • Reliable data transfer getting started (2)
  • KR state machines ndash a note
  • Rdt10 reliable transfer over a reliable channel
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • rdt20 operation with no errors
  • rdt20 error scenario
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • rdt30 channels with errors and loss
  • rdt30 sender
  • rdt30 in action
  • rdt30 in action (2)
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined (Packet-Window) protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in action
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Selective repeat dilemma
  • Automatic Repeat Request (ARQ)
  • TCP Overview RFCs 793 1122 1323 2018 2581 hellip
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • TCP out of order attack
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Some RTT estimates are never good
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 70
  • Fast retransmit algorithm
  • Silly Window Syndrome
  • Flow Control ne Congestion Control
  • Flow Control ndash (bad old days)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • TCP congestion control additive increase multiplicative decre
  • Slide 89
  • TCP Congestion Control details
  • AIMD Starts Too Slowly
  • TCP Slow Start
  • TCP Slow Start (more)
  • Slow Start and the TCP Sawtooth
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • Repeating Slow Start After Timeout
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • Calculation on Simple Model (cwnd in units of MSS)
  • Three Congestion Control Challenges ndash or Why AIMD
  • Problem 1 Single Flow Fixed BW
  • Problems with Slow-Start
  • Problem 2 Single Flow Varying BW
  • Four alternatives
  • Problem 3 Multiple Flows
  • Recall Buffer and Window Dynamics
  • AIMD Sharing Dynamics
  • AIAD Sharing Dynamics
  • Simple Model of Congestion Control
  • Example
  • AIAD
  • MIMD
  • AIMD
  • Why is AIMD fair (a pretty animationhellip)
  • Fairness (more)
  • Some TCP issues outstandinghellip
Page 117: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

117

Why is AIMD fair(a pretty animationhellip)

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Bandwidth for Connection 1

Ban

dwid

th fo

r Con

necti

on 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

118

Fairness (more)Fairness and UDPbull Multimedia apps may not

use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull (Ancient yet ongoing) Research area TCP friendly

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP gets rate

R10ndash new app asks for 11 TCPs gets

R2

bull Recall Multiple browser sessions (and the potential for syncronized loss)

119

Synchronized Flows Many TCP Flowsbull Aggregate window has

same dynamicsbull Therefore buffer occupancy

has same dynamicsbull Rule-of-thumb still holds

bull Independent desynchronized

bull Central limit theorem says the aggregate becomes Gaussian

bull Variance (buffer size) decreases as N increases

Some TCP issues outstandinghellip

ProbabilityDistribution

t

Buffer Size

t

  • Slide 1
  • Topic 5 ndash Transport
Page 118: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

118

Fairness (more)

Fairness and UDP
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly

Fairness and parallel TCP connections
• nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 existing connections (worked through in the sketch below)
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets about R/2
• Recall: multiple browser sessions (and the potential for synchronized loss)
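The arithmetic behind the example, as a minimal Python sketch (my own illustration, not from the slides): it assumes the bottleneck is shared equally per TCP connection, so an application's share scales with the number of connections it opens. The function app_share and the 10 Mb/s figure are invented for the example.

def app_share(r_link, existing_conns, new_app_conns):
    """Rate the new application receives if the bottleneck of rate r_link
    is shared equally among all TCP connections (a simplifying assumption)."""
    total_conns = existing_conns + new_app_conns
    return r_link * new_app_conns / total_conns

R = 10e6  # a purely illustrative 10 Mb/s bottleneck

# 9 existing connections; the new app opens 1 connection -> about R/10
print(app_share(R, 9, 1) / R)    # 0.1

# 9 existing connections; the new app opens 11 connections -> about R/2
print(app_share(R, 9, 11) / R)   # 0.55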


Page 119: Computer Networking Lent Term M/W/F  11-midday LT1 in Gates  Building Slide Set 5 Andrew  W. Moore

119

Some TCP issues outstanding…

Synchronized Flows
• Aggregate window has same dynamics
• Therefore buffer occupancy has same dynamics
• Rule-of-thumb still holds

Many TCP Flows
• Independent, desynchronized
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases (illustrated in the sketch below)

[Figure: buffer size over time t, with its probability distribution]
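A hedged numerical sketch of the central-limit point above (my own illustration under simplifying assumptions, not from the slides): summing N idealized, randomly phased AIMD sawtooth windows, the aggregate's relative fluctuation falls roughly as 1/sqrt(N). That is the intuition for why many desynchronized flows need far less buffering than the C*RTT rule-of-thumb for synchronized flows. All names and parameters below are invented for the sketch.

import math
import random

def sawtooth(phase, t):
    """One idealized AIMD congestion window, oscillating between W/2 and W
    (normalized here to 0.5 .. 1.0) with period 1."""
    return 0.5 + 0.5 * ((t + phase) % 1.0)

def relative_swing(n_flows, samples=500, trials=20):
    """Std-dev of the aggregate window over time, relative to its mean,
    averaged over several random phase assignments."""
    ratios = []
    for _ in range(trials):
        phases = [random.random() for _ in range(n_flows)]
        totals = [sum(sawtooth(p, t / samples) for p in phases)
                  for t in range(samples)]
        mean = sum(totals) / samples
        var = sum((x - mean) ** 2 for x in totals) / samples
        ratios.append(math.sqrt(var) / mean)
    return sum(ratios) / trials

random.seed(1)
for n in (1, 10, 100):
    print(n, round(relative_swing(n), 3))

# Expected trend: roughly 0.19, 0.06, 0.02 - the relative swing of the
# aggregate (and hence the buffer needed to absorb it) shrinks roughly as
# 1/sqrt(N) once the flows are desynchronized.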
