
TCP Congestion Control

CS 457 Fall 2014

Topics
•  Principles of congestion control
   –  How to detect congestion
   –  How to adapt and alleviate congestion
•  TCP congestion control
   –  Additive-increase, multiplicative-decrease
   –  Slow start and slow-start restart
•  Related TCP mechanisms
   –  Nagle's algorithm and delayed acknowledgments
•  TCP throughput and fairness
•  Active Queue Management (AQM)
   –  Random Early Detection (RED)
   –  Explicit Congestion Notification (ECN)

Resource Allocation vs. Congestion Control
•  Resource allocation (connection-oriented networks)
   –  How routers meet competing demands for resources
   –  Reservations: allocate link bandwidth and buffer space to a flow
   –  Admission control: when to say no, and to whom
•  Congestion control (Internet)
   –  How nodes prevent or respond to overload conditions
   –  E.g., persuade hosts to stop sending, or slow down
   –  Typically much less exact
   –  Have some notion of fairness (i.e., sharing the pain)

Flow Control vs. Congestion Control
•  Flow control
   –  Keeping one fast sender from overwhelming a slow receiver
•  Congestion control
   –  Keeping a set of senders from overloading the network
•  Different concepts, but similar mechanisms
   –  TCP flow control: receiver window
   –  TCP congestion control: congestion window
   –  TCP actual window: min(congestion window, receiver window)

Congestion in the Internet is Unavoidable
•  Two packets arrive at the same time
   –  The router can only transmit one
   –  … and must either buffer or drop the other
•  If many packets arrive in a short period of time
   –  The router cannot keep up with the arriving traffic
   –  … and the buffer may eventually overflow

Metrics: Throughput vs. Delay
•  High throughput
   –  Throughput: measured performance of a system
   –  E.g., number of bits/second of data that get through
•  Low delay
   –  Delay: time required to deliver a packet or message
   –  E.g., number of ms to deliver a packet
•  These two metrics are sometimes at odds
   –  E.g., suppose you drive a link as hard as possible
   –  … then throughput will be high, but delay will be, too

Load, Delay, and Power
•  Average packet delay rises sharply with load: the typical behavior of queuing systems with random arrivals
•  Power: a simple metric of how well the network is performing

      Power = Load / Delay

•  Goal: maximize power; the load that does so is the "optimal load"
[Figures: average packet delay vs. load; power vs. load, peaking at the optimal load]
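To make the power metric concrete, here is a small Python sketch, not from the slides, that assumes an M/M/1-style delay curve, delay(load) = 1/(1 − load), and scans for the load that maximizes power:

```python
# Sketch: power metric for a single queue, assuming (not from the slides)
# an M/M/1-style normalized delay curve delay(load) = 1 / (1 - load).
def delay(load: float) -> float:
    return 1.0 / (1.0 - load)

def power(load: float) -> float:
    return load / delay(load)   # Power = Load / Delay (from the slide)

# Scan loads and report where power peaks (the "optimal load").
loads = [i / 100 for i in range(1, 100)]
best = max(loads, key=power)
print(f"optimal load ≈ {best:.2f}")   # ≈ 0.50 under this delay model
```

Under this assumed delay model, power is load·(1 − load), so pushing the link past the knee buys no extra power: delay grows faster than throughput.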

Fairness
•  Effective utilization is not the only goal
   –  We also want to be fair to the various flows
   –  … but what does that mean?
•  Simple definition: equal shares of the bandwidth
   –  N flows that each get 1/N of the bandwidth?
   –  But what if the flows traverse different paths?
   –  Still a hard and open problem in the Internet

Simple Queuing Mechanism
•  Simplest approach: FIFO queue and drop-tail
•  Link bandwidth allocation: first-in, first-out queue
   –  Packets transmitted in the order they arrive
•  Buffer space allocation: drop-tail queuing
   –  If the queue is full, drop the incoming packet

Simple Congestion Detection
•  Packet loss
   –  Packet gets dropped along the way
•  Packet delay
   –  Packet experiences high delay
•  How does the TCP sender learn these?
   –  Loss
      •  Timeout
      •  Triple-duplicate acknowledgment
   –  Delay
      •  Round-trip time estimate

TCP Congestion Control Basics
•  Each source determines the available capacity
   –  … and how many packets it is allowed to have in transit
•  Congestion window
   –  Maximum number of unACKed bytes allowed to be in transit (the congestion-control equivalent of the receiver window)
   –  MaxWindow = min(congestion window, receiver window): send at the rate of the slowest component
•  How to adapt the congestion window
   –  Decrease upon losing a packet: back off
   –  Increase upon success: explore new capacity

Additive Increase, Multiplicative Decrease
•  How much to increase and decrease?
   –  Increase linearly, decrease multiplicatively
   –  A necessary condition for stability of TCP
   –  Consequences of an oversized window are much worse than having an undersized window
      •  Oversized window: packets dropped and retransmitted; pain for all
      •  Undersized window: lower throughput for one flow
•  Multiplicative decrease
   –  On loss of a packet, divide the congestion window in half
•  Additive increase
   –  On success for the last window of data, increase linearly, adding one MSS per RTT
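A rough illustration of these two update rules, as a sketch rather than the full TCP algorithm: the window is tracked in MSS units at per-RTT granularity, and loss is taken as given instead of being detected from ACKs.

```python
# Minimal AIMD sketch (per-RTT granularity, window in MSS units).
def aimd_step(cwnd_mss: float, loss: bool) -> float:
    if loss:
        return max(cwnd_mss / 2.0, 1.0)   # multiplicative decrease, floor of 1 MSS
    return cwnd_mss + 1.0                 # additive increase: +1 MSS per RTT

# Example: the window grows linearly, then halves on a loss -> "sawtooth"
cwnd = 10.0
for rtt in range(6):
    cwnd = aimd_step(cwnd, loss=(rtt == 3))
    print(rtt, cwnd)
```

The printed trace is exactly the sawtooth the next slide depicts.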

TCP "Sawtooth" Behavior
[Figure: window vs. time; the window climbs linearly and is halved at each loss]

Practical Details
•  Congestion window (CWND)
   –  Represented in bytes, not in packets (why?)
   –  Packets are typically one MSS (Maximum Segment Size)
•  Increasing the congestion window
   –  Increase by MSS on success for the last window of data
   –  In practice, increase a fraction of MSS per received ACK
      •  Packets per window: CWND / MSS
      •  Increment per ACK: MSS × (MSS / CWND)
•  Decreasing the congestion window
   –  Cut in half, but never below 1 MSS
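A minimal sketch of the byte-based updates above; the MSS value is assumed for illustration:

```python
MSS = 1460  # bytes; a common value, assumed here for illustration

def on_ack_congestion_avoidance(cwnd_bytes: int) -> int:
    # Per-ACK increment of MSS * (MSS / CWND) bytes: after one full
    # window of ACKs (CWND/MSS of them), CWND has grown by about one MSS.
    return cwnd_bytes + MSS * MSS // cwnd_bytes

def on_loss_halve(cwnd_bytes: int) -> int:
    # Multiplicative decrease, but never below 1 MSS.
    return max(cwnd_bytes // 2, MSS)

cwnd = 10 * MSS
for _ in range(10):          # one window's worth of ACKs arrives
    cwnd = on_ack_congestion_avoidance(cwnd)
print(cwnd)                  # ≈ 11 * MSS: about one MSS of growth per window
```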

Getting Started
[Figure: window vs. time; linear increase from a small initial window]
•  Need to start with a small CWND to avoid overloading the network
•  … but a purely linear increase could take a long time to get started

"Slow Start" Phase
•  Start with a small congestion window
   –  Initially, CWND is 1 MSS
   –  So the initial sending rate is MSS/RTT
•  That could be pretty wasteful
   –  Might be much less than the available bandwidth
   –  Linear increase takes a long time to accelerate
•  Slow-start phase (but in reality it's "fast start")
   –  Sender starts at a slow rate (hence the name)
   –  … but increases the rate exponentially
   –  … until the first loss event

Slow Start in Action
Double CWND per round-trip time: 1, 2, 4, 8, …
[Figure: data (D) and ACK (A) exchanges between Src and Dest; each returning ACK releases two new segments]
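A sketch of the doubling dynamic, assuming one ACK per segment and a +1 MSS increase per ACK:

```python
MSS = 1460  # bytes; assumed for illustration

def on_ack_slow_start(cwnd_bytes: int) -> int:
    # +1 MSS per ACK: a full window of ACKs doubles CWND each RTT.
    return cwnd_bytes + MSS

cwnd = MSS
for rtt in range(4):
    acks = cwnd // MSS            # one ACK per segment in flight
    for _ in range(acks):
        cwnd = on_ack_slow_start(cwnd)
    print(rtt, cwnd // MSS)       # 2, 4, 8, 16 segments
```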

Slow Start and the TCP Sawtooth
[Figure: window vs. time; exponential "slow start" up to the first loss, then the sawtooth]
•  Why is it called slow start? Because TCP originally had no congestion control mechanism: the source would just start by sending a whole window's worth of data.

Two Kinds of Loss in TCP
•  Triple duplicate ACK
   –  Packet n is lost, but packets n+1, n+2, etc. arrive
   –  Receiver sends duplicate acknowledgments
   –  … and the sender retransmits packet n quickly
   –  Do a multiplicative decrease and keep going (no slow start)
•  Timeout
   –  Packet n is lost and detected via a timeout
   –  Could be because all packets in flight were lost
   –  After the timeout, blasting away for the entire CWND
   –  … would trigger a very large burst of traffic
   –  So it is better to start over with a very low CWND

Repeating Slow Start After Timeout
[Figure: window vs. time; after a timeout, CWND drops to 1 and slow start runs until it reaches the threshold of half the previous CWND, after which growth becomes linear]
•  Slow-start restart: go back to a CWND of 1, but take advantage of knowing the previous value of CWND
•  Slow start operates until the window reaches half of the previous CWND (the threshold)

Repeating Slow Start After Idle Period
•  Suppose a TCP connection goes idle for a while
   –  E.g., a Telnet session where you don't type for an hour
•  Eventually the network conditions change
   –  Maybe many more flows are traversing the link
   –  E.g., maybe everybody has come back from lunch
•  Dangerous to start transmitting at the old rate
   –  The previously idle TCP sender might blast the network
   –  … causing excessive congestion and packet loss
•  So some TCP implementations repeat slow start
   –  Slow-start restart after an idle period

Summary: TCP Congestion Control
•  When CongWin is below Threshold, the sender is in the slow-start phase: the window grows exponentially
•  When CongWin is above Threshold, the sender is in the congestion-avoidance phase: the window grows linearly
•  When a triple duplicate ACK occurs: Threshold is set to CongWin/2, and CongWin is set to Threshold
•  When a timeout occurs: Threshold is set to CongWin/2, and CongWin is set to 1 MSS

| Event | State | TCP Sender Action | Commentary |
|---|---|---|---|
| ACK receipt for previously unACKed data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance" | Results in a doubling of CongWin every RTT |
| ACK receipt for previously unACKed data | Congestion Avoidance (CA) | CongWin = CongWin + MSS × (MSS / CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT |
| Loss event detected by triple duplicate ACK | SS or CA | Threshold = CongWin / 2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS |
| Timeout | SS or CA | Threshold = CongWin / 2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start |
| Duplicate ACK | SS or CA | Increment the duplicate ACK count for the segment being ACKed | CongWin and Threshold not changed |
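The table reads as a small state machine. A hedged Python sketch of it (window tracked in MSS units; the initial Threshold of 64 is an assumed value, and real stacks call this variable ssthresh):

```python
# Sketch of the sender state machine from the table above.
class TcpSender:
    def __init__(self):
        self.cong_win = 1.0       # CongWin, in MSS units
        self.threshold = 64.0     # initial Threshold: an assumed value
        self.state = "SS"         # "SS" = slow start, "CA" = congestion avoidance

    def on_new_ack(self):         # ACK for previously unACKed data
        if self.state == "SS":
            self.cong_win += 1.0                  # +1 MSS per ACK -> doubles every RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:                                     # CA: additive increase
            self.cong_win += 1.0 / self.cong_win  # +1 MSS per RTT overall

    def on_triple_dup_ack(self):
        self.threshold = self.cong_win / 2.0
        self.cong_win = max(self.threshold, 1.0)  # multiplicative decrease
        self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2.0
        self.cong_win = 1.0                       # back to 1 MSS
        self.state = "SS"
```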

Other TCP Mechanisms

Nagle's Algorithm and Delayed ACK

Motivation for Nagle's Algorithm
•  Interactive applications
   –  SSH/telnet/rlogin
   –  Generate many small packets (e.g., keystrokes)
•  Small packets are wasteful
   –  Mostly header (e.g., 40 bytes of header, 1 of data)
•  Appealing to reduce the number of packets
   –  Could force every packet to have some minimum size
   –  … but what if the person doesn't type more characters?
•  Need to balance competing trade-offs
   –  Send larger packets to increase efficiency
   –  … but not at the expense of delay

Nagle's Algorithm
•  Wait if the amount of data is small
   –  Smaller than the Maximum Segment Size (MSS)
•  … and some other packet is already in flight
   –  I.e., still awaiting the ACKs for previous packets
•  That is, send at most one small packet per RTT
   –  … by waiting until all outstanding ACKs have arrived
•  Influence on performance
   –  Interactive applications: enables batching of bytes
   –  Bulk transfer: no change; it transmits in MSS-sized packets anyway
[Figure: timelines with and without Nagle's algorithm; small writes are coalesced until the outstanding ACK arrives]
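A minimal sketch of Nagle's decision rule under these assumptions (it ignores FIN/PSH handling and the sender's buffer management):

```python
# Minimal sketch of Nagle's decision rule (not a full TCP implementation).
def nagle_should_send(data_len: int, mss: int, unacked_bytes: int) -> bool:
    if data_len >= mss:
        return True               # full-sized segment: always send
    if unacked_bytes == 0:
        return True               # nothing in flight: send the small packet
    return False                  # small data + unACKed data in flight: wait

# At most one small packet per RTT: a second keystroke queues until
# the ACK for the first drains unacked_bytes back to zero.
print(nagle_should_send(1, 1460, 0))      # True  -> send the keystroke
print(nagle_should_send(1, 1460, 1))      # False -> buffer it
```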

Delayed ACK: Motivation
•  TCP traffic is often bidirectional
   –  Data traveling in both directions
   –  ACKs traveling in both directions
•  ACK packets have high overhead
   –  40 bytes for the IP header and TCP header
   –  … and zero data traffic
•  Piggybacking is appealing
   –  Host B can send an ACK to host A
   –  … as part of a data packet from B to A

TCP Header Allows Piggybacking
[Figure: TCP header layout — source port / destination port; sequence number; acknowledgment; header length, flags, advertised window; checksum, urgent pointer; options (variable); data]
•  Flags: SYN, FIN, RST, PSH, URG, ACK

Example of Piggybacking
[Figure: timeline between hosts A and B. While B has data to send, its ACKs ride on data packets (Data+ACK); once B runs out of data, it must send a bare ACK.]

Increasing Likelihood of Piggybacking
•  Increase piggybacking
   –  TCP allows the receiver to wait to send the ACK
   –  … in the hope that the host will have data to send
•  Example: ssh/rlogin/telnet
   –  Host A types characters at a UNIX prompt
   –  Host B receives the characters and executes a command
   –  … and then data are generated
   –  Would be nice if B could send the ACK with the new data
[Figure: the same A–B timeline; waiting works when the packet from A causes data to be sent from B, whereas an immediate bare ACK is wasted.]

Delayed ACK
•  Delay sending an ACK
   –  Upon receiving a packet, host B sets a timer
   –  If B's application generates data, go ahead and send, and piggyback the ACK bit
   –  If the timer expires, send a (non-piggybacked) ACK
•  Limiting the wait
   –  Timer of 200 msec or 500 msec
   –  Also results in an ACK for every other full-sized packet
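A hedged sketch of such a receiver policy, combining the timer with the ACK-every-other-full-sized-segment rule; the class and method names are illustrative, not from any real stack:

```python
# Sketch of a receiver's delayed-ACK policy (timer values from the slide).
ACK_DELAY_MS = 200   # typical timer: 200 or 500 msec

class DelayedAckReceiver:
    def __init__(self):
        self.unacked_segments = 0

    def on_segment(self, full_sized: bool) -> str:
        self.unacked_segments += 1
        if full_sized and self.unacked_segments >= 2:
            self.unacked_segments = 0
            return "send ACK now"          # ACK every other full-sized packet
        return f"start {ACK_DELAY_MS} ms timer"

    def on_timer_expiry(self) -> str:
        self.unacked_segments = 0
        return "send (non-piggybacked) ACK"

    def on_app_data_ready(self) -> str:
        self.unacked_segments = 0
        return "send data with piggybacked ACK"
```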

TCP Throughput and Fairness

TCP Throughput
•  What's the average throughput of TCP as a function of window size and RTT?
   –  Assume a long-lived TCP flow
   –  Ignore slow start
•  Let W be the window size when loss occurs
•  When the window is W, throughput is W/RTT
•  Just after a loss, the window drops to W/2, and throughput to W/(2·RTT)
•  Average throughput: 0.75 · W/RTT
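A quick check of the 0.75 factor, with example values assumed: the window ramps linearly between W/2 and W, so the time-average throughput is the mean of the two endpoint rates.

```python
# Check of the 0.75 factor: the window ramps linearly from W/2 to W,
# so the time-average throughput is the mean of the two endpoints.
W, RTT = 100.0, 0.1          # segments, seconds; assumed example values
avg = (W / RTT + (W / 2) / RTT) / 2.0
print(avg, 0.75 * W / RTT)   # both 750.0 segments/s
```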

Problems with Fast Links
An example to illustrate the problem:
•  Consider the impact of high-speed links
   –  1500-byte segments
   –  100 ms RTT
   –  10 Gb/s throughput
•  What is the required window size?
   –  Throughput = 0.75 · W/RTT (probably a good formula to remember)
   –  Requires a window size of W = 83,333 in-flight segments
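The window-size arithmetic, using the simpler Throughput = W·MSS/RTT relation that yields the slide's 83,333 figure (applying the 0.75 factor would push W higher still):

```python
# Back-of-the-envelope for the example above.
throughput_bps = 10e9          # 10 Gb/s
rtt_s = 0.100                  # 100 ms
mss_bits = 1500 * 8            # 1500-byte segments

W = throughput_bps * rtt_s / mss_bits
print(round(W))                # 83333 in-flight segments
```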

Example (Cont.)
•  10 Gb/s throughput requires a window size of W = 83,333 in-flight segments
•  TCP assumes every loss is due to congestion
   –  Generally a safe assumption for a reasonable window size
•  (Magic) formula relating loss rate to throughput:

      Throughput = (1.22 · MSS) / (RTT · √L)

•  Throughput of 10 Gb/s with an MSS of 1500 bytes gives L = 2·10⁻¹⁰
   –  I.e., TCP can only lose one in 5,000,000,000 segments
•  We need new versions of TCP for high-speed nets (a topic for later discussion)
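Solving the formula for the tolerable loss rate L at 10 Gb/s, using only the slide's numbers:

```python
import math  # not strictly needed; shown because sqrt(L) is being inverted

# Solve Throughput = 1.22 * MSS / (RTT * sqrt(L)) for L.
throughput_bps = 10e9
mss_bits = 1500 * 8
rtt_s = 0.100

sqrt_L = 1.22 * mss_bits / (rtt_s * throughput_bps)
L = sqrt_L ** 2
print(f"L ≈ {L:.1e}")          # ≈ 2.1e-10: one loss per ~5 billion segments
```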

TCP Fairness
Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K
[Figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R]
Simple scenario: assume the same MSS and RTT

Is TCP Fair?
Two competing sessions:
•  Additive increase gives a slope of 1 as throughput increases
•  Multiplicative decrease drops throughput proportionally
[Figure: Connection 2 throughput vs. Connection 1 throughput, each bounded by R; repeated cycles of "loss: decrease window by factor of 2" followed by "congestion avoidance: additive increase" move the operating point toward the equal-bandwidth-share line]
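A toy two-flow simulation of this convergence argument, assuming synchronized losses and unit increments (a simplification of real TCP dynamics):

```python
# Two-flow AIMD sketch: both halve on loss (when the link is over capacity)
# and otherwise add equal increments, converging toward equal shares.
R = 100.0                       # bottleneck capacity, arbitrary units
x1, x2 = 80.0, 10.0             # unequal starting throughputs

for _ in range(200):
    if x1 + x2 > R:             # overload -> both flows see loss
        x1, x2 = x1 / 2, x2 / 2 # multiplicative decrease
    else:
        x1, x2 = x1 + 1, x2 + 1 # additive increase

print(round(x1), round(x2))     # roughly equal shares
```

The gap between the flows is unchanged by additive increase but halved by every multiplicative decrease, which is why the operating point drifts toward the fair-share line.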

More on Fairness
Fairness and UDP
•  Multimedia apps often do not use TCP
   –  They do not want their rate throttled by congestion control
•  Instead they use UDP
   –  Pump audio/video at a constant rate; tolerate packet loss
•  Research area: TCP-friendly unreliable transport

Fairness and parallel TCP connections
•  Nothing prevents an app from opening parallel connections between 2 hosts
•  Web browsers do this
•  Example: link of rate R supporting 9 connections
   –  A new app asks for 1 TCP connection, and gets rate R/10
   –  A new app asks for 11 TCP connections, and gets 11R/20 (over half the bandwidth)
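The arithmetic behind the example, assuming the link is split evenly per connection:

```python
# Per-connection fair shares for the parallel-connections example above.
R = 1.0
existing = 9

one_conn = R / (existing + 1)         # 1 of 10 connections -> R/10
eleven   = 11 * R / (existing + 11)   # 11 of 20 connections -> 11R/20
print(one_conn, eleven)               # 0.1, 0.55 (over half the bandwidth)
```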

Queuing Mechanisms

Random Early Detection (RED) and Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing
•  TCP depends on packet loss to detect congestion
   –  In fact, TCP drives the network into packet loss
   –  … by continuing to increase the sending rate
•  Drop-tail queuing leads to bursty loss
   –  When a link becomes congested…
   –  … many arriving packets encounter a full queue
   –  As a result, many flows divide their sending rate in half
   –  … and many individual flows lose multiple packets

Slow Feedback from Drop Tail
•  Feedback comes only when the buffer is completely full
   –  … even though the buffer has been filling for a while
•  Plus, the filling buffer is increasing the RTT
   –  … and the variance in the RTT
•  Might be better to give early feedback
   –  Get one or two flows to slow down, not all of them
   –  Get these flows to slow down before it is too late

Random Early Detection (RED)
•  Basic idea of RED
   –  The router notices that the queue is getting backlogged
   –  … and randomly drops packets to signal congestion
•  Packet drop probability
   –  Drop probability increases as the average queue length increases
   –  If the buffer is below some level, don't drop anything
   –  … otherwise, set the drop probability as a function of the queue length
[Figure: drop probability vs. average queue length — zero below a minimum threshold, then rising]
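A sketch of RED's classic drop decision. The min/max thresholds and maximum probability below are illustrative parameters, and a real implementation computes the average queue length with an EWMA and spaces drops using a packet count:

```python
import random

# Sketch of RED's drop decision (classic linear profile).
MIN_TH, MAX_TH, MAX_P = 5.0, 15.0, 0.1   # packets, packets, probability

def red_drop(avg_queue_len: float) -> bool:
    if avg_queue_len < MIN_TH:
        return False                      # below min threshold: never drop
    if avg_queue_len >= MAX_TH:
        return True                       # above max threshold: always drop
    # In between: drop probability rises linearly with queue length.
    p = MAX_P * (avg_queue_len - MIN_TH) / (MAX_TH - MIN_TH)
    return random.random() < p

print(red_drop(3.0), red_drop(10.0), red_drop(20.0))
```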

Properties of RED
•  Drops packets before the queue is full
   –  In the hope of reducing the rates of some flows
•  Drops packets in proportion to each flow's rate
   –  High-rate flows have more packets
   –  … and hence a higher chance of being selected
•  Drops are spaced out in time
   –  Which should help desynchronize the TCP senders
•  Tolerant of burstiness in the traffic
   –  By basing the decisions on the average queue length

Problems With RED
•  Hard to get the tunable parameters just right
   –  How early to start dropping packets?
   –  What slope for the increase in drop probability?
   –  What time scale for averaging the queue length?
•  Sometimes RED helps, but sometimes not
   –  If the parameters aren't set right, RED doesn't help
   –  And it is hard to know how to set the parameters
•  RED is implemented in practice
   –  But often not used, due to the challenges of tuning it right
•  Many variations
   –  With cute names like "Blue" and "FRED"…

Explicit Congestion Notification
•  Early dropping of packets
   –  Good: gives early feedback
   –  Bad: has to drop the packet to give the feedback
•  Explicit Congestion Notification (ECN)
   –  The router marks the packet with an ECN bit
   –  … and the sending host interprets it as a sign of congestion
•  Surmounting the challenges
   –  Must be supported by the end hosts and the routers
   –  Requires two bits in the IP header (one for the ECN mark, and one to indicate ECN capability)
   –  Solution: borrow two of the Type-of-Service bits in the IPv4 packet header

Conclusions
•  Congestion is inevitable
   –  The Internet does not reserve resources in advance
   –  TCP actively tries to push the envelope
•  Congestion can be handled
   –  Additive increase, multiplicative decrease
   –  Slow start and slow-start restart
•  Active Queue Management can help
   –  Random Early Detection (RED)
   –  Explicit Congestion Notification (ECN)


Page 2: TCP Congestion Control - cs.colostate.edu

Topics bull  Principles of congestion control

ndash  How to detect congestion ndash  How to adapt and alleviate congestion

bull  TCP congestion control ndash  Additive-increase multiplicative-decrease ndash  Slow start and slow-start restart

bull  Related TCP mechanisms ndash  Naglersquos algorithm and delayed acknowledgments

bull  TCP Throughput and Fairness bull  Active Queue Management (AQM)

ndash  Random Early Detection (RED) ndash  Explicit Congestion Notification (ECN)

Resource Allocation vs Congestion Control

bull  Resource allocation (connection-oriented networks) ndash  How routers meet competing demands for resources ndash  Reservations allocate link bandwidth and buffer space to

a flow ndash  Admission control when to say no and to whom

bull  Congestion control (Internet) ndash  How nodes prevent or respond to overload conditions ndash  Eg persuade hosts to stop sending or slow down ndash  Typically much less exact ndash  Have some notion of fairness (ie sharing the pain)

Flow Control vs Congestion Control

bull  Flow control ndash  Keeping one fast sender from overwhelming a slow

receiver bull  Congestion control

ndash  Keep a set of senders from overloading the network bull  Different concepts but similar mechanisms

ndash  TCP flow control receiver window ndash  TCP congestion control congestion window ndash  TCP actual window mincongestion window receiver

window

Congestion in the Internet is Unavoidable

bull  Two packets arrive at the same time ndash  The router can only transmit one ndash  hellip and either buffer or drop the other

bull  If many packets arrive in a short period of time ndash  The router cannot keep up with the arriving traffic ndash  hellip and the buffer may eventually overflow

Metrics Throughput vs Delay bull  High throughput

ndash  Throughput measured performance of a system ndash  Eg number of bitssecond of data that get through

bull  Low delay ndash  Delay time required to deliver a packet or message ndash  Eg number of ms to deliver a packet

bull  These two metrics are sometimes at odds ndash  Eg suppose you drive a link as hard as possible ndash  hellip then throughput will be high but delay will be too

Load Delay and Power

Average Packet delay

Load

Typical behavior of queuing systems with random arrivals

Power

Load

A simple metric of how well the network is performing

LoadPowerDelay

=

ldquooptimal loadrdquo

Goal maximize power

Fairness bull  Effective utilization is not the only goal

ndash  We also want to be fair to the various flows ndash  hellip but what does that mean

bull  Simple definition equal shares of the bandwidth ndash  N flows that each get 1N of the bandwidth ndash  But what if the flows traverse different paths ndash  Still a hard and open problem in the Internet

Simple Queuing Mechanism bull  Simplest approach FIFO queue and drop-tail bull  Link bandwidth allocation first-in first-out queue

ndash  Packets transmitted in the order they arrive

bull  Buffer space allocation drop-tail queuing ndash  If the queue is full drop the incoming packet

Simple Congestion Detection bull  Packet loss

ndash  Packet gets dropped along the way bull  Packet delay

ndash  Packet experiences high delay bull  How does TCP sender learn these

ndash  Loss bull  Timeout bull  Triple-duplicate acknowledgment

ndash  Delay bull  Round-trip time estimate

TCP Congestion Control Basics bull  Each source determines available capacity

ndash  hellip and how many packets is allowed to have in transit bull  Congestion window

ndash  Maximum of unackrsquoed bytes allowed to be in transit (the congestion-control equivalent of receiver window)

ndash  MaxWindow = mincongestion window receiver window - send at the rate of the slowest component

bull  How to adapt the congestion window ndash  Decrease upon losing a packet back-off ndash  Increase upon success explore new capacity

Additive Increase Multiplicative Decrease

bull  How much to increase and decrease ndash  Increase linearly decrease multiplicatively ndash  A necessary condition for stability of TCP ndash  Consequences of oversized window are much worse

than having an under-sized window bull  Oversized window packets dropped retransmitted pain for all bull  Undersized window lower throughput for one flow

bull  Multiplicative decrease ndash  On loss of packet divide congestion window in half

bull  Additive increase ndash  On success for last window of data increase linearly

adding one MSS per RTT

TCP ldquoSawtoothrdquo Behavior

t

Window

halved

Loss

Practical Details bull  Congestion window (cwnd)

ndash  Represented in bytes not in packets (Why) ndash  Packets typically one MSS (Maximum Segment Size)

bull  Increasing the congestion window ndash  Increase by MSS on success for last window of data ndash  In practice increase a fraction of MSS per received

ACK bull  packets per window CWND MSS bull  Increment per ACK MSS (MSS CWND)

bull  Decreasing the congestion window ndash  Cut in half but never below 1 MSS

Getting Started

t

Window

But could take a long time to get started

Need to start with a small CWND to avoid overloading the network

ldquoSlow Startrdquo Phase bull  Start with a small congestion window

ndash  Initially CWND is 1 MSS ndash  So initial sending rate is MSSRTT

bull  That could be pretty wasteful ndash  Might be much less than the available bandwidth ndash  Linear increase takes a long time to accelerate

bull  Slow-start phase (but in reality itrsquos ldquofast startrdquo) ndash  Sender starts at a slow rate (hence the name) ndash  hellip but increases the rate exponentially ndash  hellip until the first loss event

Slow Start in Action Double CWND per round-trip time

D A D D A A D D

A A

D

A

Src

Dest

D

A

1 2 4 8

Slow Start and the TCP Sawtooth

Loss

Exponential ldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally had no congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

Two Kinds of Loss in TCP bull  Triple duplicate ACK

ndash  Packet n is lost but packets n+1 n+2 etc arrive ndash  Receiver sends duplicate acknowledgments ndash  hellip and the sender retransmits packet n quickly ndash  Do a multiplicative decrease and keep going (no slow-

start) bull  Timeout

ndash  Packet n is lost and detected via a timeout ndash  Could be because all packets in flight were lost ndash  After the timeout blasting away for the entire CWND ndash  hellip would trigger a very large burst in traffic ndash  So better to start over with a very low CWND

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of

previous cwnd

timeout threshold

Repeating Slow Start After Idle Period

bull  Suppose a TCP connection goes idle for a while ndash  Eg Telnet session where you donrsquot type for an hour

bull  Eventually the network conditions change ndash  Maybe many more flows are traversing the link ndash  Eg maybe everybody has come back from lunch

bull  Dangerous to start transmitting at the old rate ndash  Previously-idle TCP sender might blast the network ndash  hellip causing excessive congestion and packet loss

bull  So some TCP implementations repeat slow start ndash  Slow-start restart after an idle period

Summary TCP Congestion Control bull  When CongWin is below Threshold sender in

slow-start phase window grows exponentially

bull  When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull  When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull  When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Event State TCP Sender Action Commentary ACK receipt for previously unACKed data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unACKed data

Congestion Avoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Loss event detected by triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = Threshold Set state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSS Set state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK

SS or CA Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Other TCP Mechanisms

Naglersquos Algorithm and Delayed ACK

Motivation for Naglersquos Algorithm

bull  Interactive applications ndash  SSHtelnetrlogin ndash  Generate many small packets (eg keystrokes)

bull  Small packets are wasteful ndash  Mostly header (eg 40 bytes of header 1 of data)

bull  Appealing to reduce the number of packets ndash  Could force every packet to have some minimum size ndash  hellip but what if the person doesnrsquot type more

characters bull  Need to balance competing trade-offs

ndash  Send larger packets to increase efficiency ndash  hellip but not at the expense of delay

Naglersquos Algorithm bull Wait if the amount of data is small

ndash Smaller than Maximum Segment Size (MSS) bull hellipand some other packet is already in flight

ndash  ie still awaiting the ACKs for previous packets bull That is send at most one small packet per RTT

ndash hellip by waiting until all outstanding ACKs have arrived

bull  Influence on performance ndash Interactive applications enables batching of bytes ndash Bulk transfer no change transmits in MSS-sized packets

anyway

vs

ACK

Delayed ACK - Motivation bull  TCP traffic is often bidirectional

ndash Data traveling in both directions ndash ACKs traveling in both directions

bull  ACK packets have high overhead ndash  40 bytes for the IP header and TCP header ndash hellip and zero data traffic

bull  Piggybacking is appealing ndash Host B can send an ACK to host A ndash hellip as part of a data packet from B to A

TCP Header Allows Piggybacking

Source port Destination port

Sequence number

Acknowledgment

Advertised window HdrLen Flags 0

Checksum Urgent pointer

Options (variable)

Data

Flags SYN FIN RST PSH URG ACK

Example of Piggybacking

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

B has data to send

A has data to send

B doesnrsquot have data to send

Increasing Likelihood of Piggybacking

bull  Increase piggybacking ndash  TCP allows the receiver to

wait to send the ACK ndash  hellip in the hope that the host

will have data to send bull  Example sshrlogintelnet

ndash  Host A types characters at a UNIX prompt

ndash  Host B receives the character and executes a command

ndash  hellip and then data are generated ndash  Would be nice if B could send

the ACK with the new data

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

Works when packet from A causes data to be sent from B

waste

Delayed ACK bull  Delay sending an ACK

ndash  Upon receiving a packet the host B sets a timer ndash  If Brsquos application generates data go ahead and send

bull  And piggyback the ACK bit

ndash  If the timer expires send a (non-piggybacked) ACK

bull  Limiting the wait ndash  Timer of 200 msec or 500 msec ndash  Results in an ACK for every other full-sized packet

TCP Throughput and Fairness

TCP Throughput bull  Whatrsquos the average throughout of TCP as a

function of window size and RTT ndash  Assume long-lived TCP flow ndash  Ignore slow start

bull  Let W be the window size when loss occurs bull  When window is W throughput is WRTT bull  Just after loss window drops to W2 throughput

to W2RTT bull  Average throughout 075 WRTT

Problems with Fast Links An example to illustrate problems bull  Consider the impact of high speed links

ndash  1500 byte segments ndash  100ms RTT ndash  10 Gbs throughput

bull  What is the required window size ndash  Throughput = 75 WRTT

bull  (probably a good formula to remember)

ndash  Requires window size W = 83333 in-flight segments

Example (Cont)

bull  10 Gbs throughput requires window size W = 83333 in-flight segments

bull  TCP assumes every loss is due to congestion ndash  Generally safe assumption for reasonable window size

bull  (Magic) Formula to relate loss rate to throughput Throughput of 10 Gbs with MSS of 1500 bytes gives ndash  13 L = 210-10

ie can only lose one in 5000000000 segments bull  We need new versions of TCP for high-speed nets (topic

for later discussion)

LRTTMSSsdot221Throughput =

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

Simple scenario assume same MSS and RTT

Is TCP Fair Two competing sessions bull  Additive increase gives slope of 1 as throughout increases bull  multiplicative decrease drops throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Conn

ecti

on 2

thr

ough

put

loss decrease window by factor of 2 congestion avoidance additive increase

loss decrease window by factor of 2 congestion avoidance additive increase

More on Fairness Fairness and UDP bull  Multimedia apps often do

not use TCP ndash  do not want rate throttled by

congestion control bull  Instead use UDP

ndash  pump audiovideo at constant rate tolerate packet loss

bull  Research area TCP friendly unreliable transport

Fairness and parallel TCP connections

bull  nothing prevents app from opening parallel connections between 2 hosts

bull  Web browsers do this bull  Example link of rate R

supporting 9 connections ndash  new app asks for 1 TCP gets rate

R10 ndash  new app asks for 11 TCPs gets

11R20 (over half the bandwidth)

Queuing Mechanisms

Random Early Detection (RED) Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

bull  TCP depends on packet loss to detect congestion ndash  In fact TCP drives the network into packet loss ndash  hellip by continuing to increase the sending rate

bull  Drop-tail queuing leads to bursty loss ndash  When a link becomes congestedhellip ndash  hellip many arriving packets encounter a full queue ndash  And as a result many flows divide sending rate in half ndash  hellip and many individual flows lose multiple packets

Slow Feedback from Drop Tail bull  Feedback comes when buffer is completely full

ndash  hellip even though the buffer has been filling for a while bull  Plus the filling buffer is increasing RTT

ndash  hellip and the variance in the RTT bull  Might be better to give early feedback

ndash  Get one or two flows to slow down not all of them ndash  Get these flows to slow down before it is too late

Random Early Detection (RED) bull  Basic idea of RED

ndash  Router notices that the queue is getting backlogged ndash  hellip and randomly drops packets to signal congestion

bull  Packet drop probability ndash  Drop probability increases as queue length increases ndash  If buffer is below some level donrsquot drop anything ndash  hellip otherwise set drop probability as function of queue

Average Queue Length

Prob

abili

ty

Properties of RED bull  Drops packets before queue is full

ndash  In the hope of reducing the rates of some flows bull  Drops packet in proportion to each flowrsquos rate

ndash  High-rate flows have more packets ndash  hellip and hence a higher chance of being selected

bull  Drops are spaced out in time ndash  Which should help desynchronize the TCP senders

bull  Tolerant of burstiness in the traffic ndash  By basing the decisions on average queue length

Problems With RED bull  Hard to get the tunable parameters just right

ndash  How early to start dropping packets ndash  What slope for the increase in drop probability ndash  What time scale for averaging the queue length

bull  Sometimes RED helps but sometimes not ndash  If the parameters arenrsquot set right RED doesnrsquot help ndash  And it is hard to know how to set the parameters

bull  RED is implemented in practice ndash  But often not used due to the challenges of tuning right

bull  Many variations ndash  With cute names like ldquoBluerdquo and ldquoFREDrdquohellip J

Explicit Congestion Notification bull  Early dropping of packets

ndash  Good gives early feedback ndash  Bad has to drop the packet to give the feedback

bull  Explicit Congestion Notification ndash  Router marks the packet with an ECN bit ndash  hellip and sending host interprets as a sign of congestion

bull  Surmounting the challenges ndash  Must be supported by the end hosts and the routers ndash  Requires two bits in the IP header (one for the ECN

mark and one to indicate the ECN capability) ndash  Solution borrow two of the Type-Of-Service bits in the

IPv4 packet header

Conclusions bull Congestion is inevitable

ndash Internet does not reserve resources in advance ndash TCP actively tries to push the envelope

bull Congestion can be handled ndash Additive increase multiplicative decrease ndash Slow start and slow-start restart

bull Active Queue Management can help ndash Random Early Detection (RED) ndash Explicit Congestion Notification (ECN)

t

Window

Page 3: TCP Congestion Control - cs.colostate.edu

Resource Allocation vs Congestion Control

bull  Resource allocation (connection-oriented networks) ndash  How routers meet competing demands for resources ndash  Reservations allocate link bandwidth and buffer space to

a flow ndash  Admission control when to say no and to whom

bull  Congestion control (Internet) ndash  How nodes prevent or respond to overload conditions ndash  Eg persuade hosts to stop sending or slow down ndash  Typically much less exact ndash  Have some notion of fairness (ie sharing the pain)

Flow Control vs Congestion Control

bull  Flow control ndash  Keeping one fast sender from overwhelming a slow

receiver bull  Congestion control

ndash  Keep a set of senders from overloading the network bull  Different concepts but similar mechanisms

ndash  TCP flow control receiver window ndash  TCP congestion control congestion window ndash  TCP actual window mincongestion window receiver

window

Congestion in the Internet is Unavoidable

bull  Two packets arrive at the same time ndash  The router can only transmit one ndash  hellip and either buffer or drop the other

bull  If many packets arrive in a short period of time ndash  The router cannot keep up with the arriving traffic ndash  hellip and the buffer may eventually overflow

Metrics Throughput vs Delay bull  High throughput

ndash  Throughput measured performance of a system ndash  Eg number of bitssecond of data that get through

bull  Low delay ndash  Delay time required to deliver a packet or message ndash  Eg number of ms to deliver a packet

bull  These two metrics are sometimes at odds ndash  Eg suppose you drive a link as hard as possible ndash  hellip then throughput will be high but delay will be too

Load Delay and Power

Average Packet delay

Load

Typical behavior of queuing systems with random arrivals

Power

Load

A simple metric of how well the network is performing

LoadPowerDelay

=

ldquooptimal loadrdquo

Goal maximize power

Fairness bull  Effective utilization is not the only goal

ndash  We also want to be fair to the various flows ndash  hellip but what does that mean

bull  Simple definition equal shares of the bandwidth ndash  N flows that each get 1N of the bandwidth ndash  But what if the flows traverse different paths ndash  Still a hard and open problem in the Internet

Simple Queuing Mechanism bull  Simplest approach FIFO queue and drop-tail bull  Link bandwidth allocation first-in first-out queue

ndash  Packets transmitted in the order they arrive

bull  Buffer space allocation drop-tail queuing ndash  If the queue is full drop the incoming packet

Simple Congestion Detection bull  Packet loss

ndash  Packet gets dropped along the way bull  Packet delay

ndash  Packet experiences high delay bull  How does TCP sender learn these

ndash  Loss bull  Timeout bull  Triple-duplicate acknowledgment

ndash  Delay bull  Round-trip time estimate

TCP Congestion Control Basics bull  Each source determines available capacity

ndash  hellip and how many packets is allowed to have in transit bull  Congestion window

ndash  Maximum of unackrsquoed bytes allowed to be in transit (the congestion-control equivalent of receiver window)

ndash  MaxWindow = mincongestion window receiver window - send at the rate of the slowest component

bull  How to adapt the congestion window ndash  Decrease upon losing a packet back-off ndash  Increase upon success explore new capacity

Additive Increase Multiplicative Decrease

bull  How much to increase and decrease ndash  Increase linearly decrease multiplicatively ndash  A necessary condition for stability of TCP ndash  Consequences of oversized window are much worse

than having an under-sized window bull  Oversized window packets dropped retransmitted pain for all bull  Undersized window lower throughput for one flow

bull  Multiplicative decrease ndash  On loss of packet divide congestion window in half

bull  Additive increase ndash  On success for last window of data increase linearly

adding one MSS per RTT

TCP ldquoSawtoothrdquo Behavior

t

Window

halved

Loss

Practical Details bull  Congestion window (cwnd)

ndash  Represented in bytes not in packets (Why) ndash  Packets typically one MSS (Maximum Segment Size)

bull  Increasing the congestion window ndash  Increase by MSS on success for last window of data ndash  In practice increase a fraction of MSS per received

ACK bull  packets per window CWND MSS bull  Increment per ACK MSS (MSS CWND)

bull  Decreasing the congestion window ndash  Cut in half but never below 1 MSS

Getting Started

t

Window

But could take a long time to get started

Need to start with a small CWND to avoid overloading the network

ldquoSlow Startrdquo Phase bull  Start with a small congestion window

ndash  Initially CWND is 1 MSS ndash  So initial sending rate is MSSRTT

bull  That could be pretty wasteful ndash  Might be much less than the available bandwidth ndash  Linear increase takes a long time to accelerate

bull  Slow-start phase (but in reality itrsquos ldquofast startrdquo) ndash  Sender starts at a slow rate (hence the name) ndash  hellip but increases the rate exponentially ndash  hellip until the first loss event

Slow Start in Action Double CWND per round-trip time

D A D D A A D D

A A

D

A

Src

Dest

D

A

1 2 4 8

Slow Start and the TCP Sawtooth

Loss

Exponential ldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally had no congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

Two Kinds of Loss in TCP bull  Triple duplicate ACK

ndash  Packet n is lost but packets n+1 n+2 etc arrive ndash  Receiver sends duplicate acknowledgments ndash  hellip and the sender retransmits packet n quickly ndash  Do a multiplicative decrease and keep going (no slow-

start) bull  Timeout

ndash  Packet n is lost and detected via a timeout ndash  Could be because all packets in flight were lost ndash  After the timeout blasting away for the entire CWND ndash  hellip would trigger a very large burst in traffic ndash  So better to start over with a very low CWND

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of

previous cwnd

timeout threshold

Repeating Slow Start After Idle Period

bull  Suppose a TCP connection goes idle for a while ndash  Eg Telnet session where you donrsquot type for an hour

bull  Eventually the network conditions change ndash  Maybe many more flows are traversing the link ndash  Eg maybe everybody has come back from lunch

bull  Dangerous to start transmitting at the old rate ndash  Previously-idle TCP sender might blast the network ndash  hellip causing excessive congestion and packet loss

bull  So some TCP implementations repeat slow start ndash  Slow-start restart after an idle period

Summary TCP Congestion Control bull  When CongWin is below Threshold sender in

slow-start phase window grows exponentially

bull  When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull  When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull  When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Event State TCP Sender Action Commentary ACK receipt for previously unACKed data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unACKed data

Congestion Avoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Loss event detected by triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = Threshold Set state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSS Set state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK

SS or CA Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Other TCP Mechanisms

Naglersquos Algorithm and Delayed ACK

Motivation for Naglersquos Algorithm

bull  Interactive applications ndash  SSHtelnetrlogin ndash  Generate many small packets (eg keystrokes)

bull  Small packets are wasteful ndash  Mostly header (eg 40 bytes of header 1 of data)

bull  Appealing to reduce the number of packets ndash  Could force every packet to have some minimum size ndash  hellip but what if the person doesnrsquot type more

characters bull  Need to balance competing trade-offs

ndash  Send larger packets to increase efficiency ndash  hellip but not at the expense of delay

Naglersquos Algorithm bull Wait if the amount of data is small

ndash Smaller than Maximum Segment Size (MSS) bull hellipand some other packet is already in flight

ndash  ie still awaiting the ACKs for previous packets bull That is send at most one small packet per RTT

ndash hellip by waiting until all outstanding ACKs have arrived

bull  Influence on performance ndash Interactive applications enables batching of bytes ndash Bulk transfer no change transmits in MSS-sized packets

anyway

vs

ACK

Delayed ACK - Motivation bull  TCP traffic is often bidirectional

ndash Data traveling in both directions ndash ACKs traveling in both directions

bull  ACK packets have high overhead ndash  40 bytes for the IP header and TCP header ndash hellip and zero data traffic

bull  Piggybacking is appealing ndash Host B can send an ACK to host A ndash hellip as part of a data packet from B to A

TCP Header Allows Piggybacking

Source port Destination port

Sequence number

Acknowledgment

Advertised window HdrLen Flags 0

Checksum Urgent pointer

Options (variable)

Data

Flags SYN FIN RST PSH URG ACK

Example of Piggybacking

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

B has data to send

A has data to send

B doesnrsquot have data to send

Increasing Likelihood of Piggybacking

bull  Increase piggybacking ndash  TCP allows the receiver to

wait to send the ACK ndash  hellip in the hope that the host

will have data to send bull  Example sshrlogintelnet

ndash  Host A types characters at a UNIX prompt

ndash  Host B receives the character and executes a command

ndash  hellip and then data are generated ndash  Would be nice if B could send

the ACK with the new data

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

Works when packet from A causes data to be sent from B

waste

Delayed ACK bull  Delay sending an ACK

ndash  Upon receiving a packet the host B sets a timer ndash  If Brsquos application generates data go ahead and send

bull  And piggyback the ACK bit

ndash  If the timer expires send a (non-piggybacked) ACK

bull  Limiting the wait ndash  Timer of 200 msec or 500 msec ndash  Results in an ACK for every other full-sized packet

TCP Throughput and Fairness

TCP Throughput bull  Whatrsquos the average throughout of TCP as a

function of window size and RTT ndash  Assume long-lived TCP flow ndash  Ignore slow start

bull  Let W be the window size when loss occurs bull  When window is W throughput is WRTT bull  Just after loss window drops to W2 throughput

to W2RTT bull  Average throughout 075 WRTT

Problems with Fast Links An example to illustrate problems bull  Consider the impact of high speed links

ndash  1500 byte segments ndash  100ms RTT ndash  10 Gbs throughput

bull  What is the required window size ndash  Throughput = 75 WRTT

bull  (probably a good formula to remember)

ndash  Requires window size W = 83333 in-flight segments

Example (Cont)

bull  10 Gbs throughput requires window size W = 83333 in-flight segments

bull  TCP assumes every loss is due to congestion ndash  Generally safe assumption for reasonable window size

bull  (Magic) Formula to relate loss rate to throughput Throughput of 10 Gbs with MSS of 1500 bytes gives ndash  13 L = 210-10

ie can only lose one in 5000000000 segments bull  We need new versions of TCP for high-speed nets (topic

for later discussion)

LRTTMSSsdot221Throughput =

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

Simple scenario assume same MSS and RTT

Is TCP Fair Two competing sessions bull  Additive increase gives slope of 1 as throughout increases bull  multiplicative decrease drops throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Conn

ecti

on 2

thr

ough

put

loss decrease window by factor of 2 congestion avoidance additive increase

loss decrease window by factor of 2 congestion avoidance additive increase

More on Fairness Fairness and UDP bull  Multimedia apps often do

not use TCP ndash  do not want rate throttled by

congestion control bull  Instead use UDP

ndash  pump audiovideo at constant rate tolerate packet loss

bull  Research area TCP friendly unreliable transport

Fairness and parallel TCP connections

bull  nothing prevents app from opening parallel connections between 2 hosts

bull  Web browsers do this bull  Example link of rate R

supporting 9 connections ndash  new app asks for 1 TCP gets rate

R10 ndash  new app asks for 11 TCPs gets

11R20 (over half the bandwidth)

Queuing Mechanisms

Random Early Detection (RED) Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

bull  TCP depends on packet loss to detect congestion ndash  In fact TCP drives the network into packet loss ndash  hellip by continuing to increase the sending rate

bull  Drop-tail queuing leads to bursty loss ndash  When a link becomes congestedhellip ndash  hellip many arriving packets encounter a full queue ndash  And as a result many flows divide sending rate in half ndash  hellip and many individual flows lose multiple packets

Slow Feedback from Drop Tail bull  Feedback comes when buffer is completely full

ndash  hellip even though the buffer has been filling for a while bull  Plus the filling buffer is increasing RTT

ndash  hellip and the variance in the RTT bull  Might be better to give early feedback

ndash  Get one or two flows to slow down not all of them ndash  Get these flows to slow down before it is too late

Random Early Detection (RED) bull  Basic idea of RED

ndash  Router notices that the queue is getting backlogged ndash  hellip and randomly drops packets to signal congestion

bull  Packet drop probability ndash  Drop probability increases as queue length increases ndash  If buffer is below some level donrsquot drop anything ndash  hellip otherwise set drop probability as function of queue

Average Queue Length

Prob

abili

ty

Properties of RED bull  Drops packets before queue is full

ndash  In the hope of reducing the rates of some flows bull  Drops packet in proportion to each flowrsquos rate

ndash  High-rate flows have more packets ndash  hellip and hence a higher chance of being selected

bull  Drops are spaced out in time ndash  Which should help desynchronize the TCP senders

bull  Tolerant of burstiness in the traffic ndash  By basing the decisions on average queue length

Problems With RED bull  Hard to get the tunable parameters just right

ndash  How early to start dropping packets ndash  What slope for the increase in drop probability ndash  What time scale for averaging the queue length

bull  Sometimes RED helps but sometimes not ndash  If the parameters arenrsquot set right RED doesnrsquot help ndash  And it is hard to know how to set the parameters

bull  RED is implemented in practice ndash  But often not used due to the challenges of tuning right

bull  Many variations ndash  With cute names like ldquoBluerdquo and ldquoFREDrdquohellip J

Explicit Congestion Notification bull  Early dropping of packets

ndash  Good gives early feedback ndash  Bad has to drop the packet to give the feedback

bull  Explicit Congestion Notification ndash  Router marks the packet with an ECN bit ndash  hellip and sending host interprets as a sign of congestion

bull  Surmounting the challenges ndash  Must be supported by the end hosts and the routers ndash  Requires two bits in the IP header (one for the ECN

mark and one to indicate the ECN capability) ndash  Solution borrow two of the Type-Of-Service bits in the

IPv4 packet header

Conclusions bull Congestion is inevitable

ndash Internet does not reserve resources in advance ndash TCP actively tries to push the envelope

bull Congestion can be handled ndash Additive increase multiplicative decrease ndash Slow start and slow-start restart

bull Active Queue Management can help ndash Random Early Detection (RED) ndash Explicit Congestion Notification (ECN)

t

Window

Page 4: TCP Congestion Control - cs.colostate.edu

Flow Control vs Congestion Control

bull  Flow control ndash  Keeping one fast sender from overwhelming a slow

receiver bull  Congestion control

ndash  Keep a set of senders from overloading the network bull  Different concepts but similar mechanisms

ndash  TCP flow control receiver window ndash  TCP congestion control congestion window ndash  TCP actual window mincongestion window receiver

window

Congestion in the Internet is Unavoidable

bull  Two packets arrive at the same time ndash  The router can only transmit one ndash  hellip and either buffer or drop the other

bull  If many packets arrive in a short period of time ndash  The router cannot keep up with the arriving traffic ndash  hellip and the buffer may eventually overflow

Metrics Throughput vs Delay bull  High throughput

ndash  Throughput measured performance of a system ndash  Eg number of bitssecond of data that get through

bull  Low delay ndash  Delay time required to deliver a packet or message ndash  Eg number of ms to deliver a packet

bull  These two metrics are sometimes at odds ndash  Eg suppose you drive a link as hard as possible ndash  hellip then throughput will be high but delay will be too

Load Delay and Power

Average Packet delay

Load

Typical behavior of queuing systems with random arrivals

Power

Load

A simple metric of how well the network is performing

LoadPowerDelay

=

ldquooptimal loadrdquo

Goal maximize power

Fairness bull  Effective utilization is not the only goal

ndash  We also want to be fair to the various flows ndash  hellip but what does that mean

bull  Simple definition equal shares of the bandwidth ndash  N flows that each get 1N of the bandwidth ndash  But what if the flows traverse different paths ndash  Still a hard and open problem in the Internet

Simple Queuing Mechanism bull  Simplest approach FIFO queue and drop-tail bull  Link bandwidth allocation first-in first-out queue

ndash  Packets transmitted in the order they arrive

bull  Buffer space allocation drop-tail queuing ndash  If the queue is full drop the incoming packet

Simple Congestion Detection bull  Packet loss

ndash  Packet gets dropped along the way bull  Packet delay

ndash  Packet experiences high delay bull  How does TCP sender learn these

ndash  Loss bull  Timeout bull  Triple-duplicate acknowledgment

ndash  Delay bull  Round-trip time estimate

TCP Congestion Control Basics bull  Each source determines available capacity

ndash  hellip and how many packets is allowed to have in transit bull  Congestion window

ndash  Maximum of unackrsquoed bytes allowed to be in transit (the congestion-control equivalent of receiver window)

ndash  MaxWindow = mincongestion window receiver window - send at the rate of the slowest component

bull  How to adapt the congestion window ndash  Decrease upon losing a packet back-off ndash  Increase upon success explore new capacity

Additive Increase Multiplicative Decrease

bull  How much to increase and decrease ndash  Increase linearly decrease multiplicatively ndash  A necessary condition for stability of TCP ndash  Consequences of oversized window are much worse

than having an under-sized window bull  Oversized window packets dropped retransmitted pain for all bull  Undersized window lower throughput for one flow

bull  Multiplicative decrease ndash  On loss of packet divide congestion window in half

bull  Additive increase ndash  On success for last window of data increase linearly

adding one MSS per RTT

TCP ldquoSawtoothrdquo Behavior

t

Window

halved

Loss

Practical Details bull  Congestion window (cwnd)

ndash  Represented in bytes not in packets (Why) ndash  Packets typically one MSS (Maximum Segment Size)

bull  Increasing the congestion window ndash  Increase by MSS on success for last window of data ndash  In practice increase a fraction of MSS per received

ACK bull  packets per window CWND MSS bull  Increment per ACK MSS (MSS CWND)

bull  Decreasing the congestion window ndash  Cut in half but never below 1 MSS

Getting Started

t

Window

But could take a long time to get started

Need to start with a small CWND to avoid overloading the network

ldquoSlow Startrdquo Phase bull  Start with a small congestion window

ndash  Initially CWND is 1 MSS ndash  So initial sending rate is MSSRTT

bull  That could be pretty wasteful ndash  Might be much less than the available bandwidth ndash  Linear increase takes a long time to accelerate

bull  Slow-start phase (but in reality itrsquos ldquofast startrdquo) ndash  Sender starts at a slow rate (hence the name) ndash  hellip but increases the rate exponentially ndash  hellip until the first loss event

Slow Start in Action Double CWND per round-trip time

D A D D A A D D

A A

D

A

Src

Dest

D

A

1 2 4 8

Slow Start and the TCP Sawtooth

Loss

Exponential ldquoslow startrdquo

t

Window

Why is it called slow start? Because TCP originally had no congestion-control mechanism: the source would just start by sending a whole window’s worth of data.

Two Kinds of Loss in TCP
• Triple duplicate ACK
  – Packet n is lost, but packets n+1, n+2, etc. arrive
  – The receiver sends duplicate acknowledgments
  – … and the sender retransmits packet n quickly
  – Do a multiplicative decrease and keep going (no slow start)
• Timeout
  – Packet n is lost and detected via a timeout
  – Could be because all packets in flight were lost
  – After the timeout, blasting away for the entire CWND
  – … would trigger a very large burst of traffic
  – So it is better to start over with a very low CWND

Repeating Slow Start After Timeout
[Figure: window vs. time t around a timeout. Slow-start restart: go back to a CWND of 1 MSS, but take advantage of knowing the previous value of CWND; slow start runs until the window reaches half the previous CWND (the threshold), after which growth is linear.]

Repeating Slow Start After an Idle Period
• Suppose a TCP connection goes idle for a while
  – E.g., a Telnet session where you don’t type for an hour
• Eventually the network conditions change
  – Maybe many more flows are traversing the link
  – E.g., maybe everybody has come back from lunch
• Dangerous to start transmitting at the old rate
  – The previously idle TCP sender might blast the network
  – … causing excessive congestion and packet loss
• So some TCP implementations repeat slow start
  – Slow-start restart after an idle period

Summary: TCP Congestion Control
• When CongWin is below Threshold, the sender is in the slow-start phase; the window grows exponentially.
• When CongWin is above Threshold, the sender is in the congestion-avoidance phase; the window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.

The corresponding event table (Event / State / TCP sender action / Commentary):

Event: ACK receipt for previously unACKed data
State: Slow Start (SS)
Action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to “Congestion Avoidance”
Commentary: Results in a doubling of CongWin every RTT

Event: ACK receipt for previously unACKed data
State: Congestion Avoidance (CA)
Action: CongWin = CongWin + MSS × (MSS / CongWin)
Commentary: Additive increase, resulting in an increase of CongWin by 1 MSS every RTT

Event: Loss event detected by triple duplicate ACK
State: SS or CA
Action: Threshold = CongWin/2; CongWin = Threshold; set state to “Congestion Avoidance”
Commentary: Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

Event: Timeout
State: SS or CA
Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to “Slow Start”
Commentary: Enter slow start

Event: Duplicate ACK
State: SS or CA
Action: Increment the duplicate ACK count for the segment being ACKed
Commentary: CongWin and Threshold are not changed
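The table can be read as a small state machine. A sketch in Python (illustrative; the names and 1460-byte MSS are assumptions, and fast recovery’s window inflation during duplicate ACKs is omitted):

    MSS = 1460

    def on_new_ack(state, congwin, threshold):
        if state == "SS":
            congwin += MSS                      # exponential growth
            if congwin > threshold:
                state = "CA"
        else:  # "CA"
            congwin += MSS * MSS // congwin     # ~1 MSS per RTT
        return state, congwin, threshold

    def on_triple_dup_ack(state, congwin, threshold):
        threshold = max(congwin // 2, MSS)      # multiplicative decrease
        return "CA", threshold, threshold       # fast recovery

    def on_timeout(state, congwin, threshold):
        threshold = max(congwin // 2, MSS)
        return "SS", MSS, threshold             # restart from 1 MSS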

Other TCP Mechanisms

Nagle’s Algorithm and Delayed ACK

Motivation for Nagle’s Algorithm
• Interactive applications
  – SSH/telnet/rlogin
  – Generate many small packets (e.g., keystrokes)
• Small packets are wasteful
  – Mostly header (e.g., 40 bytes of header, 1 byte of data)
• Appealing to reduce the number of packets
  – Could force every packet to have some minimum size
  – … but what if the person doesn’t type more characters?
• Need to balance competing trade-offs
  – Send larger packets to increase efficiency
  – … but not at the expense of delay

Nagle’s Algorithm
• Wait if the amount of data is small
  – Smaller than the Maximum Segment Size (MSS)
• … and some other packet is already in flight
  – i.e., still awaiting the ACKs for previous packets
• That is, send at most one small packet per RTT
  – … by waiting until all outstanding ACKs have arrived (see the sketch below)
• Influence on performance
  – Interactive applications: enables batching of bytes
  – Bulk transfer: no change; it transmits in MSS-sized packets anyway
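A sketch of the send decision (hypothetical names, not a real socket API; real stacks let applications disable the delay with the TCP_NODELAY socket option):

    def nagle_ok_to_send(pending_bytes, unacked_bytes, mss=1460):
        # Send now if a full segment is ready or nothing is in flight;
        # otherwise hold the small data until outstanding bytes are ACKed.
        return pending_bytes >= mss or unacked_bytes == 0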

[Figure: packet timeline without vs. with Nagle’s algorithm.]

Delayed ACK: Motivation
• TCP traffic is often bidirectional
  – Data traveling in both directions
  – ACKs traveling in both directions
• ACK packets have high overhead
  – 40 bytes for the IP header and TCP header
  – … and zero data traffic
• Piggybacking is appealing
  – Host B can send an ACK to host A
  – … as part of a data packet from B to A

TCP Header Allows Piggybacking
[Figure: TCP header layout: source port, destination port; sequence number; acknowledgment; header length, reserved (0), flags, advertised window; checksum, urgent pointer; options (variable); data. Flags: SYN, FIN, RST, PSH, URG, ACK. The acknowledgment field lets any data segment also carry an ACK.]

Example of Piggybacking
[Figure: exchange between hosts A and B. When B has data to send, its ACK rides in a Data+ACK segment; when B doesn’t have data to send, it returns a bare ACK.]

Increasing Likelihood of Piggybacking
• Increase piggybacking
  – TCP allows the receiver to wait to send the ACK
  – … in the hope that the host will soon have data to send
• Example: ssh/rlogin/telnet
  – Host A types characters at a UNIX prompt
  – Host B receives the characters and executes a command
  – … and then data are generated
  – Would be nice if B could send the ACK with the new data

[Figure: the same A–B exchange; piggybacking works when a packet from A causes B to generate data, but when B has nothing to send, the separate ACK is a waste.]

Delayed ACK
• Delay sending an ACK
  – Upon receiving a packet, host B sets a timer
  – If B’s application generates data, go ahead and send
    • … and piggyback the ACK bit
  – If the timer expires, send a (non-piggybacked) ACK
• Limiting the wait
  – Timer of 200 msec or 500 msec
  – Results in an ACK for every other full-sized packet
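A sketch of the timer logic above (hypothetical class and callback; the piggyback-on-outgoing-data path is omitted, and the 200 ms value is one of the common choices):

    import threading

    class DelayedAck:
        TIMEOUT = 0.2                          # 200 ms (some stacks use 500 ms)

        def __init__(self, send_ack):
            self.send_ack = send_ack           # callback that emits a bare ACK
            self.pending = 0
            self.timer = None

        def on_full_sized_segment(self):
            self.pending += 1
            if self.pending >= 2:              # ACK every other full-sized segment
                self._ack_now()
            elif self.timer is None:
                self.timer = threading.Timer(self.TIMEOUT, self._ack_now)
                self.timer.start()

        def _ack_now(self):
            if self.timer is not None:
                self.timer.cancel()
                self.timer = None
            self.pending = 0
            self.send_ack()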

TCP Throughput and Fairness

TCP Throughput
• What’s the average throughput of TCP as a function of window size and RTT?
  – Assume a long-lived TCP flow
  – Ignore slow start
• Let W be the window size when loss occurs
• When the window is W, the throughput is W/RTT
• Just after a loss, the window drops to W/2, and the throughput to W/(2·RTT)
• Average throughput: 0.75 W/RTT
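A quick check of the 0.75 factor (in LaTeX notation), assuming the window ramps linearly between W/2 and W over each sawtooth cycle:

    \bar{W} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3W}{4}
    \quad\Rightarrow\quad
    \text{Throughput} \approx \frac{0.75\,W}{\text{RTT}}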

Problems with Fast Links
An example to illustrate the problem. Consider the impact of high-speed links:
  – 1500-byte segments
  – 100 ms RTT
  – 10 Gb/s throughput
• What is the required window size?
  – Throughput = 0.75 W/RTT (probably a good formula to remember)
  – Requires a window of W = 83,333 in-flight segments
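The back-of-the-envelope arithmetic in Python (as on the slide, the 0.75 factor is dropped to get the round number):

    rate = 10e9               # bits/s
    rtt  = 0.1                # seconds
    mss  = 1500 * 8           # bits per segment
    print(rate * rtt / mss)   # ~83,333 segments in flight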

Example (cont.)
• 10 Gb/s throughput requires a window of W = 83,333 in-flight segments
• TCP assumes every loss is due to congestion
  – Generally a safe assumption for a reasonable window size
• A (magic) formula relates the loss rate L to throughput:

    Throughput = (1.22 × MSS) / (RTT × √L)

  – A throughput of 10 Gb/s with an MSS of 1500 bytes gives L = 2×10⁻¹⁰
  – i.e., TCP can lose only one in 5,000,000,000 segments
• We need new versions of TCP for high-speed networks (a topic for later discussion)
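Plugging the example’s numbers into the formula (a sketch; same assumed values as above):

    mss    = 1500 * 8                      # bits
    rtt    = 0.1                           # seconds
    target = 10e9                          # bits/s
    L = (1.22 * mss / (rtt * target)) ** 2
    print(L)                               # ~2.1e-10: about one loss per 5 billion segments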

TCP Fairness
Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K.
[Figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R.]
Simple scenario: assume the same MSS and RTT.

Is TCP Fair?
Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease drops throughput proportionally
[Figure: phase plot of Connection 2 throughput vs. Connection 1 throughput, both axes from 0 to R, with the equal-bandwidth-share line. The trajectory alternates congestion-avoidance additive increase with losses that halve both windows, converging toward the equal share.]
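A toy simulation of the phase-plot argument (illustrative assumptions: equal RTTs, and synchronized losses whenever the combined rate exceeds R):

    R, x1, x2 = 100.0, 80.0, 10.0          # capacity and two unequal flows
    for _ in range(500):
        if x1 + x2 > R:
            x1, x2 = x1 / 2, x2 / 2        # synchronized multiplicative decrease
        else:
            x1, x2 = x1 + 1.0, x2 + 1.0    # additive increase, one unit per step
    print(round(x1, 1), round(x2, 1))      # the two rates end up nearly equal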

More on Fairness
Fairness and UDP:
• Multimedia apps often do not use TCP
  – They do not want their rate throttled by congestion control
• Instead they use UDP
  – Pump audio/video at a constant rate; tolerate packet loss
• Research area: TCP-friendly unreliable transport
Fairness and parallel TCP connections:
• Nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: a link of rate R supporting 9 connections
  – A new app asking for 1 TCP connection gets rate R/10
  – A new app asking for 11 TCP connections gets 11R/20 (over half the bandwidth)

Queuing Mechanisms

Random Early Detection (RED) and Explicit Congestion Notification (ECN)

Bursty Loss from Drop-Tail Queuing
• TCP depends on packet loss to detect congestion
  – In fact, TCP drives the network into packet loss
  – … by continuing to increase the sending rate
• Drop-tail queuing leads to bursty loss
  – When a link becomes congested…
  – … many arriving packets encounter a full queue
  – As a result, many flows divide their sending rate in half
  – … and many individual flows lose multiple packets

Slow Feedback from Drop Tail
• Feedback comes only when the buffer is completely full
  – … even though the buffer has been filling for a while
• Plus, the filling buffer is increasing the RTT
  – … and the variance in the RTT
• Might be better to give early feedback
  – Get one or two flows to slow down, not all of them
  – Get these flows to slow down before it is too late

Random Early Detection (RED)
• Basic idea of RED
  – The router notices that the queue is getting backlogged
  – … and randomly drops packets to signal congestion
• Packet drop probability
  – The drop probability increases as the queue length increases
  – If the buffer is below some level, don’t drop anything
  – … otherwise, set the drop probability as a function of the queue length (see the sketch below)
[Figure: drop probability vs. average queue length: zero below a minimum threshold, then increasing.]
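A sketch of the classic RED rules (the parameter names min_th, max_th, max_p and the EWMA weight follow Floyd and Jacobson’s original proposal; the values here are arbitrary):

    def update_avg(avg_q, q_len, w=0.002):
        # Exponentially weighted moving average of the queue length
        return (1 - w) * avg_q + w * q_len

    def red_drop_prob(avg_q, min_th=5.0, max_th=15.0, max_p=0.1):
        if avg_q < min_th:
            return 0.0                      # below the threshold: never drop
        if avg_q >= max_th:
            return 1.0                      # above the max threshold: always drop
        return max_p * (avg_q - min_th) / (max_th - min_th)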

Properties of RED
• Drops packets before the queue is full
  – In the hope of reducing the rates of some flows
• Drops packets in proportion to each flow’s rate
  – High-rate flows have more packets
  – … and hence a higher chance of being selected
• Drops are spaced out in time
  – Which should help desynchronize the TCP senders
• Tolerant of burstiness in the traffic
  – By basing the decisions on the average queue length

Problems with RED
• Hard to get the tunable parameters just right
  – How early to start dropping packets?
  – What slope for the increase in drop probability?
  – What time scale for averaging the queue length?
• Sometimes RED helps, but sometimes not
  – If the parameters aren’t set right, RED doesn’t help
  – And it is hard to know how to set the parameters
• RED is implemented in practice
  – But often not used, due to the challenges of tuning it right
• Many variations
  – With cute names like “Blue” and “FRED” …

Explicit Congestion Notification
• Early dropping of packets
  – Good: gives early feedback
  – Bad: has to drop the packet to give the feedback
• Explicit Congestion Notification (ECN)
  – The router marks the packet with an ECN bit
  – … and the sending host interprets this as a sign of congestion
• Surmounting the challenges
  – Must be supported by the end hosts and the routers
  – Requires two bits in the IP header (one for the ECN mark and one to indicate ECN capability)
  – Solution: borrow two of the Type-of-Service bits in the IPv4 packet header
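For reference, the two borrowed bits became the ECN field of the IP header (standardized in RFC 3168), with four codepoints:

    NOT_ECT = 0b00   # sender not ECN-capable
    ECT_1   = 0b01   # ECN-capable transport
    ECT_0   = 0b10   # ECN-capable transport
    CE      = 0b11   # congestion experienced, set by a congested router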

Conclusions
• Congestion is inevitable
  – The Internet does not reserve resources in advance
  – TCP actively tries to push the envelope
• Congestion can be handled
  – Additive increase, multiplicative decrease
  – Slow start, and slow-start restart
• Active Queue Management can help
  – Random Early Detection (RED)
  – Explicit Congestion Notification (ECN)


Page 5: TCP Congestion Control - cs.colostate.edu

Congestion in the Internet is Unavoidable

bull  Two packets arrive at the same time ndash  The router can only transmit one ndash  hellip and either buffer or drop the other

bull  If many packets arrive in a short period of time ndash  The router cannot keep up with the arriving traffic ndash  hellip and the buffer may eventually overflow

Metrics Throughput vs Delay bull  High throughput

ndash  Throughput measured performance of a system ndash  Eg number of bitssecond of data that get through

bull  Low delay ndash  Delay time required to deliver a packet or message ndash  Eg number of ms to deliver a packet

bull  These two metrics are sometimes at odds ndash  Eg suppose you drive a link as hard as possible ndash  hellip then throughput will be high but delay will be too

Load Delay and Power

Average Packet delay

Load

Typical behavior of queuing systems with random arrivals

Power

Load

A simple metric of how well the network is performing

LoadPowerDelay

=

ldquooptimal loadrdquo

Goal maximize power

Fairness bull  Effective utilization is not the only goal

ndash  We also want to be fair to the various flows ndash  hellip but what does that mean

bull  Simple definition equal shares of the bandwidth ndash  N flows that each get 1N of the bandwidth ndash  But what if the flows traverse different paths ndash  Still a hard and open problem in the Internet

Simple Queuing Mechanism bull  Simplest approach FIFO queue and drop-tail bull  Link bandwidth allocation first-in first-out queue

ndash  Packets transmitted in the order they arrive

bull  Buffer space allocation drop-tail queuing ndash  If the queue is full drop the incoming packet

Simple Congestion Detection bull  Packet loss

ndash  Packet gets dropped along the way bull  Packet delay

ndash  Packet experiences high delay bull  How does TCP sender learn these

ndash  Loss bull  Timeout bull  Triple-duplicate acknowledgment

ndash  Delay bull  Round-trip time estimate

TCP Congestion Control Basics bull  Each source determines available capacity

ndash  hellip and how many packets is allowed to have in transit bull  Congestion window

ndash  Maximum of unackrsquoed bytes allowed to be in transit (the congestion-control equivalent of receiver window)

ndash  MaxWindow = mincongestion window receiver window - send at the rate of the slowest component

bull  How to adapt the congestion window ndash  Decrease upon losing a packet back-off ndash  Increase upon success explore new capacity

Additive Increase Multiplicative Decrease

bull  How much to increase and decrease ndash  Increase linearly decrease multiplicatively ndash  A necessary condition for stability of TCP ndash  Consequences of oversized window are much worse

than having an under-sized window bull  Oversized window packets dropped retransmitted pain for all bull  Undersized window lower throughput for one flow

bull  Multiplicative decrease ndash  On loss of packet divide congestion window in half

bull  Additive increase ndash  On success for last window of data increase linearly

adding one MSS per RTT

TCP ldquoSawtoothrdquo Behavior

t

Window

halved

Loss

Practical Details bull  Congestion window (cwnd)

ndash  Represented in bytes not in packets (Why) ndash  Packets typically one MSS (Maximum Segment Size)

bull  Increasing the congestion window ndash  Increase by MSS on success for last window of data ndash  In practice increase a fraction of MSS per received

ACK bull  packets per window CWND MSS bull  Increment per ACK MSS (MSS CWND)

bull  Decreasing the congestion window ndash  Cut in half but never below 1 MSS

Getting Started

t

Window

But could take a long time to get started

Need to start with a small CWND to avoid overloading the network

ldquoSlow Startrdquo Phase bull  Start with a small congestion window

ndash  Initially CWND is 1 MSS ndash  So initial sending rate is MSSRTT

bull  That could be pretty wasteful ndash  Might be much less than the available bandwidth ndash  Linear increase takes a long time to accelerate

bull  Slow-start phase (but in reality itrsquos ldquofast startrdquo) ndash  Sender starts at a slow rate (hence the name) ndash  hellip but increases the rate exponentially ndash  hellip until the first loss event

Slow Start in Action Double CWND per round-trip time

D A D D A A D D

A A

D

A

Src

Dest

D

A

1 2 4 8

Slow Start and the TCP Sawtooth

Loss

Exponential ldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally had no congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

Two Kinds of Loss in TCP bull  Triple duplicate ACK

ndash  Packet n is lost but packets n+1 n+2 etc arrive ndash  Receiver sends duplicate acknowledgments ndash  hellip and the sender retransmits packet n quickly ndash  Do a multiplicative decrease and keep going (no slow-

start) bull  Timeout

ndash  Packet n is lost and detected via a timeout ndash  Could be because all packets in flight were lost ndash  After the timeout blasting away for the entire CWND ndash  hellip would trigger a very large burst in traffic ndash  So better to start over with a very low CWND

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of

previous cwnd

timeout threshold

Repeating Slow Start After Idle Period

bull  Suppose a TCP connection goes idle for a while ndash  Eg Telnet session where you donrsquot type for an hour

bull  Eventually the network conditions change ndash  Maybe many more flows are traversing the link ndash  Eg maybe everybody has come back from lunch

bull  Dangerous to start transmitting at the old rate ndash  Previously-idle TCP sender might blast the network ndash  hellip causing excessive congestion and packet loss

bull  So some TCP implementations repeat slow start ndash  Slow-start restart after an idle period

Summary TCP Congestion Control bull  When CongWin is below Threshold sender in

slow-start phase window grows exponentially

bull  When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull  When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull  When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Event State TCP Sender Action Commentary ACK receipt for previously unACKed data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unACKed data

Congestion Avoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Loss event detected by triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = Threshold Set state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSS Set state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK

SS or CA Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Other TCP Mechanisms

Naglersquos Algorithm and Delayed ACK

Motivation for Naglersquos Algorithm

bull  Interactive applications ndash  SSHtelnetrlogin ndash  Generate many small packets (eg keystrokes)

bull  Small packets are wasteful ndash  Mostly header (eg 40 bytes of header 1 of data)

bull  Appealing to reduce the number of packets ndash  Could force every packet to have some minimum size ndash  hellip but what if the person doesnrsquot type more

characters bull  Need to balance competing trade-offs

ndash  Send larger packets to increase efficiency ndash  hellip but not at the expense of delay

Naglersquos Algorithm bull Wait if the amount of data is small

ndash Smaller than Maximum Segment Size (MSS) bull hellipand some other packet is already in flight

ndash  ie still awaiting the ACKs for previous packets bull That is send at most one small packet per RTT

ndash hellip by waiting until all outstanding ACKs have arrived

bull  Influence on performance ndash Interactive applications enables batching of bytes ndash Bulk transfer no change transmits in MSS-sized packets

anyway

vs

ACK

Delayed ACK - Motivation bull  TCP traffic is often bidirectional

ndash Data traveling in both directions ndash ACKs traveling in both directions

bull  ACK packets have high overhead ndash  40 bytes for the IP header and TCP header ndash hellip and zero data traffic

bull  Piggybacking is appealing ndash Host B can send an ACK to host A ndash hellip as part of a data packet from B to A

TCP Header Allows Piggybacking

Source port Destination port

Sequence number

Acknowledgment

Advertised window HdrLen Flags 0

Checksum Urgent pointer

Options (variable)

Data

Flags SYN FIN RST PSH URG ACK

Example of Piggybacking

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

B has data to send

A has data to send

B doesnrsquot have data to send

Increasing Likelihood of Piggybacking

bull  Increase piggybacking ndash  TCP allows the receiver to

wait to send the ACK ndash  hellip in the hope that the host

will have data to send bull  Example sshrlogintelnet

ndash  Host A types characters at a UNIX prompt

ndash  Host B receives the character and executes a command

ndash  hellip and then data are generated ndash  Would be nice if B could send

the ACK with the new data

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

Works when packet from A causes data to be sent from B

waste

Delayed ACK bull  Delay sending an ACK

ndash  Upon receiving a packet the host B sets a timer ndash  If Brsquos application generates data go ahead and send

bull  And piggyback the ACK bit

ndash  If the timer expires send a (non-piggybacked) ACK

bull  Limiting the wait ndash  Timer of 200 msec or 500 msec ndash  Results in an ACK for every other full-sized packet

TCP Throughput and Fairness

TCP Throughput bull  Whatrsquos the average throughout of TCP as a

function of window size and RTT ndash  Assume long-lived TCP flow ndash  Ignore slow start

bull  Let W be the window size when loss occurs bull  When window is W throughput is WRTT bull  Just after loss window drops to W2 throughput

to W2RTT bull  Average throughout 075 WRTT

Problems with Fast Links An example to illustrate problems bull  Consider the impact of high speed links

ndash  1500 byte segments ndash  100ms RTT ndash  10 Gbs throughput

bull  What is the required window size ndash  Throughput = 75 WRTT

bull  (probably a good formula to remember)

ndash  Requires window size W = 83333 in-flight segments

Example (Cont)

bull  10 Gbs throughput requires window size W = 83333 in-flight segments

bull  TCP assumes every loss is due to congestion ndash  Generally safe assumption for reasonable window size

bull  (Magic) Formula to relate loss rate to throughput Throughput of 10 Gbs with MSS of 1500 bytes gives ndash  13 L = 210-10

ie can only lose one in 5000000000 segments bull  We need new versions of TCP for high-speed nets (topic

for later discussion)

LRTTMSSsdot221Throughput =

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

Simple scenario assume same MSS and RTT

Is TCP Fair Two competing sessions bull  Additive increase gives slope of 1 as throughout increases bull  multiplicative decrease drops throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Conn

ecti

on 2

thr

ough

put

loss decrease window by factor of 2 congestion avoidance additive increase

loss decrease window by factor of 2 congestion avoidance additive increase

More on Fairness Fairness and UDP bull  Multimedia apps often do

not use TCP ndash  do not want rate throttled by

congestion control bull  Instead use UDP

ndash  pump audiovideo at constant rate tolerate packet loss

bull  Research area TCP friendly unreliable transport

Fairness and parallel TCP connections

bull  nothing prevents app from opening parallel connections between 2 hosts

bull  Web browsers do this bull  Example link of rate R

supporting 9 connections ndash  new app asks for 1 TCP gets rate

R10 ndash  new app asks for 11 TCPs gets

11R20 (over half the bandwidth)

Queuing Mechanisms

Random Early Detection (RED) Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

bull  TCP depends on packet loss to detect congestion ndash  In fact TCP drives the network into packet loss ndash  hellip by continuing to increase the sending rate

bull  Drop-tail queuing leads to bursty loss ndash  When a link becomes congestedhellip ndash  hellip many arriving packets encounter a full queue ndash  And as a result many flows divide sending rate in half ndash  hellip and many individual flows lose multiple packets

Slow Feedback from Drop Tail bull  Feedback comes when buffer is completely full

ndash  hellip even though the buffer has been filling for a while bull  Plus the filling buffer is increasing RTT

ndash  hellip and the variance in the RTT bull  Might be better to give early feedback

ndash  Get one or two flows to slow down not all of them ndash  Get these flows to slow down before it is too late

Random Early Detection (RED) bull  Basic idea of RED

ndash  Router notices that the queue is getting backlogged ndash  hellip and randomly drops packets to signal congestion

bull  Packet drop probability ndash  Drop probability increases as queue length increases ndash  If buffer is below some level donrsquot drop anything ndash  hellip otherwise set drop probability as function of queue

Average Queue Length

Prob

abili

ty

Properties of RED bull  Drops packets before queue is full

ndash  In the hope of reducing the rates of some flows bull  Drops packet in proportion to each flowrsquos rate

ndash  High-rate flows have more packets ndash  hellip and hence a higher chance of being selected

bull  Drops are spaced out in time ndash  Which should help desynchronize the TCP senders

bull  Tolerant of burstiness in the traffic ndash  By basing the decisions on average queue length

Problems With RED bull  Hard to get the tunable parameters just right

ndash  How early to start dropping packets ndash  What slope for the increase in drop probability ndash  What time scale for averaging the queue length

bull  Sometimes RED helps but sometimes not ndash  If the parameters arenrsquot set right RED doesnrsquot help ndash  And it is hard to know how to set the parameters

bull  RED is implemented in practice ndash  But often not used due to the challenges of tuning right

bull  Many variations ndash  With cute names like ldquoBluerdquo and ldquoFREDrdquohellip J

Explicit Congestion Notification bull  Early dropping of packets

ndash  Good gives early feedback ndash  Bad has to drop the packet to give the feedback

bull  Explicit Congestion Notification ndash  Router marks the packet with an ECN bit ndash  hellip and sending host interprets as a sign of congestion

bull  Surmounting the challenges ndash  Must be supported by the end hosts and the routers ndash  Requires two bits in the IP header (one for the ECN

mark and one to indicate the ECN capability) ndash  Solution borrow two of the Type-Of-Service bits in the

IPv4 packet header

Conclusions bull Congestion is inevitable

ndash Internet does not reserve resources in advance ndash TCP actively tries to push the envelope

bull Congestion can be handled ndash Additive increase multiplicative decrease ndash Slow start and slow-start restart

bull Active Queue Management can help ndash Random Early Detection (RED) ndash Explicit Congestion Notification (ECN)

t

Window

Page 6: TCP Congestion Control - cs.colostate.edu

Metrics Throughput vs Delay bull  High throughput

ndash  Throughput measured performance of a system ndash  Eg number of bitssecond of data that get through

bull  Low delay ndash  Delay time required to deliver a packet or message ndash  Eg number of ms to deliver a packet

bull  These two metrics are sometimes at odds ndash  Eg suppose you drive a link as hard as possible ndash  hellip then throughput will be high but delay will be too

Load Delay and Power

Average Packet delay

Load

Typical behavior of queuing systems with random arrivals

Power

Load

A simple metric of how well the network is performing

LoadPowerDelay

=

ldquooptimal loadrdquo

Goal maximize power

Fairness bull  Effective utilization is not the only goal

ndash  We also want to be fair to the various flows ndash  hellip but what does that mean

bull  Simple definition equal shares of the bandwidth ndash  N flows that each get 1N of the bandwidth ndash  But what if the flows traverse different paths ndash  Still a hard and open problem in the Internet

Simple Queuing Mechanism bull  Simplest approach FIFO queue and drop-tail bull  Link bandwidth allocation first-in first-out queue

ndash  Packets transmitted in the order they arrive

bull  Buffer space allocation drop-tail queuing ndash  If the queue is full drop the incoming packet

Simple Congestion Detection bull  Packet loss

ndash  Packet gets dropped along the way bull  Packet delay

ndash  Packet experiences high delay bull  How does TCP sender learn these

ndash  Loss bull  Timeout bull  Triple-duplicate acknowledgment

ndash  Delay bull  Round-trip time estimate

TCP Congestion Control Basics bull  Each source determines available capacity

ndash  hellip and how many packets is allowed to have in transit bull  Congestion window

ndash  Maximum of unackrsquoed bytes allowed to be in transit (the congestion-control equivalent of receiver window)

ndash  MaxWindow = mincongestion window receiver window - send at the rate of the slowest component

bull  How to adapt the congestion window ndash  Decrease upon losing a packet back-off ndash  Increase upon success explore new capacity

Additive Increase Multiplicative Decrease

bull  How much to increase and decrease ndash  Increase linearly decrease multiplicatively ndash  A necessary condition for stability of TCP ndash  Consequences of oversized window are much worse

than having an under-sized window bull  Oversized window packets dropped retransmitted pain for all bull  Undersized window lower throughput for one flow

bull  Multiplicative decrease ndash  On loss of packet divide congestion window in half

bull  Additive increase ndash  On success for last window of data increase linearly

adding one MSS per RTT

TCP ldquoSawtoothrdquo Behavior

t

Window

halved

Loss

Practical Details bull  Congestion window (cwnd)

ndash  Represented in bytes not in packets (Why) ndash  Packets typically one MSS (Maximum Segment Size)

bull  Increasing the congestion window ndash  Increase by MSS on success for last window of data ndash  In practice increase a fraction of MSS per received

ACK bull  packets per window CWND MSS bull  Increment per ACK MSS (MSS CWND)

bull  Decreasing the congestion window ndash  Cut in half but never below 1 MSS

Getting Started

t

Window

But could take a long time to get started

Need to start with a small CWND to avoid overloading the network

ldquoSlow Startrdquo Phase bull  Start with a small congestion window

ndash  Initially CWND is 1 MSS ndash  So initial sending rate is MSSRTT

bull  That could be pretty wasteful ndash  Might be much less than the available bandwidth ndash  Linear increase takes a long time to accelerate

bull  Slow-start phase (but in reality itrsquos ldquofast startrdquo) ndash  Sender starts at a slow rate (hence the name) ndash  hellip but increases the rate exponentially ndash  hellip until the first loss event

Slow Start in Action Double CWND per round-trip time

D A D D A A D D

A A

D

A

Src

Dest

D

A

1 2 4 8

Slow Start and the TCP Sawtooth

Loss

Exponential ldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally had no congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

Two Kinds of Loss in TCP bull  Triple duplicate ACK

ndash  Packet n is lost but packets n+1 n+2 etc arrive ndash  Receiver sends duplicate acknowledgments ndash  hellip and the sender retransmits packet n quickly ndash  Do a multiplicative decrease and keep going (no slow-

start) bull  Timeout

ndash  Packet n is lost and detected via a timeout ndash  Could be because all packets in flight were lost ndash  After the timeout blasting away for the entire CWND ndash  hellip would trigger a very large burst in traffic ndash  So better to start over with a very low CWND

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of

previous cwnd

timeout threshold

Repeating Slow Start After Idle Period

bull  Suppose a TCP connection goes idle for a while ndash  Eg Telnet session where you donrsquot type for an hour

bull  Eventually the network conditions change ndash  Maybe many more flows are traversing the link ndash  Eg maybe everybody has come back from lunch

bull  Dangerous to start transmitting at the old rate ndash  Previously-idle TCP sender might blast the network ndash  hellip causing excessive congestion and packet loss

bull  So some TCP implementations repeat slow start ndash  Slow-start restart after an idle period

Summary TCP Congestion Control bull  When CongWin is below Threshold sender in

slow-start phase window grows exponentially

bull  When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull  When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull  When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Event State TCP Sender Action Commentary ACK receipt for previously unACKed data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unACKed data

Congestion Avoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Loss event detected by triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = Threshold Set state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSS Set state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK

SS or CA Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Other TCP Mechanisms

Naglersquos Algorithm and Delayed ACK

Motivation for Naglersquos Algorithm

bull  Interactive applications ndash  SSHtelnetrlogin ndash  Generate many small packets (eg keystrokes)

bull  Small packets are wasteful ndash  Mostly header (eg 40 bytes of header 1 of data)

bull  Appealing to reduce the number of packets ndash  Could force every packet to have some minimum size ndash  hellip but what if the person doesnrsquot type more

characters bull  Need to balance competing trade-offs

ndash  Send larger packets to increase efficiency ndash  hellip but not at the expense of delay

Naglersquos Algorithm bull Wait if the amount of data is small

ndash Smaller than Maximum Segment Size (MSS) bull hellipand some other packet is already in flight

ndash  ie still awaiting the ACKs for previous packets bull That is send at most one small packet per RTT

ndash hellip by waiting until all outstanding ACKs have arrived

bull  Influence on performance ndash Interactive applications enables batching of bytes ndash Bulk transfer no change transmits in MSS-sized packets

anyway

vs

ACK

Delayed ACK - Motivation bull  TCP traffic is often bidirectional

ndash Data traveling in both directions ndash ACKs traveling in both directions

bull  ACK packets have high overhead ndash  40 bytes for the IP header and TCP header ndash hellip and zero data traffic

bull  Piggybacking is appealing ndash Host B can send an ACK to host A ndash hellip as part of a data packet from B to A

TCP Header Allows Piggybacking

Source port Destination port

Sequence number

Acknowledgment

Advertised window HdrLen Flags 0

Checksum Urgent pointer

Options (variable)

Data

Flags SYN FIN RST PSH URG ACK

Example of Piggybacking

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

B has data to send

A has data to send

B doesnrsquot have data to send

Increasing Likelihood of Piggybacking

bull  Increase piggybacking ndash  TCP allows the receiver to

wait to send the ACK ndash  hellip in the hope that the host

will have data to send bull  Example sshrlogintelnet

ndash  Host A types characters at a UNIX prompt

ndash  Host B receives the character and executes a command

ndash  hellip and then data are generated ndash  Would be nice if B could send

the ACK with the new data

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

Works when packet from A causes data to be sent from B

waste

Delayed ACK bull  Delay sending an ACK

ndash  Upon receiving a packet the host B sets a timer ndash  If Brsquos application generates data go ahead and send

bull  And piggyback the ACK bit

ndash  If the timer expires send a (non-piggybacked) ACK

bull  Limiting the wait ndash  Timer of 200 msec or 500 msec ndash  Results in an ACK for every other full-sized packet

TCP Throughput and Fairness

TCP Throughput bull  Whatrsquos the average throughout of TCP as a

function of window size and RTT ndash  Assume long-lived TCP flow ndash  Ignore slow start

bull  Let W be the window size when loss occurs bull  When window is W throughput is WRTT bull  Just after loss window drops to W2 throughput

to W2RTT bull  Average throughout 075 WRTT

Problems with Fast Links An example to illustrate problems bull  Consider the impact of high speed links

ndash  1500 byte segments ndash  100ms RTT ndash  10 Gbs throughput

bull  What is the required window size ndash  Throughput = 75 WRTT

bull  (probably a good formula to remember)

ndash  Requires window size W = 83333 in-flight segments

Example (Cont)

bull  10 Gbs throughput requires window size W = 83333 in-flight segments

bull  TCP assumes every loss is due to congestion ndash  Generally safe assumption for reasonable window size

bull  (Magic) Formula to relate loss rate to throughput Throughput of 10 Gbs with MSS of 1500 bytes gives ndash  13 L = 210-10

ie can only lose one in 5000000000 segments bull  We need new versions of TCP for high-speed nets (topic

for later discussion)

LRTTMSSsdot221Throughput =

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

Simple scenario assume same MSS and RTT

Is TCP Fair Two competing sessions bull  Additive increase gives slope of 1 as throughout increases bull  multiplicative decrease drops throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Conn

ecti

on 2

thr

ough

put

loss decrease window by factor of 2 congestion avoidance additive increase

loss decrease window by factor of 2 congestion avoidance additive increase

More on Fairness Fairness and UDP bull  Multimedia apps often do

not use TCP ndash  do not want rate throttled by

congestion control bull  Instead use UDP

ndash  pump audiovideo at constant rate tolerate packet loss

bull  Research area TCP friendly unreliable transport

Fairness and parallel TCP connections

bull  nothing prevents app from opening parallel connections between 2 hosts

bull  Web browsers do this bull  Example link of rate R

supporting 9 connections ndash  new app asks for 1 TCP gets rate

R10 ndash  new app asks for 11 TCPs gets

11R20 (over half the bandwidth)

Queuing Mechanisms

Random Early Detection (RED) Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

bull  TCP depends on packet loss to detect congestion ndash  In fact TCP drives the network into packet loss ndash  hellip by continuing to increase the sending rate

bull  Drop-tail queuing leads to bursty loss ndash  When a link becomes congestedhellip ndash  hellip many arriving packets encounter a full queue ndash  And as a result many flows divide sending rate in half ndash  hellip and many individual flows lose multiple packets

Slow Feedback from Drop Tail bull  Feedback comes when buffer is completely full

ndash  hellip even though the buffer has been filling for a while bull  Plus the filling buffer is increasing RTT

ndash  hellip and the variance in the RTT bull  Might be better to give early feedback

ndash  Get one or two flows to slow down not all of them ndash  Get these flows to slow down before it is too late

Random Early Detection (RED) bull  Basic idea of RED

ndash  Router notices that the queue is getting backlogged ndash  hellip and randomly drops packets to signal congestion

bull  Packet drop probability ndash  Drop probability increases as queue length increases ndash  If buffer is below some level donrsquot drop anything ndash  hellip otherwise set drop probability as function of queue

Average Queue Length

Prob

abili

ty

Properties of RED bull  Drops packets before queue is full

ndash  In the hope of reducing the rates of some flows bull  Drops packet in proportion to each flowrsquos rate

ndash  High-rate flows have more packets ndash  hellip and hence a higher chance of being selected

bull  Drops are spaced out in time ndash  Which should help desynchronize the TCP senders

bull  Tolerant of burstiness in the traffic ndash  By basing the decisions on average queue length

Problems With RED bull  Hard to get the tunable parameters just right

ndash  How early to start dropping packets ndash  What slope for the increase in drop probability ndash  What time scale for averaging the queue length

bull  Sometimes RED helps but sometimes not ndash  If the parameters arenrsquot set right RED doesnrsquot help ndash  And it is hard to know how to set the parameters

bull  RED is implemented in practice ndash  But often not used due to the challenges of tuning right

bull  Many variations ndash  With cute names like ldquoBluerdquo and ldquoFREDrdquohellip J

Explicit Congestion Notification bull  Early dropping of packets

ndash  Good gives early feedback ndash  Bad has to drop the packet to give the feedback

bull  Explicit Congestion Notification ndash  Router marks the packet with an ECN bit ndash  hellip and sending host interprets as a sign of congestion

bull  Surmounting the challenges ndash  Must be supported by the end hosts and the routers ndash  Requires two bits in the IP header (one for the ECN

mark and one to indicate the ECN capability) ndash  Solution borrow two of the Type-Of-Service bits in the

IPv4 packet header

Conclusions bull Congestion is inevitable

ndash Internet does not reserve resources in advance ndash TCP actively tries to push the envelope

bull Congestion can be handled ndash Additive increase multiplicative decrease ndash Slow start and slow-start restart

bull Active Queue Management can help ndash Random Early Detection (RED) ndash Explicit Congestion Notification (ECN)

t

Window

Page 7: TCP Congestion Control - cs.colostate.edu

Load Delay and Power

Average Packet delay

Load

Typical behavior of queuing systems with random arrivals

Power

Load

A simple metric of how well the network is performing

LoadPowerDelay

=

ldquooptimal loadrdquo

Goal maximize power

Fairness bull  Effective utilization is not the only goal

ndash  We also want to be fair to the various flows ndash  hellip but what does that mean

bull  Simple definition equal shares of the bandwidth ndash  N flows that each get 1N of the bandwidth ndash  But what if the flows traverse different paths ndash  Still a hard and open problem in the Internet

Simple Queuing Mechanism bull  Simplest approach FIFO queue and drop-tail bull  Link bandwidth allocation first-in first-out queue

ndash  Packets transmitted in the order they arrive

bull  Buffer space allocation drop-tail queuing ndash  If the queue is full drop the incoming packet

Simple Congestion Detection bull  Packet loss

ndash  Packet gets dropped along the way bull  Packet delay

ndash  Packet experiences high delay bull  How does TCP sender learn these

ndash  Loss bull  Timeout bull  Triple-duplicate acknowledgment

ndash  Delay bull  Round-trip time estimate

TCP Congestion Control Basics bull  Each source determines available capacity

ndash  hellip and how many packets is allowed to have in transit bull  Congestion window

ndash  Maximum of unackrsquoed bytes allowed to be in transit (the congestion-control equivalent of receiver window)

ndash  MaxWindow = mincongestion window receiver window - send at the rate of the slowest component

bull  How to adapt the congestion window ndash  Decrease upon losing a packet back-off ndash  Increase upon success explore new capacity

Additive Increase Multiplicative Decrease

bull  How much to increase and decrease ndash  Increase linearly decrease multiplicatively ndash  A necessary condition for stability of TCP ndash  Consequences of oversized window are much worse

than having an under-sized window bull  Oversized window packets dropped retransmitted pain for all bull  Undersized window lower throughput for one flow

bull  Multiplicative decrease ndash  On loss of packet divide congestion window in half

bull  Additive increase ndash  On success for last window of data increase linearly

adding one MSS per RTT

TCP ldquoSawtoothrdquo Behavior

t

Window

halved

Loss

Practical Details bull  Congestion window (cwnd)

ndash  Represented in bytes not in packets (Why) ndash  Packets typically one MSS (Maximum Segment Size)

bull  Increasing the congestion window ndash  Increase by MSS on success for last window of data ndash  In practice increase a fraction of MSS per received

ACK bull  packets per window CWND MSS bull  Increment per ACK MSS (MSS CWND)

bull  Decreasing the congestion window ndash  Cut in half but never below 1 MSS

Getting Started

t

Window

But could take a long time to get started

Need to start with a small CWND to avoid overloading the network

ldquoSlow Startrdquo Phase bull  Start with a small congestion window

ndash  Initially CWND is 1 MSS ndash  So initial sending rate is MSSRTT

bull  That could be pretty wasteful ndash  Might be much less than the available bandwidth ndash  Linear increase takes a long time to accelerate

bull  Slow-start phase (but in reality itrsquos ldquofast startrdquo) ndash  Sender starts at a slow rate (hence the name) ndash  hellip but increases the rate exponentially ndash  hellip until the first loss event

Slow Start in Action Double CWND per round-trip time

D A D D A A D D

A A

D

A

Src

Dest

D

A

1 2 4 8

Slow Start and the TCP Sawtooth

Loss

Exponential ldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally had no congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

Two Kinds of Loss in TCP bull  Triple duplicate ACK

ndash  Packet n is lost but packets n+1 n+2 etc arrive ndash  Receiver sends duplicate acknowledgments ndash  hellip and the sender retransmits packet n quickly ndash  Do a multiplicative decrease and keep going (no slow-

start) bull  Timeout

ndash  Packet n is lost and detected via a timeout ndash  Could be because all packets in flight were lost ndash  After the timeout blasting away for the entire CWND ndash  hellip would trigger a very large burst in traffic ndash  So better to start over with a very low CWND

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of

previous cwnd

timeout threshold

Repeating Slow Start After Idle Period

bull  Suppose a TCP connection goes idle for a while ndash  Eg Telnet session where you donrsquot type for an hour

bull  Eventually the network conditions change ndash  Maybe many more flows are traversing the link ndash  Eg maybe everybody has come back from lunch

bull  Dangerous to start transmitting at the old rate ndash  Previously-idle TCP sender might blast the network ndash  hellip causing excessive congestion and packet loss

bull  So some TCP implementations repeat slow start ndash  Slow-start restart after an idle period

Summary TCP Congestion Control bull  When CongWin is below Threshold sender in

slow-start phase window grows exponentially

bull  When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull  When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull  When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Event State TCP Sender Action Commentary ACK receipt for previously unACKed data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unACKed data

Congestion Avoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Loss event detected by triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = Threshold Set state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSS Set state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK

SS or CA Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Other TCP Mechanisms

Naglersquos Algorithm and Delayed ACK

Motivation for Naglersquos Algorithm

bull  Interactive applications ndash  SSHtelnetrlogin ndash  Generate many small packets (eg keystrokes)

bull  Small packets are wasteful ndash  Mostly header (eg 40 bytes of header 1 of data)

bull  Appealing to reduce the number of packets ndash  Could force every packet to have some minimum size ndash  hellip but what if the person doesnrsquot type more

characters bull  Need to balance competing trade-offs

ndash  Send larger packets to increase efficiency ndash  hellip but not at the expense of delay

Naglersquos Algorithm bull Wait if the amount of data is small

ndash Smaller than Maximum Segment Size (MSS) bull hellipand some other packet is already in flight

ndash  ie still awaiting the ACKs for previous packets bull That is send at most one small packet per RTT

ndash hellip by waiting until all outstanding ACKs have arrived

bull  Influence on performance ndash Interactive applications enables batching of bytes ndash Bulk transfer no change transmits in MSS-sized packets

anyway

vs

ACK

Delayed ACK - Motivation bull  TCP traffic is often bidirectional

ndash Data traveling in both directions ndash ACKs traveling in both directions

bull  ACK packets have high overhead ndash  40 bytes for the IP header and TCP header ndash hellip and zero data traffic

bull  Piggybacking is appealing ndash Host B can send an ACK to host A ndash hellip as part of a data packet from B to A

TCP Header Allows Piggybacking

Source port Destination port

Sequence number

Acknowledgment

Advertised window HdrLen Flags 0

Checksum Urgent pointer

Options (variable)

Data

Flags SYN FIN RST PSH URG ACK

Example of Piggybacking

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

B has data to send

A has data to send

B doesnrsquot have data to send

Increasing Likelihood of Piggybacking

bull  Increase piggybacking ndash  TCP allows the receiver to

wait to send the ACK ndash  hellip in the hope that the host

will have data to send bull  Example sshrlogintelnet

ndash  Host A types characters at a UNIX prompt

ndash  Host B receives the character and executes a command

ndash  hellip and then data are generated ndash  Would be nice if B could send

the ACK with the new data

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

Works when packet from A causes data to be sent from B

waste

Delayed ACK bull  Delay sending an ACK

ndash  Upon receiving a packet the host B sets a timer ndash  If Brsquos application generates data go ahead and send

bull  And piggyback the ACK bit

ndash  If the timer expires send a (non-piggybacked) ACK

bull  Limiting the wait ndash  Timer of 200 msec or 500 msec ndash  Results in an ACK for every other full-sized packet

TCP Throughput and Fairness

TCP Throughput bull  Whatrsquos the average throughout of TCP as a

function of window size and RTT ndash  Assume long-lived TCP flow ndash  Ignore slow start

bull  Let W be the window size when loss occurs bull  When window is W throughput is WRTT bull  Just after loss window drops to W2 throughput

to W2RTT bull  Average throughout 075 WRTT

Problems with Fast Links An example to illustrate problems bull  Consider the impact of high speed links

ndash  1500 byte segments ndash  100ms RTT ndash  10 Gbs throughput

bull  What is the required window size ndash  Throughput = 75 WRTT

bull  (probably a good formula to remember)

ndash  Requires window size W = 83333 in-flight segments

Example (Cont)

bull  10 Gbs throughput requires window size W = 83333 in-flight segments

bull  TCP assumes every loss is due to congestion ndash  Generally safe assumption for reasonable window size

bull  (Magic) Formula to relate loss rate to throughput Throughput of 10 Gbs with MSS of 1500 bytes gives ndash  13 L = 210-10

ie can only lose one in 5000000000 segments bull  We need new versions of TCP for high-speed nets (topic

for later discussion)

LRTTMSSsdot221Throughput =

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

Simple scenario assume same MSS and RTT

Is TCP Fair Two competing sessions bull  Additive increase gives slope of 1 as throughout increases bull  multiplicative decrease drops throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Conn

ecti

on 2

thr

ough

put

loss decrease window by factor of 2 congestion avoidance additive increase

loss decrease window by factor of 2 congestion avoidance additive increase

More on Fairness Fairness and UDP bull  Multimedia apps often do

not use TCP ndash  do not want rate throttled by

congestion control bull  Instead use UDP

ndash  pump audiovideo at constant rate tolerate packet loss

bull  Research area TCP friendly unreliable transport

Fairness and parallel TCP connections

bull  nothing prevents app from opening parallel connections between 2 hosts

bull  Web browsers do this bull  Example link of rate R

supporting 9 connections ndash  new app asks for 1 TCP gets rate

R10 ndash  new app asks for 11 TCPs gets

11R20 (over half the bandwidth)

Queuing Mechanisms

Random Early Detection (RED) Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

bull  TCP depends on packet loss to detect congestion ndash  In fact TCP drives the network into packet loss ndash  hellip by continuing to increase the sending rate

bull  Drop-tail queuing leads to bursty loss ndash  When a link becomes congestedhellip ndash  hellip many arriving packets encounter a full queue ndash  And as a result many flows divide sending rate in half ndash  hellip and many individual flows lose multiple packets

Slow Feedback from Drop Tail bull  Feedback comes when buffer is completely full

ndash  hellip even though the buffer has been filling for a while bull  Plus the filling buffer is increasing RTT

ndash  hellip and the variance in the RTT bull  Might be better to give early feedback

ndash  Get one or two flows to slow down not all of them ndash  Get these flows to slow down before it is too late

Random Early Detection (RED) bull  Basic idea of RED

ndash  Router notices that the queue is getting backlogged ndash  hellip and randomly drops packets to signal congestion

bull  Packet drop probability ndash  Drop probability increases as queue length increases ndash  If buffer is below some level donrsquot drop anything ndash  hellip otherwise set drop probability as function of queue

Average Queue Length

Prob

abili

ty

Properties of RED bull  Drops packets before queue is full

ndash  In the hope of reducing the rates of some flows bull  Drops packet in proportion to each flowrsquos rate

ndash  High-rate flows have more packets ndash  hellip and hence a higher chance of being selected

bull  Drops are spaced out in time ndash  Which should help desynchronize the TCP senders

bull  Tolerant of burstiness in the traffic ndash  By basing the decisions on average queue length

Problems With RED
• Hard to get the tunable parameters just right
  – How early to start dropping packets?
  – What slope for the increase in drop probability?
  – What time scale for averaging the queue length?
• Sometimes RED helps, but sometimes not
  – If the parameters aren't set right, RED doesn't help
  – And it is hard to know how to set the parameters
• RED is implemented in practice
  – But often not used, due to the challenges of tuning it right
• Many variations
  – With cute names like "Blue" and "FRED"…

Explicit Congestion Notification
• Early dropping of packets
  – Good: gives early feedback
  – Bad: has to drop the packet to give the feedback
• Explicit Congestion Notification
  – Router marks the packet with an ECN bit
  – … and the sending host interprets it as a sign of congestion
• Surmounting the challenges
  – Must be supported by the end hosts and the routers
  – Requires two bits in the IP header (one for the ECN mark, and one to indicate ECN capability)
  – Solution: borrow two of the Type-Of-Service bits in the IPv4 packet header
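
A sketch of how those two borrowed bits are used. The slides stop at "two bits"; the codepoint names below follow RFC 3168, which standardized this field:

```python
# ECN field = two bits of the old IPv4 Type-Of-Service byte (RFC 3168).
NOT_ECT = 0b00   # endpoints are not ECN-capable
ECT_1   = 0b01   # ECN-Capable Transport, codepoint 1
ECT_0   = 0b10   # ECN-Capable Transport, codepoint 0
CE      = 0b11   # Congestion Experienced: set by a congested router

def router_mark(codepoint, congested):
    """Mark instead of dropping, but only for ECN-capable packets."""
    if congested and codepoint in (ECT_0, ECT_1):
        return CE      # signal congestion without losing the packet
    return codepoint   # Not-ECT packets can't be marked; a congested
                       # router falls back to dropping (not modeled here)
```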

Conclusions
• Congestion is inevitable
  – The Internet does not reserve resources in advance
  – TCP actively tries to push the envelope
• Congestion can be handled
  – Additive increase, multiplicative decrease
  – Slow start, and slow-start restart
• Active Queue Management can help
  – Random Early Detection (RED)
  – Explicit Congestion Notification (ECN)


Page 8: TCP Congestion Control - cs.colostate.edu

Fairness bull  Effective utilization is not the only goal

ndash  We also want to be fair to the various flows ndash  hellip but what does that mean

bull  Simple definition equal shares of the bandwidth ndash  N flows that each get 1N of the bandwidth ndash  But what if the flows traverse different paths ndash  Still a hard and open problem in the Internet

Simple Queuing Mechanism bull  Simplest approach FIFO queue and drop-tail bull  Link bandwidth allocation first-in first-out queue

ndash  Packets transmitted in the order they arrive

bull  Buffer space allocation drop-tail queuing ndash  If the queue is full drop the incoming packet

Simple Congestion Detection bull  Packet loss

ndash  Packet gets dropped along the way bull  Packet delay

ndash  Packet experiences high delay bull  How does TCP sender learn these

ndash  Loss bull  Timeout bull  Triple-duplicate acknowledgment

ndash  Delay bull  Round-trip time estimate

TCP Congestion Control Basics bull  Each source determines available capacity

ndash  hellip and how many packets is allowed to have in transit bull  Congestion window

ndash  Maximum of unackrsquoed bytes allowed to be in transit (the congestion-control equivalent of receiver window)

ndash  MaxWindow = mincongestion window receiver window - send at the rate of the slowest component

bull  How to adapt the congestion window ndash  Decrease upon losing a packet back-off ndash  Increase upon success explore new capacity

Additive Increase Multiplicative Decrease

bull  How much to increase and decrease ndash  Increase linearly decrease multiplicatively ndash  A necessary condition for stability of TCP ndash  Consequences of oversized window are much worse

than having an under-sized window bull  Oversized window packets dropped retransmitted pain for all bull  Undersized window lower throughput for one flow

bull  Multiplicative decrease ndash  On loss of packet divide congestion window in half

bull  Additive increase ndash  On success for last window of data increase linearly

adding one MSS per RTT

TCP ldquoSawtoothrdquo Behavior

t

Window

halved

Loss

Practical Details bull  Congestion window (cwnd)

ndash  Represented in bytes not in packets (Why) ndash  Packets typically one MSS (Maximum Segment Size)

bull  Increasing the congestion window ndash  Increase by MSS on success for last window of data ndash  In practice increase a fraction of MSS per received

ACK bull  packets per window CWND MSS bull  Increment per ACK MSS (MSS CWND)

bull  Decreasing the congestion window ndash  Cut in half but never below 1 MSS

Getting Started

t

Window

But could take a long time to get started

Need to start with a small CWND to avoid overloading the network

ldquoSlow Startrdquo Phase bull  Start with a small congestion window

ndash  Initially CWND is 1 MSS ndash  So initial sending rate is MSSRTT

bull  That could be pretty wasteful ndash  Might be much less than the available bandwidth ndash  Linear increase takes a long time to accelerate

bull  Slow-start phase (but in reality itrsquos ldquofast startrdquo) ndash  Sender starts at a slow rate (hence the name) ndash  hellip but increases the rate exponentially ndash  hellip until the first loss event

Slow Start in Action Double CWND per round-trip time

D A D D A A D D

A A

D

A

Src

Dest

D

A

1 2 4 8

Slow Start and the TCP Sawtooth

Loss

Exponential ldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally had no congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

Two Kinds of Loss in TCP bull  Triple duplicate ACK

ndash  Packet n is lost but packets n+1 n+2 etc arrive ndash  Receiver sends duplicate acknowledgments ndash  hellip and the sender retransmits packet n quickly ndash  Do a multiplicative decrease and keep going (no slow-

start) bull  Timeout

ndash  Packet n is lost and detected via a timeout ndash  Could be because all packets in flight were lost ndash  After the timeout blasting away for the entire CWND ndash  hellip would trigger a very large burst in traffic ndash  So better to start over with a very low CWND

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of

previous cwnd

timeout threshold

Repeating Slow Start After Idle Period

bull  Suppose a TCP connection goes idle for a while ndash  Eg Telnet session where you donrsquot type for an hour

bull  Eventually the network conditions change ndash  Maybe many more flows are traversing the link ndash  Eg maybe everybody has come back from lunch

bull  Dangerous to start transmitting at the old rate ndash  Previously-idle TCP sender might blast the network ndash  hellip causing excessive congestion and packet loss

bull  So some TCP implementations repeat slow start ndash  Slow-start restart after an idle period

Summary TCP Congestion Control bull  When CongWin is below Threshold sender in

slow-start phase window grows exponentially

bull  When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull  When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull  When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Event State TCP Sender Action Commentary ACK receipt for previously unACKed data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unACKed data

Congestion Avoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Loss event detected by triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = Threshold Set state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSS Set state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK

SS or CA Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Other TCP Mechanisms

Naglersquos Algorithm and Delayed ACK

Motivation for Naglersquos Algorithm

bull  Interactive applications ndash  SSHtelnetrlogin ndash  Generate many small packets (eg keystrokes)

bull  Small packets are wasteful ndash  Mostly header (eg 40 bytes of header 1 of data)

bull  Appealing to reduce the number of packets ndash  Could force every packet to have some minimum size ndash  hellip but what if the person doesnrsquot type more

characters bull  Need to balance competing trade-offs

ndash  Send larger packets to increase efficiency ndash  hellip but not at the expense of delay

Naglersquos Algorithm bull Wait if the amount of data is small

ndash Smaller than Maximum Segment Size (MSS) bull hellipand some other packet is already in flight

ndash  ie still awaiting the ACKs for previous packets bull That is send at most one small packet per RTT

ndash hellip by waiting until all outstanding ACKs have arrived

bull  Influence on performance ndash Interactive applications enables batching of bytes ndash Bulk transfer no change transmits in MSS-sized packets

anyway

vs

ACK

Delayed ACK - Motivation bull  TCP traffic is often bidirectional

ndash Data traveling in both directions ndash ACKs traveling in both directions

bull  ACK packets have high overhead ndash  40 bytes for the IP header and TCP header ndash hellip and zero data traffic

bull  Piggybacking is appealing ndash Host B can send an ACK to host A ndash hellip as part of a data packet from B to A

TCP Header Allows Piggybacking

Source port Destination port

Sequence number

Acknowledgment

Advertised window HdrLen Flags 0

Checksum Urgent pointer

Options (variable)

Data

Flags SYN FIN RST PSH URG ACK

Example of Piggybacking

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

B has data to send

A has data to send

B doesnrsquot have data to send

Increasing Likelihood of Piggybacking

bull  Increase piggybacking ndash  TCP allows the receiver to

wait to send the ACK ndash  hellip in the hope that the host

will have data to send bull  Example sshrlogintelnet

ndash  Host A types characters at a UNIX prompt

ndash  Host B receives the character and executes a command

ndash  hellip and then data are generated ndash  Would be nice if B could send

the ACK with the new data

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

Works when packet from A causes data to be sent from B

waste

Delayed ACK bull  Delay sending an ACK

ndash  Upon receiving a packet the host B sets a timer ndash  If Brsquos application generates data go ahead and send

bull  And piggyback the ACK bit

ndash  If the timer expires send a (non-piggybacked) ACK

bull  Limiting the wait ndash  Timer of 200 msec or 500 msec ndash  Results in an ACK for every other full-sized packet

TCP Throughput and Fairness

TCP Throughput bull  Whatrsquos the average throughout of TCP as a

function of window size and RTT ndash  Assume long-lived TCP flow ndash  Ignore slow start

bull  Let W be the window size when loss occurs bull  When window is W throughput is WRTT bull  Just after loss window drops to W2 throughput

to W2RTT bull  Average throughout 075 WRTT

Problems with Fast Links An example to illustrate problems bull  Consider the impact of high speed links

ndash  1500 byte segments ndash  100ms RTT ndash  10 Gbs throughput

bull  What is the required window size ndash  Throughput = 75 WRTT

bull  (probably a good formula to remember)

ndash  Requires window size W = 83333 in-flight segments

Example (Cont)

bull  10 Gbs throughput requires window size W = 83333 in-flight segments

bull  TCP assumes every loss is due to congestion ndash  Generally safe assumption for reasonable window size

bull  (Magic) Formula to relate loss rate to throughput Throughput of 10 Gbs with MSS of 1500 bytes gives ndash  13 L = 210-10

ie can only lose one in 5000000000 segments bull  We need new versions of TCP for high-speed nets (topic

for later discussion)

LRTTMSSsdot221Throughput =

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

Simple scenario assume same MSS and RTT

Is TCP Fair Two competing sessions bull  Additive increase gives slope of 1 as throughout increases bull  multiplicative decrease drops throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Conn

ecti

on 2

thr

ough

put

loss decrease window by factor of 2 congestion avoidance additive increase

loss decrease window by factor of 2 congestion avoidance additive increase

More on Fairness Fairness and UDP bull  Multimedia apps often do

not use TCP ndash  do not want rate throttled by

congestion control bull  Instead use UDP

ndash  pump audiovideo at constant rate tolerate packet loss

bull  Research area TCP friendly unreliable transport

Fairness and parallel TCP connections

bull  nothing prevents app from opening parallel connections between 2 hosts

bull  Web browsers do this bull  Example link of rate R

supporting 9 connections ndash  new app asks for 1 TCP gets rate

R10 ndash  new app asks for 11 TCPs gets

11R20 (over half the bandwidth)

Queuing Mechanisms

Random Early Detection (RED) Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

bull  TCP depends on packet loss to detect congestion ndash  In fact TCP drives the network into packet loss ndash  hellip by continuing to increase the sending rate

bull  Drop-tail queuing leads to bursty loss ndash  When a link becomes congestedhellip ndash  hellip many arriving packets encounter a full queue ndash  And as a result many flows divide sending rate in half ndash  hellip and many individual flows lose multiple packets

Slow Feedback from Drop Tail bull  Feedback comes when buffer is completely full

ndash  hellip even though the buffer has been filling for a while bull  Plus the filling buffer is increasing RTT

ndash  hellip and the variance in the RTT bull  Might be better to give early feedback

ndash  Get one or two flows to slow down not all of them ndash  Get these flows to slow down before it is too late

Random Early Detection (RED) bull  Basic idea of RED

ndash  Router notices that the queue is getting backlogged ndash  hellip and randomly drops packets to signal congestion

bull  Packet drop probability ndash  Drop probability increases as queue length increases ndash  If buffer is below some level donrsquot drop anything ndash  hellip otherwise set drop probability as function of queue

Average Queue Length

Prob

abili

ty

Properties of RED bull  Drops packets before queue is full

ndash  In the hope of reducing the rates of some flows bull  Drops packet in proportion to each flowrsquos rate

ndash  High-rate flows have more packets ndash  hellip and hence a higher chance of being selected

bull  Drops are spaced out in time ndash  Which should help desynchronize the TCP senders

bull  Tolerant of burstiness in the traffic ndash  By basing the decisions on average queue length

Problems With RED bull  Hard to get the tunable parameters just right

ndash  How early to start dropping packets ndash  What slope for the increase in drop probability ndash  What time scale for averaging the queue length

bull  Sometimes RED helps but sometimes not ndash  If the parameters arenrsquot set right RED doesnrsquot help ndash  And it is hard to know how to set the parameters

bull  RED is implemented in practice ndash  But often not used due to the challenges of tuning right

bull  Many variations ndash  With cute names like ldquoBluerdquo and ldquoFREDrdquohellip J

Explicit Congestion Notification bull  Early dropping of packets

ndash  Good gives early feedback ndash  Bad has to drop the packet to give the feedback

bull  Explicit Congestion Notification ndash  Router marks the packet with an ECN bit ndash  hellip and sending host interprets as a sign of congestion

bull  Surmounting the challenges ndash  Must be supported by the end hosts and the routers ndash  Requires two bits in the IP header (one for the ECN

mark and one to indicate the ECN capability) ndash  Solution borrow two of the Type-Of-Service bits in the

IPv4 packet header

Conclusions bull Congestion is inevitable

ndash Internet does not reserve resources in advance ndash TCP actively tries to push the envelope

bull Congestion can be handled ndash Additive increase multiplicative decrease ndash Slow start and slow-start restart

bull Active Queue Management can help ndash Random Early Detection (RED) ndash Explicit Congestion Notification (ECN)

t

Window

Page 9: TCP Congestion Control - cs.colostate.edu

Simple Queuing Mechanism bull  Simplest approach FIFO queue and drop-tail bull  Link bandwidth allocation first-in first-out queue

ndash  Packets transmitted in the order they arrive

bull  Buffer space allocation drop-tail queuing ndash  If the queue is full drop the incoming packet

Simple Congestion Detection bull  Packet loss

ndash  Packet gets dropped along the way bull  Packet delay

ndash  Packet experiences high delay bull  How does TCP sender learn these

ndash  Loss bull  Timeout bull  Triple-duplicate acknowledgment

ndash  Delay bull  Round-trip time estimate

TCP Congestion Control Basics bull  Each source determines available capacity

ndash  hellip and how many packets is allowed to have in transit bull  Congestion window

ndash  Maximum of unackrsquoed bytes allowed to be in transit (the congestion-control equivalent of receiver window)

ndash  MaxWindow = mincongestion window receiver window - send at the rate of the slowest component

bull  How to adapt the congestion window ndash  Decrease upon losing a packet back-off ndash  Increase upon success explore new capacity

Additive Increase Multiplicative Decrease

bull  How much to increase and decrease ndash  Increase linearly decrease multiplicatively ndash  A necessary condition for stability of TCP ndash  Consequences of oversized window are much worse

than having an under-sized window bull  Oversized window packets dropped retransmitted pain for all bull  Undersized window lower throughput for one flow

bull  Multiplicative decrease ndash  On loss of packet divide congestion window in half

bull  Additive increase ndash  On success for last window of data increase linearly

adding one MSS per RTT

TCP ldquoSawtoothrdquo Behavior

t

Window

halved

Loss

Practical Details bull  Congestion window (cwnd)

ndash  Represented in bytes not in packets (Why) ndash  Packets typically one MSS (Maximum Segment Size)

bull  Increasing the congestion window ndash  Increase by MSS on success for last window of data ndash  In practice increase a fraction of MSS per received

ACK bull  packets per window CWND MSS bull  Increment per ACK MSS (MSS CWND)

bull  Decreasing the congestion window ndash  Cut in half but never below 1 MSS

Getting Started

t

Window

But could take a long time to get started

Need to start with a small CWND to avoid overloading the network

ldquoSlow Startrdquo Phase bull  Start with a small congestion window

ndash  Initially CWND is 1 MSS ndash  So initial sending rate is MSSRTT

bull  That could be pretty wasteful ndash  Might be much less than the available bandwidth ndash  Linear increase takes a long time to accelerate

bull  Slow-start phase (but in reality itrsquos ldquofast startrdquo) ndash  Sender starts at a slow rate (hence the name) ndash  hellip but increases the rate exponentially ndash  hellip until the first loss event

Slow Start in Action Double CWND per round-trip time

D A D D A A D D

A A

D

A

Src

Dest

D

A

1 2 4 8

Slow Start and the TCP Sawtooth

Loss

Exponential ldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally had no congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

Two Kinds of Loss in TCP bull  Triple duplicate ACK

ndash  Packet n is lost but packets n+1 n+2 etc arrive ndash  Receiver sends duplicate acknowledgments ndash  hellip and the sender retransmits packet n quickly ndash  Do a multiplicative decrease and keep going (no slow-

start) bull  Timeout

ndash  Packet n is lost and detected via a timeout ndash  Could be because all packets in flight were lost ndash  After the timeout blasting away for the entire CWND ndash  hellip would trigger a very large burst in traffic ndash  So better to start over with a very low CWND

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of

previous cwnd

timeout threshold

Repeating Slow Start After Idle Period

bull  Suppose a TCP connection goes idle for a while ndash  Eg Telnet session where you donrsquot type for an hour

bull  Eventually the network conditions change ndash  Maybe many more flows are traversing the link ndash  Eg maybe everybody has come back from lunch

bull  Dangerous to start transmitting at the old rate ndash  Previously-idle TCP sender might blast the network ndash  hellip causing excessive congestion and packet loss

bull  So some TCP implementations repeat slow start ndash  Slow-start restart after an idle period

Summary TCP Congestion Control bull  When CongWin is below Threshold sender in

slow-start phase window grows exponentially

bull  When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull  When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull  When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Event State TCP Sender Action Commentary ACK receipt for previously unACKed data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unACKed data

Congestion Avoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Loss event detected by triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = Threshold Set state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSS Set state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK

SS or CA Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Other TCP Mechanisms

Naglersquos Algorithm and Delayed ACK

Motivation for Naglersquos Algorithm

bull  Interactive applications ndash  SSHtelnetrlogin ndash  Generate many small packets (eg keystrokes)

bull  Small packets are wasteful ndash  Mostly header (eg 40 bytes of header 1 of data)

bull  Appealing to reduce the number of packets ndash  Could force every packet to have some minimum size ndash  hellip but what if the person doesnrsquot type more

characters bull  Need to balance competing trade-offs

ndash  Send larger packets to increase efficiency ndash  hellip but not at the expense of delay

Naglersquos Algorithm bull Wait if the amount of data is small

ndash Smaller than Maximum Segment Size (MSS) bull hellipand some other packet is already in flight

ndash  ie still awaiting the ACKs for previous packets bull That is send at most one small packet per RTT

ndash hellip by waiting until all outstanding ACKs have arrived

bull  Influence on performance ndash Interactive applications enables batching of bytes ndash Bulk transfer no change transmits in MSS-sized packets

anyway

vs

ACK

Delayed ACK - Motivation bull  TCP traffic is often bidirectional

ndash Data traveling in both directions ndash ACKs traveling in both directions

bull  ACK packets have high overhead ndash  40 bytes for the IP header and TCP header ndash hellip and zero data traffic

bull  Piggybacking is appealing ndash Host B can send an ACK to host A ndash hellip as part of a data packet from B to A

TCP Header Allows Piggybacking

Source port Destination port

Sequence number

Acknowledgment

Advertised window HdrLen Flags 0

Checksum Urgent pointer

Options (variable)

Data

Flags SYN FIN RST PSH URG ACK

Example of Piggybacking

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

B has data to send

A has data to send

B doesnrsquot have data to send

Increasing Likelihood of Piggybacking

bull  Increase piggybacking ndash  TCP allows the receiver to

wait to send the ACK ndash  hellip in the hope that the host

will have data to send bull  Example sshrlogintelnet

ndash  Host A types characters at a UNIX prompt

ndash  Host B receives the character and executes a command

ndash  hellip and then data are generated ndash  Would be nice if B could send

the ACK with the new data

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

Works when packet from A causes data to be sent from B

waste

Delayed ACK bull  Delay sending an ACK

ndash  Upon receiving a packet the host B sets a timer ndash  If Brsquos application generates data go ahead and send

bull  And piggyback the ACK bit

ndash  If the timer expires send a (non-piggybacked) ACK

bull  Limiting the wait ndash  Timer of 200 msec or 500 msec ndash  Results in an ACK for every other full-sized packet

TCP Throughput and Fairness

TCP Throughput bull  Whatrsquos the average throughout of TCP as a

function of window size and RTT ndash  Assume long-lived TCP flow ndash  Ignore slow start

bull  Let W be the window size when loss occurs bull  When window is W throughput is WRTT bull  Just after loss window drops to W2 throughput

to W2RTT bull  Average throughout 075 WRTT

Problems with Fast Links An example to illustrate problems bull  Consider the impact of high speed links

ndash  1500 byte segments ndash  100ms RTT ndash  10 Gbs throughput

bull  What is the required window size ndash  Throughput = 75 WRTT

bull  (probably a good formula to remember)

ndash  Requires window size W = 83333 in-flight segments

Example (Cont)

bull  10 Gbs throughput requires window size W = 83333 in-flight segments

bull  TCP assumes every loss is due to congestion ndash  Generally safe assumption for reasonable window size

bull  (Magic) Formula to relate loss rate to throughput Throughput of 10 Gbs with MSS of 1500 bytes gives ndash  13 L = 210-10

ie can only lose one in 5000000000 segments bull  We need new versions of TCP for high-speed nets (topic

for later discussion)

LRTTMSSsdot221Throughput =

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

Simple scenario assume same MSS and RTT

Is TCP Fair Two competing sessions bull  Additive increase gives slope of 1 as throughout increases bull  multiplicative decrease drops throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Conn

ecti

on 2

thr

ough

put

loss decrease window by factor of 2 congestion avoidance additive increase

loss decrease window by factor of 2 congestion avoidance additive increase

More on Fairness Fairness and UDP bull  Multimedia apps often do

not use TCP ndash  do not want rate throttled by

congestion control bull  Instead use UDP

ndash  pump audiovideo at constant rate tolerate packet loss

bull  Research area TCP friendly unreliable transport

Fairness and parallel TCP connections

bull  nothing prevents app from opening parallel connections between 2 hosts

bull  Web browsers do this bull  Example link of rate R

supporting 9 connections ndash  new app asks for 1 TCP gets rate

R10 ndash  new app asks for 11 TCPs gets

11R20 (over half the bandwidth)

Queuing Mechanisms

Random Early Detection (RED) Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

bull  TCP depends on packet loss to detect congestion ndash  In fact TCP drives the network into packet loss ndash  hellip by continuing to increase the sending rate

bull  Drop-tail queuing leads to bursty loss ndash  When a link becomes congestedhellip ndash  hellip many arriving packets encounter a full queue ndash  And as a result many flows divide sending rate in half ndash  hellip and many individual flows lose multiple packets

Slow Feedback from Drop Tail bull  Feedback comes when buffer is completely full

ndash  hellip even though the buffer has been filling for a while bull  Plus the filling buffer is increasing RTT

ndash  hellip and the variance in the RTT bull  Might be better to give early feedback

ndash  Get one or two flows to slow down not all of them ndash  Get these flows to slow down before it is too late

Random Early Detection (RED) bull  Basic idea of RED

ndash  Router notices that the queue is getting backlogged ndash  hellip and randomly drops packets to signal congestion

bull  Packet drop probability ndash  Drop probability increases as queue length increases ndash  If buffer is below some level donrsquot drop anything ndash  hellip otherwise set drop probability as function of queue

Average Queue Length

Prob

abili

ty

Properties of RED bull  Drops packets before queue is full

ndash  In the hope of reducing the rates of some flows bull  Drops packet in proportion to each flowrsquos rate

ndash  High-rate flows have more packets ndash  hellip and hence a higher chance of being selected

bull  Drops are spaced out in time ndash  Which should help desynchronize the TCP senders

bull  Tolerant of burstiness in the traffic ndash  By basing the decisions on average queue length

Problems With RED bull  Hard to get the tunable parameters just right

ndash  How early to start dropping packets ndash  What slope for the increase in drop probability ndash  What time scale for averaging the queue length

bull  Sometimes RED helps but sometimes not ndash  If the parameters arenrsquot set right RED doesnrsquot help ndash  And it is hard to know how to set the parameters

bull  RED is implemented in practice ndash  But often not used due to the challenges of tuning right

bull  Many variations ndash  With cute names like ldquoBluerdquo and ldquoFREDrdquohellip J

Explicit Congestion Notification bull  Early dropping of packets

ndash  Good gives early feedback ndash  Bad has to drop the packet to give the feedback

bull  Explicit Congestion Notification ndash  Router marks the packet with an ECN bit ndash  hellip and sending host interprets as a sign of congestion

bull  Surmounting the challenges ndash  Must be supported by the end hosts and the routers ndash  Requires two bits in the IP header (one for the ECN

mark and one to indicate the ECN capability) ndash  Solution borrow two of the Type-Of-Service bits in the

IPv4 packet header

Conclusions bull Congestion is inevitable

ndash Internet does not reserve resources in advance ndash TCP actively tries to push the envelope

bull Congestion can be handled ndash Additive increase multiplicative decrease ndash Slow start and slow-start restart

bull Active Queue Management can help ndash Random Early Detection (RED) ndash Explicit Congestion Notification (ECN)

t

Window

Page 10: TCP Congestion Control - cs.colostate.edu

Simple Congestion Detection bull  Packet loss

ndash  Packet gets dropped along the way bull  Packet delay

ndash  Packet experiences high delay bull  How does TCP sender learn these

ndash  Loss bull  Timeout bull  Triple-duplicate acknowledgment

ndash  Delay bull  Round-trip time estimate

TCP Congestion Control Basics bull  Each source determines available capacity

ndash  hellip and how many packets is allowed to have in transit bull  Congestion window

ndash  Maximum of unackrsquoed bytes allowed to be in transit (the congestion-control equivalent of receiver window)

ndash  MaxWindow = mincongestion window receiver window - send at the rate of the slowest component

bull  How to adapt the congestion window ndash  Decrease upon losing a packet back-off ndash  Increase upon success explore new capacity

Additive Increase Multiplicative Decrease

bull  How much to increase and decrease ndash  Increase linearly decrease multiplicatively ndash  A necessary condition for stability of TCP ndash  Consequences of oversized window are much worse

than having an under-sized window bull  Oversized window packets dropped retransmitted pain for all bull  Undersized window lower throughput for one flow

bull  Multiplicative decrease ndash  On loss of packet divide congestion window in half

bull  Additive increase ndash  On success for last window of data increase linearly

adding one MSS per RTT

TCP ldquoSawtoothrdquo Behavior

t

Window

halved

Loss

Practical Details bull  Congestion window (cwnd)

ndash  Represented in bytes not in packets (Why) ndash  Packets typically one MSS (Maximum Segment Size)

bull  Increasing the congestion window ndash  Increase by MSS on success for last window of data ndash  In practice increase a fraction of MSS per received

ACK bull  packets per window CWND MSS bull  Increment per ACK MSS (MSS CWND)

bull  Decreasing the congestion window ndash  Cut in half but never below 1 MSS

Getting Started

t

Window

But could take a long time to get started

Need to start with a small CWND to avoid overloading the network

ldquoSlow Startrdquo Phase bull  Start with a small congestion window

ndash  Initially CWND is 1 MSS ndash  So initial sending rate is MSSRTT

bull  That could be pretty wasteful ndash  Might be much less than the available bandwidth ndash  Linear increase takes a long time to accelerate

bull  Slow-start phase (but in reality itrsquos ldquofast startrdquo) ndash  Sender starts at a slow rate (hence the name) ndash  hellip but increases the rate exponentially ndash  hellip until the first loss event

Slow Start in Action Double CWND per round-trip time

D A D D A A D D

A A

D

A

Src

Dest

D

A

1 2 4 8

Slow Start and the TCP Sawtooth

Loss

Exponential ldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally had no congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

Two Kinds of Loss in TCP bull  Triple duplicate ACK

ndash  Packet n is lost but packets n+1 n+2 etc arrive ndash  Receiver sends duplicate acknowledgments ndash  hellip and the sender retransmits packet n quickly ndash  Do a multiplicative decrease and keep going (no slow-

start) bull  Timeout

ndash  Packet n is lost and detected via a timeout ndash  Could be because all packets in flight were lost ndash  After the timeout blasting away for the entire CWND ndash  hellip would trigger a very large burst in traffic ndash  So better to start over with a very low CWND

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of

previous cwnd

timeout threshold

Repeating Slow Start After Idle Period

bull  Suppose a TCP connection goes idle for a while ndash  Eg Telnet session where you donrsquot type for an hour

bull  Eventually the network conditions change ndash  Maybe many more flows are traversing the link ndash  Eg maybe everybody has come back from lunch

bull  Dangerous to start transmitting at the old rate ndash  Previously-idle TCP sender might blast the network ndash  hellip causing excessive congestion and packet loss

bull  So some TCP implementations repeat slow start ndash  Slow-start restart after an idle period

Summary TCP Congestion Control bull  When CongWin is below Threshold sender in

slow-start phase window grows exponentially

bull  When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull  When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull  When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Event State TCP Sender Action Commentary ACK receipt for previously unACKed data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unACKed data

Congestion Avoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Loss event detected by triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = Threshold Set state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSS Set state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK

SS or CA Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Other TCP Mechanisms

Naglersquos Algorithm and Delayed ACK

Motivation for Naglersquos Algorithm

bull  Interactive applications ndash  SSHtelnetrlogin ndash  Generate many small packets (eg keystrokes)

bull  Small packets are wasteful ndash  Mostly header (eg 40 bytes of header 1 of data)

bull  Appealing to reduce the number of packets ndash  Could force every packet to have some minimum size ndash  hellip but what if the person doesnrsquot type more

characters bull  Need to balance competing trade-offs

ndash  Send larger packets to increase efficiency ndash  hellip but not at the expense of delay

Naglersquos Algorithm bull Wait if the amount of data is small

ndash Smaller than Maximum Segment Size (MSS) bull hellipand some other packet is already in flight

ndash  ie still awaiting the ACKs for previous packets bull That is send at most one small packet per RTT

ndash hellip by waiting until all outstanding ACKs have arrived

bull  Influence on performance ndash Interactive applications enables batching of bytes ndash Bulk transfer no change transmits in MSS-sized packets

anyway

vs

ACK

Delayed ACK - Motivation bull  TCP traffic is often bidirectional

ndash Data traveling in both directions ndash ACKs traveling in both directions

bull  ACK packets have high overhead ndash  40 bytes for the IP header and TCP header ndash hellip and zero data traffic

bull  Piggybacking is appealing ndash Host B can send an ACK to host A ndash hellip as part of a data packet from B to A

TCP Header Allows Piggybacking

Source port Destination port

Sequence number

Acknowledgment

Advertised window HdrLen Flags 0

Checksum Urgent pointer

Options (variable)

Data

Flags SYN FIN RST PSH URG ACK

Example of Piggybacking

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

B has data to send

A has data to send

B doesnrsquot have data to send

Increasing Likelihood of Piggybacking

bull  Increase piggybacking ndash  TCP allows the receiver to

wait to send the ACK ndash  hellip in the hope that the host

will have data to send bull  Example sshrlogintelnet

ndash  Host A types characters at a UNIX prompt

ndash  Host B receives the character and executes a command

ndash  hellip and then data are generated ndash  Would be nice if B could send

the ACK with the new data

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

Works when packet from A causes data to be sent from B

waste

Delayed ACK bull  Delay sending an ACK

ndash  Upon receiving a packet the host B sets a timer ndash  If Brsquos application generates data go ahead and send

bull  And piggyback the ACK bit

ndash  If the timer expires send a (non-piggybacked) ACK

bull  Limiting the wait ndash  Timer of 200 msec or 500 msec ndash  Results in an ACK for every other full-sized packet

TCP Throughput and Fairness

TCP Throughput bull  Whatrsquos the average throughout of TCP as a

function of window size and RTT ndash  Assume long-lived TCP flow ndash  Ignore slow start

bull  Let W be the window size when loss occurs bull  When window is W throughput is WRTT bull  Just after loss window drops to W2 throughput

to W2RTT bull  Average throughout 075 WRTT

Problems with Fast Links An example to illustrate problems bull  Consider the impact of high speed links

ndash  1500 byte segments ndash  100ms RTT ndash  10 Gbs throughput

bull  What is the required window size ndash  Throughput = 75 WRTT

bull  (probably a good formula to remember)

ndash  Requires window size W = 83333 in-flight segments

Example (Cont)

bull  10 Gbs throughput requires window size W = 83333 in-flight segments

bull  TCP assumes every loss is due to congestion ndash  Generally safe assumption for reasonable window size

bull  (Magic) Formula to relate loss rate to throughput Throughput of 10 Gbs with MSS of 1500 bytes gives ndash  13 L = 210-10

ie can only lose one in 5000000000 segments bull  We need new versions of TCP for high-speed nets (topic

for later discussion)

LRTTMSSsdot221Throughput =

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

Simple scenario assume same MSS and RTT

Is TCP Fair Two competing sessions bull  Additive increase gives slope of 1 as throughout increases bull  multiplicative decrease drops throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Conn

ecti

on 2

thr

ough

put

loss decrease window by factor of 2 congestion avoidance additive increase

loss decrease window by factor of 2 congestion avoidance additive increase

More on Fairness Fairness and UDP bull  Multimedia apps often do

not use TCP ndash  do not want rate throttled by

congestion control bull  Instead use UDP

ndash  pump audiovideo at constant rate tolerate packet loss

bull  Research area TCP friendly unreliable transport

Fairness and parallel TCP connections

bull  nothing prevents app from opening parallel connections between 2 hosts

bull  Web browsers do this bull  Example link of rate R

supporting 9 connections ndash  new app asks for 1 TCP gets rate

R10 ndash  new app asks for 11 TCPs gets

11R20 (over half the bandwidth)

Queuing Mechanisms

Random Early Detection (RED) Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

bull  TCP depends on packet loss to detect congestion ndash  In fact TCP drives the network into packet loss ndash  hellip by continuing to increase the sending rate

bull  Drop-tail queuing leads to bursty loss ndash  When a link becomes congestedhellip ndash  hellip many arriving packets encounter a full queue ndash  And as a result many flows divide sending rate in half ndash  hellip and many individual flows lose multiple packets

Slow Feedback from Drop Tail bull  Feedback comes when buffer is completely full

ndash  hellip even though the buffer has been filling for a while bull  Plus the filling buffer is increasing RTT

ndash  hellip and the variance in the RTT bull  Might be better to give early feedback

ndash  Get one or two flows to slow down not all of them ndash  Get these flows to slow down before it is too late

Random Early Detection (RED) bull  Basic idea of RED

ndash  Router notices that the queue is getting backlogged ndash  hellip and randomly drops packets to signal congestion

bull  Packet drop probability ndash  Drop probability increases as queue length increases ndash  If buffer is below some level donrsquot drop anything ndash  hellip otherwise set drop probability as function of queue

Average Queue Length

Prob

abili

ty

Properties of RED bull  Drops packets before queue is full

ndash  In the hope of reducing the rates of some flows bull  Drops packet in proportion to each flowrsquos rate

ndash  High-rate flows have more packets ndash  hellip and hence a higher chance of being selected

bull  Drops are spaced out in time ndash  Which should help desynchronize the TCP senders

bull  Tolerant of burstiness in the traffic ndash  By basing the decisions on average queue length

Problems With RED bull  Hard to get the tunable parameters just right

ndash  How early to start dropping packets ndash  What slope for the increase in drop probability ndash  What time scale for averaging the queue length

bull  Sometimes RED helps but sometimes not ndash  If the parameters arenrsquot set right RED doesnrsquot help ndash  And it is hard to know how to set the parameters

bull  RED is implemented in practice ndash  But often not used due to the challenges of tuning right

bull  Many variations ndash  With cute names like ldquoBluerdquo and ldquoFREDrdquohellip J

Explicit Congestion Notification bull  Early dropping of packets

ndash  Good gives early feedback ndash  Bad has to drop the packet to give the feedback

bull  Explicit Congestion Notification ndash  Router marks the packet with an ECN bit ndash  hellip and sending host interprets as a sign of congestion

bull  Surmounting the challenges ndash  Must be supported by the end hosts and the routers ndash  Requires two bits in the IP header (one for the ECN

mark and one to indicate the ECN capability) ndash  Solution borrow two of the Type-Of-Service bits in the

IPv4 packet header

Conclusions bull Congestion is inevitable

ndash Internet does not reserve resources in advance ndash TCP actively tries to push the envelope

bull Congestion can be handled ndash Additive increase multiplicative decrease ndash Slow start and slow-start restart

bull Active Queue Management can help ndash Random Early Detection (RED) ndash Explicit Congestion Notification (ECN)

t

Window

Page 11: TCP Congestion Control - cs.colostate.edu

TCP Congestion Control Basics bull  Each source determines available capacity

ndash  hellip and how many packets is allowed to have in transit bull  Congestion window

ndash  Maximum of unackrsquoed bytes allowed to be in transit (the congestion-control equivalent of receiver window)

ndash  MaxWindow = mincongestion window receiver window - send at the rate of the slowest component

bull  How to adapt the congestion window ndash  Decrease upon losing a packet back-off ndash  Increase upon success explore new capacity

Additive Increase Multiplicative Decrease

bull  How much to increase and decrease ndash  Increase linearly decrease multiplicatively ndash  A necessary condition for stability of TCP ndash  Consequences of oversized window are much worse

than having an under-sized window bull  Oversized window packets dropped retransmitted pain for all bull  Undersized window lower throughput for one flow

bull  Multiplicative decrease ndash  On loss of packet divide congestion window in half

bull  Additive increase ndash  On success for last window of data increase linearly

adding one MSS per RTT

TCP ldquoSawtoothrdquo Behavior

t

Window

halved

Loss

Practical Details bull  Congestion window (cwnd)

ndash  Represented in bytes not in packets (Why) ndash  Packets typically one MSS (Maximum Segment Size)

bull  Increasing the congestion window ndash  Increase by MSS on success for last window of data ndash  In practice increase a fraction of MSS per received

ACK bull  packets per window CWND MSS bull  Increment per ACK MSS (MSS CWND)

bull  Decreasing the congestion window ndash  Cut in half but never below 1 MSS

Getting Started

t

Window

But could take a long time to get started

Need to start with a small CWND to avoid overloading the network

ldquoSlow Startrdquo Phase bull  Start with a small congestion window

ndash  Initially CWND is 1 MSS ndash  So initial sending rate is MSSRTT

bull  That could be pretty wasteful ndash  Might be much less than the available bandwidth ndash  Linear increase takes a long time to accelerate

bull  Slow-start phase (but in reality itrsquos ldquofast startrdquo) ndash  Sender starts at a slow rate (hence the name) ndash  hellip but increases the rate exponentially ndash  hellip until the first loss event

Slow Start in Action Double CWND per round-trip time

D A D D A A D D

A A

D

A

Src

Dest

D

A

1 2 4 8

Slow Start and the TCP Sawtooth

Loss

Exponential ldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally had no congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

Two Kinds of Loss in TCP bull  Triple duplicate ACK

ndash  Packet n is lost but packets n+1 n+2 etc arrive ndash  Receiver sends duplicate acknowledgments ndash  hellip and the sender retransmits packet n quickly ndash  Do a multiplicative decrease and keep going (no slow-

start) bull  Timeout

ndash  Packet n is lost and detected via a timeout ndash  Could be because all packets in flight were lost ndash  After the timeout blasting away for the entire CWND ndash  hellip would trigger a very large burst in traffic ndash  So better to start over with a very low CWND

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of

previous cwnd

timeout threshold

Repeating Slow Start After Idle Period

bull  Suppose a TCP connection goes idle for a while ndash  Eg Telnet session where you donrsquot type for an hour

bull  Eventually the network conditions change ndash  Maybe many more flows are traversing the link ndash  Eg maybe everybody has come back from lunch

bull  Dangerous to start transmitting at the old rate ndash  Previously-idle TCP sender might blast the network ndash  hellip causing excessive congestion and packet loss

bull  So some TCP implementations repeat slow start ndash  Slow-start restart after an idle period

Summary TCP Congestion Control bull  When CongWin is below Threshold sender in

slow-start phase window grows exponentially

bull  When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull  When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull  When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Event State TCP Sender Action Commentary ACK receipt for previously unACKed data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unACKed data

Congestion Avoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Loss event detected by triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = Threshold Set state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSS Set state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK

SS or CA Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Other TCP Mechanisms

Naglersquos Algorithm and Delayed ACK

Motivation for Naglersquos Algorithm

bull  Interactive applications ndash  SSHtelnetrlogin ndash  Generate many small packets (eg keystrokes)

bull  Small packets are wasteful ndash  Mostly header (eg 40 bytes of header 1 of data)

bull  Appealing to reduce the number of packets ndash  Could force every packet to have some minimum size ndash  hellip but what if the person doesnrsquot type more

characters bull  Need to balance competing trade-offs

ndash  Send larger packets to increase efficiency ndash  hellip but not at the expense of delay

Naglersquos Algorithm bull Wait if the amount of data is small

ndash Smaller than Maximum Segment Size (MSS) bull hellipand some other packet is already in flight

ndash  ie still awaiting the ACKs for previous packets bull That is send at most one small packet per RTT

ndash hellip by waiting until all outstanding ACKs have arrived

bull  Influence on performance ndash Interactive applications enables batching of bytes ndash Bulk transfer no change transmits in MSS-sized packets

anyway

vs

ACK

Delayed ACK - Motivation bull  TCP traffic is often bidirectional

ndash Data traveling in both directions ndash ACKs traveling in both directions

bull  ACK packets have high overhead ndash  40 bytes for the IP header and TCP header ndash hellip and zero data traffic

bull  Piggybacking is appealing ndash Host B can send an ACK to host A ndash hellip as part of a data packet from B to A

TCP Header Allows Piggybacking

Source port Destination port

Sequence number

Acknowledgment

Advertised window HdrLen Flags 0

Checksum Urgent pointer

Options (variable)

Data

Flags SYN FIN RST PSH URG ACK

Example of Piggybacking

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

B has data to send

A has data to send

B doesnrsquot have data to send

Increasing Likelihood of Piggybacking

bull  Increase piggybacking ndash  TCP allows the receiver to

wait to send the ACK ndash  hellip in the hope that the host

will have data to send bull  Example sshrlogintelnet

ndash  Host A types characters at a UNIX prompt

ndash  Host B receives the character and executes a command

ndash  hellip and then data are generated ndash  Would be nice if B could send

the ACK with the new data

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

Works when packet from A causes data to be sent from B

waste

Delayed ACK bull  Delay sending an ACK

ndash  Upon receiving a packet the host B sets a timer ndash  If Brsquos application generates data go ahead and send

bull  And piggyback the ACK bit

ndash  If the timer expires send a (non-piggybacked) ACK

bull  Limiting the wait ndash  Timer of 200 msec or 500 msec ndash  Results in an ACK for every other full-sized packet

TCP Throughput and Fairness

TCP Throughput bull  Whatrsquos the average throughout of TCP as a

function of window size and RTT ndash  Assume long-lived TCP flow ndash  Ignore slow start

bull  Let W be the window size when loss occurs bull  When window is W throughput is WRTT bull  Just after loss window drops to W2 throughput

to W2RTT bull  Average throughout 075 WRTT

Problems with Fast Links An example to illustrate problems bull  Consider the impact of high speed links

ndash  1500 byte segments ndash  100ms RTT ndash  10 Gbs throughput

bull  What is the required window size ndash  Throughput = 75 WRTT

bull  (probably a good formula to remember)

ndash  Requires window size W = 83333 in-flight segments

Example (Cont)

bull  10 Gbs throughput requires window size W = 83333 in-flight segments

bull  TCP assumes every loss is due to congestion ndash  Generally safe assumption for reasonable window size

bull  (Magic) Formula to relate loss rate to throughput Throughput of 10 Gbs with MSS of 1500 bytes gives ndash  13 L = 210-10

ie can only lose one in 5000000000 segments bull  We need new versions of TCP for high-speed nets (topic

for later discussion)

LRTTMSSsdot221Throughput =

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

Simple scenario assume same MSS and RTT

Is TCP Fair Two competing sessions bull  Additive increase gives slope of 1 as throughout increases bull  multiplicative decrease drops throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Conn

ecti

on 2

thr

ough

put

loss decrease window by factor of 2 congestion avoidance additive increase

loss decrease window by factor of 2 congestion avoidance additive increase

More on Fairness Fairness and UDP bull  Multimedia apps often do

not use TCP ndash  do not want rate throttled by

congestion control bull  Instead use UDP

ndash  pump audiovideo at constant rate tolerate packet loss

bull  Research area TCP friendly unreliable transport

Fairness and parallel TCP connections

bull  nothing prevents app from opening parallel connections between 2 hosts

bull  Web browsers do this bull  Example link of rate R

supporting 9 connections ndash  new app asks for 1 TCP gets rate

R10 ndash  new app asks for 11 TCPs gets

11R20 (over half the bandwidth)

Queuing Mechanisms

Random Early Detection (RED) Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

bull  TCP depends on packet loss to detect congestion ndash  In fact TCP drives the network into packet loss ndash  hellip by continuing to increase the sending rate

bull  Drop-tail queuing leads to bursty loss ndash  When a link becomes congestedhellip ndash  hellip many arriving packets encounter a full queue ndash  And as a result many flows divide sending rate in half ndash  hellip and many individual flows lose multiple packets

Slow Feedback from Drop Tail bull  Feedback comes when buffer is completely full

ndash  hellip even though the buffer has been filling for a while bull  Plus the filling buffer is increasing RTT

ndash  hellip and the variance in the RTT bull  Might be better to give early feedback

ndash  Get one or two flows to slow down not all of them ndash  Get these flows to slow down before it is too late

Random Early Detection (RED)
•  Basic idea of RED
   –  The router notices that the queue is getting backlogged
   –  … and randomly drops packets to signal congestion
•  Packet drop probability
   –  Drop probability increases as the queue length increases
   –  If the buffer is below some level, don’t drop anything
   –  … otherwise, set the drop probability as a function of the queue length
[Figure: drop probability as a function of average queue length, zero below a threshold and rising with the queue length above it]
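A minimal sketch of that drop decision (illustrative thresholds; real RED also maintains an EWMA of the queue length and more machinery):

    import random

    MIN_TH, MAX_TH, MAX_P = 5, 15, 0.10   # hypothetical tuning parameters

    def red_drop(avg_queue_len: float) -> bool:
        if avg_queue_len < MIN_TH:
            return False                  # short queue: never drop
        if avg_queue_len >= MAX_TH:
            return True                   # long queue: always drop
        # in between, drop probability rises linearly with the queue length
        p = MAX_P * (avg_queue_len - MIN_TH) / (MAX_TH - MIN_TH)
        return random.random() < p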

Properties of RED
•  Drops packets before the queue is full
   –  In the hope of reducing the rates of some flows
•  Drops packets in proportion to each flow’s rate
   –  High-rate flows have more packets in the queue
   –  … and hence a higher chance of being selected
•  Drops are spaced out in time
   –  Which should help desynchronize the TCP senders
•  Tolerant of burstiness in the traffic
   –  By basing the decisions on the average queue length

Problems With RED
•  Hard to get the tunable parameters just right
   –  How early to start dropping packets?
   –  What slope for the increase in drop probability?
   –  What time scale for averaging the queue length?
•  Sometimes RED helps, but sometimes not
   –  If the parameters aren’t set right, RED doesn’t help
   –  And it is hard to know how to set the parameters
•  RED is implemented in practice
   –  But often not used, due to the challenges of tuning it right
•  Many variations exist
   –  With cute names like “Blue” and “FRED”…

Explicit Congestion Notification
•  Early dropping of packets
   –  Good: gives early feedback
   –  Bad: has to drop the packet to give the feedback
•  Explicit Congestion Notification (ECN)
   –  The router marks the packet with an ECN bit
   –  … and the sending host interprets the mark as a sign of congestion
•  Surmounting the challenges
   –  Must be supported by the end hosts and the routers
   –  Requires two bits in the IP header (one for the ECN mark and one to indicate ECN capability)
   –  Solution: borrow two of the Type-of-Service bits in the IPv4 packet header
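A sketch of the marking step, using the ECN codepoints that ended up in the low two bits of the old IPv4 ToS byte (RFC 3168); the function name is mine:

    # ECN field (low 2 bits of the old ToS byte): 00 = not ECN-capable,
    # 01/10 = ECN-capable transport, 11 = congestion experienced (CE).
    NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11

    def mark_instead_of_drop(tos: int) -> int:
        """Router side: mark an ECN-capable packet rather than dropping it."""
        if (tos & 0b11) in (ECT_0, ECT_1):
            return (tos & ~0b11) | CE   # set the CE codepoint
        return tos                      # not ECN-capable: would drop instead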

Conclusions
•  Congestion is inevitable
   –  The Internet does not reserve resources in advance
   –  TCP actively tries to push the envelope
•  Congestion can be handled
   –  Additive increase, multiplicative decrease
   –  Slow start and slow-start restart
•  Active Queue Management can help
   –  Random Early Detection (RED)
   –  Explicit Congestion Notification (ECN)


Page 12: TCP Congestion Control - cs.colostate.edu

Additive Increase, Multiplicative Decrease

•  How much to increase and decrease?
   –  Increase linearly; decrease multiplicatively
   –  A necessary condition for the stability of TCP
   –  The consequences of an oversized window are much worse than those of an undersized window
      •  Oversized window: packets dropped and retransmitted, pain for all
      •  Undersized window: lower throughput for one flow
•  Multiplicative decrease
   –  On loss of a packet, divide the congestion window in half
•  Additive increase
   –  On success for the last window of data, increase linearly, adding one MSS per RTT

TCP “Sawtooth” Behavior

[Figure: congestion window vs. time t; the window ramps up steadily and is halved at each loss, producing a sawtooth]

Practical Details
•  Congestion window (CWND)
   –  Represented in bytes, not in packets (why?)
   –  Packets are typically one MSS (Maximum Segment Size)
•  Increasing the congestion window
   –  Increase by one MSS on success for the last window of data
   –  In practice, increase by a fraction of MSS per received ACK
      •  packets per window: CWND / MSS
      •  increment per ACK: MSS × (MSS / CWND)
•  Decreasing the congestion window
   –  Cut in half, but never below 1 MSS
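A sketch of those byte-based updates (illustrative; a real stack tracks much more state):

    MSS = 1460   # bytes, a typical value

    def on_ack_congestion_avoidance(cwnd: int) -> int:
        # CWND/MSS ACKs arrive per window, each adding MSS*(MSS/CWND) bytes,
        # so CWND grows by roughly one MSS per RTT.
        return cwnd + max(1, MSS * MSS // cwnd)

    def on_loss(cwnd: int) -> int:
        # Multiplicative decrease: halve, but never below one MSS.
        return max(MSS, cwnd // 2)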

Getting Started

[Figure: window vs. time t; with additive increase alone, it could take a long time to get started]

We need to start with a small CWND to avoid overloading the network.

“Slow Start” Phase
•  Start with a small congestion window
   –  Initially, CWND is 1 MSS
   –  So the initial sending rate is MSS/RTT
•  That could be pretty wasteful
   –  Might be much less than the available bandwidth
   –  Linear increase takes a long time to accelerate
•  Slow-start phase (in reality it’s a “fast start”)
   –  The sender starts at a slow rate (hence the name)
   –  … but increases the rate exponentially
   –  … until the first loss event

Slow Start in Action
Double CWND per round-trip time: 1, 2, 4, 8, … segments in flight

[Figure: Src/Dest timeline of data segments (D) and ACKs (A); each ACK releases two new segments, doubling the window each RTT]
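The figure’s doubling, in trivial code form (a sketch):

    # Each delivered segment's ACK adds one segment to the window,
    # so the number in flight doubles every RTT: 1, 2, 4, 8, ...
    cwnd = 1
    for rtt in range(4):
        print(f"RTT {rtt}: {cwnd} segment(s) in flight")
        cwnd += cwnd    # cwnd ACKs arrive, each adding one segment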

Slow Start and the TCP Sawtooth

[Figure: window vs. time t; exponential “slow start” up to the first loss, then the familiar sawtooth]

Why is it called slow start? Because TCP originally had no congestion control mechanism: the source would just start by sending a whole window’s worth of data.

Two Kinds of Loss in TCP
•  Triple duplicate ACK
   –  Packet n is lost, but packets n+1, n+2, etc. arrive
   –  The receiver sends duplicate acknowledgments
   –  … and the sender retransmits packet n quickly
   –  Do a multiplicative decrease and keep going (no slow start)
•  Timeout
   –  Packet n is lost and detected via a timeout
   –  Could be because all packets in flight were lost
   –  After the timeout, blasting away for the entire CWND…
   –  … would trigger a very large burst of traffic
   –  So it is better to start over with a very low CWND
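The two reactions, side by side (a sketch; function names are mine, window in bytes):

    MSS = 1460

    def on_triple_duplicate_ack(cwnd: int):
        # Fast retransmit/recovery: multiplicative decrease, keep going.
        ssthresh = max(cwnd // 2, MSS)
        return ssthresh, ssthresh        # (new cwnd, new ssthresh)

    def on_timeout(cwnd: int):
        # Start over from one segment and slow-start back up.
        ssthresh = max(cwnd // 2, MSS)
        return MSS, ssthresh             # (new cwnd, new ssthresh)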

Repeating Slow Start After Timeout

[Figure: window vs. time t; after a timeout, the window drops to 1 and slow start runs until it reaches the threshold of half the previous CWND, after which growth becomes linear]

Slow-start restart: go back to a CWND of 1, but take advantage of knowing the previous value of CWND.

Repeating Slow Start After Idle Period

•  Suppose a TCP connection goes idle for a while
   –  E.g., a Telnet session where you don’t type for an hour
•  Eventually the network conditions change
   –  Maybe many more flows are traversing the link
   –  E.g., maybe everybody has come back from lunch
•  It is dangerous to start transmitting at the old rate
   –  The previously idle TCP sender might blast the network
   –  … causing excessive congestion and packet loss
•  So some TCP implementations repeat slow start
   –  Slow-start restart after an idle period

Summary: TCP Congestion Control
•  When CongWin is below Threshold, the sender is in the slow-start phase: the window grows exponentially
•  When CongWin is above Threshold, the sender is in the congestion-avoidance phase: the window grows linearly
•  When a triple duplicate ACK occurs: Threshold is set to CongWin/2 and CongWin is set to Threshold
•  When a timeout occurs: Threshold is set to CongWin/2 and CongWin is set to 1 MSS

Event: ACK receipt for previously unACKed data
State: Slow Start (SS)
TCP Sender Action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to “Congestion Avoidance”
Commentary: Results in a doubling of CongWin every RTT

Event: ACK receipt for previously unACKed data
State: Congestion Avoidance (CA)
TCP Sender Action: CongWin = CongWin + MSS × (MSS / CongWin)
Commentary: Additive increase, resulting in an increase of CongWin by 1 MSS every RTT

Event: Loss event detected by triple duplicate ACK
State: SS or CA
TCP Sender Action: Threshold = CongWin/2; CongWin = Threshold; set state to “Congestion Avoidance”
Commentary: Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

Event: Timeout
State: SS or CA
TCP Sender Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to “Slow Start”
Commentary: Enter slow start

Event: Duplicate ACK
State: SS or CA
TCP Sender Action: Increment the duplicate ACK count for the segment being ACKed
Commentary: CongWin and Threshold not changed
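Putting the table’s rows together, a compact simulation (my sketch, in units of MSS) reproduces the whole shape: exponential growth below Threshold, linear growth above it, and a halving at a forced triple-duplicate-ACK loss:

    cong_win, threshold = 1.0, 16.0     # in MSS units
    trace = []

    for rtt in range(12):
        trace.append(round(cong_win, 1))
        for _ in range(int(cong_win)):           # one ACK per delivered segment
            if cong_win < threshold:
                cong_win += 1                    # slow start: doubles per RTT
            else:
                cong_win += 1 / cong_win         # cong. avoidance: +1 MSS per RTT
        if rtt == 8:                             # force a triple-duplicate-ACK loss
            threshold = cong_win / 2
            cong_win = threshold                 # fast recovery (table row 3)

    print(trace)    # 1, 2, 4, 8, 16, ~17, ~18, ... then halved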

Other TCP Mechanisms

Nagle’s Algorithm and Delayed ACK

Motivation for Nagle’s Algorithm

•  Interactive applications
   –  SSH, telnet, rlogin
   –  Generate many small packets (e.g., keystrokes)
•  Small packets are wasteful
   –  Mostly header (e.g., 40 bytes of header for 1 byte of data)
•  It is appealing to reduce the number of packets
   –  Could force every packet to have some minimum size
   –  … but what if the person doesn’t type more characters?
•  Need to balance competing trade-offs
   –  Send larger packets to increase efficiency
   –  … but not at the expense of delay

Nagle’s Algorithm
•  Wait if the amount of data is small
   –  Smaller than the Maximum Segment Size (MSS)
•  … and some other packet is already in flight
   –  i.e., still awaiting the ACKs for previous packets
•  That is, send at most one small packet per RTT
   –  … by waiting until all outstanding ACKs have arrived
•  Influence on performance
   –  Interactive applications: enables batching of bytes
   –  Bulk transfer: no change; it transmits in MSS-sized packets anyway

[Figure: interactive traffic with vs. without Nagle’s algorithm; without it, each keystroke goes out immediately, with it, small bytes are batched until the outstanding ACK arrives]
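A sketch of Nagle’s send test in code (names are mine):

    MSS = 1460

    def nagle_allows_send(buffered_bytes: int, unacked_bytes: int) -> bool:
        if buffered_bytes >= MSS:
            return True                # full-sized segments always go out
        return unacked_bytes == 0      # a small segment only when nothing is in flight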

Delayed ACK: Motivation
•  TCP traffic is often bidirectional
   –  Data traveling in both directions
   –  ACKs traveling in both directions
•  ACK packets have high overhead
   –  40 bytes for the IP header and TCP header
   –  … and zero data
•  Piggybacking is appealing
   –  Host B can send an ACK to host A
   –  … as part of a data packet from B to A

TCP Header Allows Piggybacking

[Figure: TCP header layout: source port, destination port; sequence number; acknowledgment; header length, reserved (0), flags, advertised window; checksum, urgent pointer; options (variable); data. Flags include SYN, FIN, RST, PSH, URG, and ACK; the ACK flag lets a data segment also carry an acknowledgment]

Example of Piggybacking

[Figure: timeline between hosts A and B; while both have data to send, segments flow as Data and Data+ACK in each direction, but when B no longer has data to send, B returns a bare ACK instead]

Increasing Likelihood of Piggybacking

•  To increase piggybacking
   –  TCP allows the receiver to wait to send the ACK
   –  … in the hope that the host will soon have data to send
•  Example: ssh/rlogin/telnet
   –  A user at host A types characters at a UNIX prompt
   –  Host B receives each character and executes a command
   –  … and then output data are generated
   –  It would be nice if B could send the ACK with the new data

[Figure: the same A/B timeline; sending an immediate bare ACK followed by a separate data segment wastes a packet, while waiting lets the ACK ride on the data. This works when a packet from A causes data to be sent from B]

Delayed ACK
•  Delay sending an ACK
   –  Upon receiving a packet, host B sets a timer
   –  If B’s application generates data, go ahead and send
      •  … and piggyback the ACK bit
   –  If the timer expires, send a (non-piggybacked) ACK
•  Limiting the wait
   –  Timer of 200 msec or 500 msec
   –  Results in an ACK for every other full-sized packet
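A sketch of that receiver-side logic (illustrative; the class and the transmit callback are mine, using the slide’s 200 ms timer):

    import threading

    class DelayedAck:
        """Hold an ACK briefly, hoping to piggyback it on outgoing data."""
        def __init__(self, transmit):
            self.transmit = transmit        # callback: transmit(data, ack)
            self.timer = None

        def on_segment_received(self):
            if self.timer:                  # second packet while waiting:
                self._fire()                # ACK now (one ACK per two segments)
            else:
                self.timer = threading.Timer(0.200, self._fire)
                self.timer.start()

        def on_app_sends(self, data):
            if self.timer:                  # piggyback the pending ACK
                self.timer.cancel()
                self.timer = None
            self.transmit(data, ack=True)

        def _fire(self):
            if self.timer:
                self.timer.cancel()
            self.timer = None
            self.transmit(None, ack=True)   # bare, non-piggybacked ACK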





Page 16: TCP Congestion Control - cs.colostate.edu

ldquoSlow Startrdquo Phase bull  Start with a small congestion window

ndash  Initially CWND is 1 MSS ndash  So initial sending rate is MSSRTT

bull  That could be pretty wasteful ndash  Might be much less than the available bandwidth ndash  Linear increase takes a long time to accelerate

bull  Slow-start phase (but in reality itrsquos ldquofast startrdquo) ndash  Sender starts at a slow rate (hence the name) ndash  hellip but increases the rate exponentially ndash  hellip until the first loss event

Slow Start in Action Double CWND per round-trip time

D A D D A A D D

A A

D

A

Src

Dest

D

A

1 2 4 8

Slow Start and the TCP Sawtooth

Loss

Exponential ldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally had no congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

Two Kinds of Loss in TCP bull  Triple duplicate ACK

ndash  Packet n is lost but packets n+1 n+2 etc arrive ndash  Receiver sends duplicate acknowledgments ndash  hellip and the sender retransmits packet n quickly ndash  Do a multiplicative decrease and keep going (no slow-

start) bull  Timeout

ndash  Packet n is lost and detected via a timeout ndash  Could be because all packets in flight were lost ndash  After the timeout blasting away for the entire CWND ndash  hellip would trigger a very large burst in traffic ndash  So better to start over with a very low CWND

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of

previous cwnd

timeout threshold

Repeating Slow Start After Idle Period

bull  Suppose a TCP connection goes idle for a while ndash  Eg Telnet session where you donrsquot type for an hour

bull  Eventually the network conditions change ndash  Maybe many more flows are traversing the link ndash  Eg maybe everybody has come back from lunch

bull  Dangerous to start transmitting at the old rate ndash  Previously-idle TCP sender might blast the network ndash  hellip causing excessive congestion and packet loss

bull  So some TCP implementations repeat slow start ndash  Slow-start restart after an idle period

Summary TCP Congestion Control bull  When CongWin is below Threshold sender in

slow-start phase window grows exponentially

bull  When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull  When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull  When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Event: ACK receipt for previously unACKed data – State: Slow Start (SS)
  Action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance"
  Commentary: Resulting in a doubling of CongWin every RTT

Event: ACK receipt for previously unACKed data – State: Congestion Avoidance (CA)
  Action: CongWin = CongWin + MSS · (MSS / CongWin)
  Commentary: Additive increase, resulting in an increase of CongWin by 1 MSS every RTT

Event: Loss event detected by triple duplicate ACK – State: SS or CA
  Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
  Commentary: Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

Event: Timeout – State: SS or CA
  Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
  Commentary: Enter slow start

Event: Duplicate ACK – State: SS or CA
  Action: Increment the duplicate-ACK count for the segment being ACKed
  Commentary: CongWin and Threshold not changed
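The table above is essentially a small state machine. A minimal Python sketch of it follows (assumptions: CongWin and Threshold tracked in MSS units, an assumed initial threshold, and fast-recovery window inflation omitted):

    # Sketch of the congestion-control state machine from the table above.
    class TcpSender:
        def __init__(self):
            self.state = "SS"        # "SS" = slow start, "CA" = congestion avoidance
            self.cong_win = 1.0      # in MSS
            self.threshold = 64.0    # in MSS (assumed initial value)

        def on_new_ack(self):        # ACK for previously unACKed data
            if self.state == "SS":
                self.cong_win += 1.0                  # doubles CongWin every RTT
                if self.cong_win > self.threshold:
                    self.state = "CA"
            else:                                     # congestion avoidance
                self.cong_win += 1.0 / self.cong_win  # about +1 MSS per RTT

        def on_triple_dup_ack(self):
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, 1.0)  # never below 1 MSS
            self.state = "CA"                         # keep going; no slow start

        def on_timeout(self):
            self.threshold = self.cong_win / 2
            self.cong_win = 1.0                       # back to 1 MSS
            self.state = "SS"                         # re-enter slow start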

Other TCP Mechanisms

Nagle's Algorithm and Delayed ACK

Motivation for Nagle's Algorithm
• Interactive applications
  – SSH/telnet/rlogin
  – Generate many small packets (e.g., keystrokes)
• Small packets are wasteful
  – Mostly header (e.g., 40 bytes of header, 1 byte of data)
• Appealing to reduce the number of packets
  – Could force every packet to have some minimum size
  – … but what if the person doesn't type more characters?
• Need to balance competing trade-offs
  – Send larger packets to increase efficiency
  – … but not at the expense of delay

Nagle's Algorithm
• Wait if the amount of data is small
  – Smaller than the Maximum Segment Size (MSS)
• … and some other packet is already in flight
  – i.e., still awaiting the ACKs for previous packets
• That is, send at most one small packet per RTT
  – … by waiting until all outstanding ACKs have arrived
• Influence on performance (see the sketch below)
  – Interactive applications: enables batching of bytes
  – Bulk transfer: no change; transmits in MSS-sized packets anyway
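As an illustration, a minimal sketch of Nagle's decision rule at the send side (hypothetical names, not a real socket API):

    # Sketch of Nagle's algorithm at the send side (illustrative, not a real API).
    def nagle_send(buffer: bytearray, mss: int, unacked_bytes: int, send):
        """Send from buffer according to Nagle's rule; returns bytes left queued."""
        while len(buffer) >= mss:
            send(bytes(buffer[:mss]))     # full-sized segments always go out
            del buffer[:mss]
        if buffer and unacked_bytes == 0:
            send(bytes(buffer))           # a small segment is allowed only when
            buffer.clear()                # nothing is in flight
        # Otherwise: hold the small segment until outstanding data is ACKed,
        # batching any further bytes the application writes in the meantime.
        return len(buffer)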

Delayed ACK - Motivation
• TCP traffic is often bidirectional
  – Data traveling in both directions
  – ACKs traveling in both directions
• ACK packets have high overhead
  – 40 bytes for the IP header and TCP header
  – … and zero data traffic
• Piggybacking is appealing
  – Host B can send an ACK to host A
  – … as part of a data packet from B to A

TCP Header Allows Piggybacking

    +----------------------+----------------------+
    |     Source port      |   Destination port   |
    +----------------------+----------------------+
    |               Sequence number               |
    +---------------------------------------------+
    |                Acknowledgment               |
    +--------+-----+--------+---------------------+
    | HdrLen |  0  | Flags  |  Advertised window  |
    +--------+-----+--------+---------------------+
    |       Checksum       |    Urgent pointer    |
    +----------------------+----------------------+
    |              Options (variable)             |
    +---------------------------------------------+
    |                     Data                    |
    +---------------------------------------------+

Flags: SYN, FIN, RST, PSH, URG, ACK

Example of Piggybacking
[Figure: timeline between hosts A and B. While B has data to send, A's Data segments are answered with Data+ACK segments; when B doesn't have data to send, it must return a bare ACK]

Increasing Likelihood of Piggybacking
• Increase piggybacking
  – TCP allows the receiver to wait to send the ACK
  – … in the hope that the host will have data to send
• Example: ssh/rlogin/telnet
  – Host A types characters at a UNIX prompt
  – Host B receives the character and executes a command
  – … and then data are generated
  – Would be nice if B could send the ACK with the new data

[Figure: same A/B timeline; a bare ACK followed immediately by Data is wasted overhead, avoided by waiting, which works when the packet from A causes data to be sent from B]

Delayed ACK
• Delay sending an ACK (sketched below)
  – Upon receiving a packet, host B sets a timer
  – If B's application generates data, go ahead and send
    • … and piggyback the ACK bit
  – If the timer expires, send a (non-piggybacked) ACK
• Limiting the wait
  – Timer of 200 msec or 500 msec
  – Results in an ACK for every other full-sized packet
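A minimal sketch of that receiver-side logic (hypothetical names; a real stack also ACKs immediately on every second full-sized segment):

    # Sketch of delayed-ACK logic at receiving host B (illustrative names).
    import threading

    class DelayedAck:
        def __init__(self, send_pure_ack, delay=0.2):   # 200 ms timer (typical)
            self.send_pure_ack = send_pure_ack
            self.delay = delay
            self.timer = None

        def on_segment_received(self):
            # Arm the timer instead of ACKing right away.
            if self.timer is None:
                self.timer = threading.Timer(self.delay, self._expire)
                self.timer.start()

        def on_app_sends_data(self, send_data_with_ack):
            # Application produced data: piggyback the ACK and cancel the timer.
            if self.timer is not None:
                self.timer.cancel()
                self.timer = None
            send_data_with_ack()

        def _expire(self):
            # No data materialized in time: send a bare ACK.
            self.timer = None
            self.send_pure_ack()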

TCP Throughput and Fairness

TCP Throughput
• What's the average throughput of TCP as a function of window size and RTT?
  – Assume a long-lived TCP flow
  – Ignore slow start
• Let W be the window size when loss occurs
• When the window is W, throughput is W/RTT
• Just after a loss, the window drops to W/2, and throughput to W/(2·RTT)
• Since the window then grows linearly from W/2 back to W, the time-average window is 0.75·W
• Average throughput: 0.75·W/RTT

Problems with Fast Links
An example to illustrate the problems:
• Consider the impact of high-speed links
  – 1500-byte segments
  – 100 ms RTT
  – 10 Gb/s throughput
• What is the required window size?
  – Throughput = 0.75·W/RTT
    • (probably a good formula to remember)
  – Requires a window size of W = 83,333 in-flight segments (checked below)
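As a quick check under the slide's assumptions (note that the 83,333 figure follows from the instantaneous relation Throughput = W·MSS/RTT, i.e., the bandwidth-delay product, rather than from the 0.75 average):

    # Window size needed to fill a 10 Gb/s, 100 ms path with 1500-byte segments.
    throughput = 10e9          # bits/second
    rtt = 0.1                  # seconds
    mss = 1500 * 8             # bits per segment

    bits_in_flight = throughput * rtt   # 1e9 bits must be in flight
    print(round(bits_in_flight / mss))  # -> 83333 segments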

Example (Cont)
• 10 Gb/s throughput requires a window size of W = 83,333 in-flight segments
• TCP assumes every loss is due to congestion
  – Generally a safe assumption for a reasonable window size
• (Magic) formula relating loss rate L to throughput:

      Throughput = (1.22 · MSS) / (RTT · √L)

  – A throughput of 10 Gb/s with an MSS of 1500 bytes gives L = 2·10⁻¹⁰,
    i.e., we can only lose one in 5,000,000,000 segments
• We need new versions of TCP for high-speed nets (topic for later discussion)
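Plugging the slide's numbers into that formula confirms the loss rate:

    # Loss rate required by Throughput = 1.22 * MSS / (RTT * sqrt(L)).
    mss = 1500 * 8            # bits
    rtt = 0.1                 # seconds
    throughput = 10e9         # bits/second

    L = (1.22 * mss / (rtt * throughput)) ** 2
    print(f"{L:.1e}")         # -> about 2.1e-10: one loss per ~5e9 segments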

TCP Fairness
Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K.

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Simple scenario: assume the same MSS and RTT.

Is TCP Fair?
Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease drops throughput proportionally

[Figure: Connection 2 throughput vs. Connection 1 throughput, both axes from 0 to R; repeated cycles of congestion-avoidance additive increase (moving parallel to the 45° line) and loss-induced halving (moving toward the origin) converge toward the "equal bandwidth share" line]
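A small simulation (illustrative parameters, not from the lecture) shows this convergence: both flows add one unit per RTT, and both halve when their combined rate exceeds the link capacity R:

    # Two AIMD flows converging toward an equal share of capacity R (illustrative).
    R = 100.0                  # link capacity (assumed units)
    x1, x2 = 80.0, 10.0        # deliberately unequal starting rates

    for _ in range(500):       # simulate 500 RTTs
        x1 += 1.0              # additive increase for each flow
        x2 += 1.0
        if x1 + x2 > R:        # shared loss event at the bottleneck
            x1 /= 2.0          # multiplicative decrease for both flows
            x2 /= 2.0

    print(f"x1={x1:.2f}  x2={x2:.2f}")   # the two rates end up nearly equal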

More on Fairness
Fairness and UDP
• Multimedia apps often do not use TCP
  – Do not want their rate throttled by congestion control
• Instead use UDP
  – Pump audio/video at a constant rate; tolerate packet loss
• Research area: TCP-friendly unreliable transport

Fairness and parallel TCP connections
• Nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – A new app asking for 1 TCP connection gets rate R/10
  – A new app asking for 11 TCP connections gets 11R/20 (over half the bandwidth, since it holds 11 of the 20 connections)

Queuing Mechanisms

Random Early Detection (RED) and Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing
• TCP depends on packet loss to detect congestion
  – In fact, TCP drives the network into packet loss
  – … by continuing to increase the sending rate
• Drop-tail queuing leads to bursty loss
  – When a link becomes congested…
  – … many arriving packets encounter a full queue
  – And as a result, many flows divide their sending rate in half
  – … and many individual flows lose multiple packets

Slow Feedback from Drop Tail
• Feedback comes when the buffer is completely full
  – … even though the buffer has been filling for a while
• Plus, the filling buffer is increasing the RTT
  – … and the variance in the RTT
• Might be better to give early feedback
  – Get one or two flows to slow down, not all of them
  – Get these flows to slow down before it is too late

Random Early Detection (RED)
• Basic idea of RED
  – Router notices that the queue is getting backlogged
  – … and randomly drops packets to signal congestion
• Packet drop probability
  – Drop probability increases as queue length increases
  – If the buffer is below some level, don't drop anything
  – … otherwise, set the drop probability as a function of the queue length (see the sketch below)

[Figure: drop probability vs. average queue length; zero below a minimum threshold, then rising as the average queue grows]
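A sketch of that drop decision (the thresholds, maximum probability, and EWMA weight are assumed tuning parameters; choosing them well is exactly what the "Problems With RED" slide below is about):

    # Sketch of RED's drop decision (illustrative parameter values).
    import random

    MIN_TH = 20      # packets: below this average queue, never drop
    MAX_TH = 80      # packets: above this average queue, always drop
    MAX_P = 0.10     # drop probability as the average approaches MAX_TH
    WEIGHT = 0.002   # EWMA weight for averaging the instantaneous queue

    avg_queue = 0.0

    def on_packet_arrival(instant_queue_len: int) -> bool:
        """Return True if the arriving packet should be dropped."""
        global avg_queue
        # Smooth the queue length so short bursts are tolerated.
        avg_queue = (1 - WEIGHT) * avg_queue + WEIGHT * instant_queue_len
        if avg_queue < MIN_TH:
            return False
        if avg_queue >= MAX_TH:
            return True
        # Drop probability rises linearly between the two thresholds.
        p = MAX_P * (avg_queue - MIN_TH) / (MAX_TH - MIN_TH)
        return random.random() < p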

Properties of RED
• Drops packets before the queue is full
  – In the hope of reducing the rates of some flows
• Drops packets in proportion to each flow's rate
  – High-rate flows have more packets
  – … and hence a higher chance of being selected
• Drops are spaced out in time
  – Which should help desynchronize the TCP senders
• Tolerant of burstiness in the traffic
  – By basing the decisions on the average queue length

Problems With RED
• Hard to get the tunable parameters just right
  – How early to start dropping packets?
  – What slope for the increase in drop probability?
  – What time scale for averaging the queue length?
• Sometimes RED helps, but sometimes not
  – If the parameters aren't set right, RED doesn't help
  – And it is hard to know how to set the parameters
• RED is implemented in practice
  – But often not used, due to the challenges of tuning it right
• Many variations
  – With cute names like "Blue" and "FRED"…

Explicit Congestion Notification
• Early dropping of packets
  – Good: gives early feedback
  – Bad: has to drop the packet to give the feedback
• Explicit Congestion Notification
  – Router marks the packet with an ECN bit
  – … and the sending host interprets it as a sign of congestion
• Surmounting the challenges
  – Must be supported by the end hosts and the routers
  – Requires two bits in the IP header (one for the ECN mark, and one to indicate ECN capability)
  – Solution: borrow two of the Type-Of-Service bits in the IPv4 packet header (a small illustration follows)
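For illustration, a tiny helper showing how those two bits sit in the old IPv4 Type-Of-Service byte (codepoints per RFC 3168; the function names are made up):

    # The ECN field is the low two bits of the old IPv4 Type-Of-Service byte
    # (codepoints from RFC 3168).
    NOT_ECT = 0b00   # endpoint is not ECN-capable
    ECT_1   = 0b01   # ECN-capable transport
    ECT_0   = 0b10   # ECN-capable transport
    CE      = 0b11   # "congestion experienced" mark set by a router

    def ecn_field(tos_byte: int) -> int:
        return tos_byte & 0b11

    def mark_congestion(tos_byte: int) -> int:
        """What a router does instead of dropping: set the CE codepoint."""
        return (tos_byte & ~0b11) | CE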

Conclusions
• Congestion is inevitable
  – The Internet does not reserve resources in advance
  – TCP actively tries to push the envelope
• Congestion can be handled
  – Additive increase, multiplicative decrease
  – Slow start and slow-start restart
• Active Queue Management can help
  – Random Early Detection (RED)
  – Explicit Congestion Notification (ECN)

Page 17: TCP Congestion Control - cs.colostate.edu

Slow Start in Action Double CWND per round-trip time

D A D D A A D D

A A

D

A

Src

Dest

D

A

1 2 4 8

Slow Start and the TCP Sawtooth

Loss

Exponential ldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally had no congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

Two Kinds of Loss in TCP bull  Triple duplicate ACK

ndash  Packet n is lost but packets n+1 n+2 etc arrive ndash  Receiver sends duplicate acknowledgments ndash  hellip and the sender retransmits packet n quickly ndash  Do a multiplicative decrease and keep going (no slow-

start) bull  Timeout

ndash  Packet n is lost and detected via a timeout ndash  Could be because all packets in flight were lost ndash  After the timeout blasting away for the entire CWND ndash  hellip would trigger a very large burst in traffic ndash  So better to start over with a very low CWND

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of

previous cwnd

timeout threshold

Repeating Slow Start After Idle Period

bull  Suppose a TCP connection goes idle for a while ndash  Eg Telnet session where you donrsquot type for an hour

bull  Eventually the network conditions change ndash  Maybe many more flows are traversing the link ndash  Eg maybe everybody has come back from lunch

bull  Dangerous to start transmitting at the old rate ndash  Previously-idle TCP sender might blast the network ndash  hellip causing excessive congestion and packet loss

bull  So some TCP implementations repeat slow start ndash  Slow-start restart after an idle period

Summary TCP Congestion Control bull  When CongWin is below Threshold sender in

slow-start phase window grows exponentially

bull  When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull  When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull  When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Event State TCP Sender Action Commentary ACK receipt for previously unACKed data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unACKed data

Congestion Avoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Loss event detected by triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = Threshold Set state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSS Set state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK

SS or CA Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Other TCP Mechanisms

Naglersquos Algorithm and Delayed ACK

Motivation for Naglersquos Algorithm

bull  Interactive applications ndash  SSHtelnetrlogin ndash  Generate many small packets (eg keystrokes)

bull  Small packets are wasteful ndash  Mostly header (eg 40 bytes of header 1 of data)

bull  Appealing to reduce the number of packets ndash  Could force every packet to have some minimum size ndash  hellip but what if the person doesnrsquot type more

characters bull  Need to balance competing trade-offs

ndash  Send larger packets to increase efficiency ndash  hellip but not at the expense of delay

Naglersquos Algorithm bull Wait if the amount of data is small

ndash Smaller than Maximum Segment Size (MSS) bull hellipand some other packet is already in flight

ndash  ie still awaiting the ACKs for previous packets bull That is send at most one small packet per RTT

ndash hellip by waiting until all outstanding ACKs have arrived

bull  Influence on performance ndash Interactive applications enables batching of bytes ndash Bulk transfer no change transmits in MSS-sized packets

anyway

vs

ACK

Delayed ACK - Motivation bull  TCP traffic is often bidirectional

ndash Data traveling in both directions ndash ACKs traveling in both directions

bull  ACK packets have high overhead ndash  40 bytes for the IP header and TCP header ndash hellip and zero data traffic

bull  Piggybacking is appealing ndash Host B can send an ACK to host A ndash hellip as part of a data packet from B to A

TCP Header Allows Piggybacking

Source port Destination port

Sequence number

Acknowledgment

Advertised window HdrLen Flags 0

Checksum Urgent pointer

Options (variable)

Data

Flags SYN FIN RST PSH URG ACK

Example of Piggybacking

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

B has data to send

A has data to send

B doesnrsquot have data to send

Increasing Likelihood of Piggybacking

bull  Increase piggybacking ndash  TCP allows the receiver to

wait to send the ACK ndash  hellip in the hope that the host

will have data to send bull  Example sshrlogintelnet

ndash  Host A types characters at a UNIX prompt

ndash  Host B receives the character and executes a command

ndash  hellip and then data are generated ndash  Would be nice if B could send

the ACK with the new data

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

Works when packet from A causes data to be sent from B

waste

Delayed ACK bull  Delay sending an ACK

ndash  Upon receiving a packet the host B sets a timer ndash  If Brsquos application generates data go ahead and send

bull  And piggyback the ACK bit

ndash  If the timer expires send a (non-piggybacked) ACK

bull  Limiting the wait ndash  Timer of 200 msec or 500 msec ndash  Results in an ACK for every other full-sized packet

TCP Throughput and Fairness

TCP Throughput bull  Whatrsquos the average throughout of TCP as a

function of window size and RTT ndash  Assume long-lived TCP flow ndash  Ignore slow start

bull  Let W be the window size when loss occurs bull  When window is W throughput is WRTT bull  Just after loss window drops to W2 throughput

to W2RTT bull  Average throughout 075 WRTT

Problems with Fast Links An example to illustrate problems bull  Consider the impact of high speed links

ndash  1500 byte segments ndash  100ms RTT ndash  10 Gbs throughput

bull  What is the required window size ndash  Throughput = 75 WRTT

bull  (probably a good formula to remember)

ndash  Requires window size W = 83333 in-flight segments

Example (Cont)

bull  10 Gbs throughput requires window size W = 83333 in-flight segments

bull  TCP assumes every loss is due to congestion ndash  Generally safe assumption for reasonable window size

bull  (Magic) Formula to relate loss rate to throughput Throughput of 10 Gbs with MSS of 1500 bytes gives ndash  13 L = 210-10

ie can only lose one in 5000000000 segments bull  We need new versions of TCP for high-speed nets (topic

for later discussion)

LRTTMSSsdot221Throughput =

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

Simple scenario assume same MSS and RTT

Is TCP Fair Two competing sessions bull  Additive increase gives slope of 1 as throughout increases bull  multiplicative decrease drops throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Conn

ecti

on 2

thr

ough

put

loss decrease window by factor of 2 congestion avoidance additive increase

loss decrease window by factor of 2 congestion avoidance additive increase

More on Fairness Fairness and UDP bull  Multimedia apps often do

not use TCP ndash  do not want rate throttled by

congestion control bull  Instead use UDP

ndash  pump audiovideo at constant rate tolerate packet loss

bull  Research area TCP friendly unreliable transport

Fairness and parallel TCP connections

bull  nothing prevents app from opening parallel connections between 2 hosts

bull  Web browsers do this bull  Example link of rate R

supporting 9 connections ndash  new app asks for 1 TCP gets rate

R10 ndash  new app asks for 11 TCPs gets

11R20 (over half the bandwidth)

Queuing Mechanisms

Random Early Detection (RED) Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

bull  TCP depends on packet loss to detect congestion ndash  In fact TCP drives the network into packet loss ndash  hellip by continuing to increase the sending rate

bull  Drop-tail queuing leads to bursty loss ndash  When a link becomes congestedhellip ndash  hellip many arriving packets encounter a full queue ndash  And as a result many flows divide sending rate in half ndash  hellip and many individual flows lose multiple packets

Slow Feedback from Drop Tail bull  Feedback comes when buffer is completely full

ndash  hellip even though the buffer has been filling for a while bull  Plus the filling buffer is increasing RTT

ndash  hellip and the variance in the RTT bull  Might be better to give early feedback

ndash  Get one or two flows to slow down not all of them ndash  Get these flows to slow down before it is too late

Random Early Detection (RED) bull  Basic idea of RED

ndash  Router notices that the queue is getting backlogged ndash  hellip and randomly drops packets to signal congestion

bull  Packet drop probability ndash  Drop probability increases as queue length increases ndash  If buffer is below some level donrsquot drop anything ndash  hellip otherwise set drop probability as function of queue

Average Queue Length

Prob

abili

ty

Properties of RED bull  Drops packets before queue is full

ndash  In the hope of reducing the rates of some flows bull  Drops packet in proportion to each flowrsquos rate

ndash  High-rate flows have more packets ndash  hellip and hence a higher chance of being selected

bull  Drops are spaced out in time ndash  Which should help desynchronize the TCP senders

bull  Tolerant of burstiness in the traffic ndash  By basing the decisions on average queue length

Problems With RED bull  Hard to get the tunable parameters just right

ndash  How early to start dropping packets ndash  What slope for the increase in drop probability ndash  What time scale for averaging the queue length

bull  Sometimes RED helps but sometimes not ndash  If the parameters arenrsquot set right RED doesnrsquot help ndash  And it is hard to know how to set the parameters

bull  RED is implemented in practice ndash  But often not used due to the challenges of tuning right

bull  Many variations ndash  With cute names like ldquoBluerdquo and ldquoFREDrdquohellip J

Explicit Congestion Notification bull  Early dropping of packets

ndash  Good gives early feedback ndash  Bad has to drop the packet to give the feedback

bull  Explicit Congestion Notification ndash  Router marks the packet with an ECN bit ndash  hellip and sending host interprets as a sign of congestion

bull  Surmounting the challenges ndash  Must be supported by the end hosts and the routers ndash  Requires two bits in the IP header (one for the ECN

mark and one to indicate the ECN capability) ndash  Solution borrow two of the Type-Of-Service bits in the

IPv4 packet header

Conclusions bull Congestion is inevitable

ndash Internet does not reserve resources in advance ndash TCP actively tries to push the envelope

bull Congestion can be handled ndash Additive increase multiplicative decrease ndash Slow start and slow-start restart

bull Active Queue Management can help ndash Random Early Detection (RED) ndash Explicit Congestion Notification (ECN)

t

Window

Page 18: TCP Congestion Control - cs.colostate.edu

Slow Start and the TCP Sawtooth

Loss

Exponential ldquoslow startrdquo

t

Window

Why is it called slow-start Because TCP originally had no congestion control mechanism The source would just

start by sending a whole windowrsquos worth of data

Two Kinds of Loss in TCP bull  Triple duplicate ACK

ndash  Packet n is lost but packets n+1 n+2 etc arrive ndash  Receiver sends duplicate acknowledgments ndash  hellip and the sender retransmits packet n quickly ndash  Do a multiplicative decrease and keep going (no slow-

start) bull  Timeout

ndash  Packet n is lost and detected via a timeout ndash  Could be because all packets in flight were lost ndash  After the timeout blasting away for the entire CWND ndash  hellip would trigger a very large burst in traffic ndash  So better to start over with a very low CWND

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of

previous cwnd

timeout threshold

Repeating Slow Start After Idle Period

bull  Suppose a TCP connection goes idle for a while ndash  Eg Telnet session where you donrsquot type for an hour

bull  Eventually the network conditions change ndash  Maybe many more flows are traversing the link ndash  Eg maybe everybody has come back from lunch

bull  Dangerous to start transmitting at the old rate ndash  Previously-idle TCP sender might blast the network ndash  hellip causing excessive congestion and packet loss

bull  So some TCP implementations repeat slow start ndash  Slow-start restart after an idle period

Summary TCP Congestion Control bull  When CongWin is below Threshold sender in

slow-start phase window grows exponentially

bull  When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull  When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull  When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Event State TCP Sender Action Commentary ACK receipt for previously unACKed data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unACKed data

Congestion Avoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Loss event detected by triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = Threshold Set state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSS Set state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK

SS or CA Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Other TCP Mechanisms

Naglersquos Algorithm and Delayed ACK

Motivation for Naglersquos Algorithm

bull  Interactive applications ndash  SSHtelnetrlogin ndash  Generate many small packets (eg keystrokes)

bull  Small packets are wasteful ndash  Mostly header (eg 40 bytes of header 1 of data)

bull  Appealing to reduce the number of packets ndash  Could force every packet to have some minimum size ndash  hellip but what if the person doesnrsquot type more

characters bull  Need to balance competing trade-offs

ndash  Send larger packets to increase efficiency ndash  hellip but not at the expense of delay

Naglersquos Algorithm bull Wait if the amount of data is small

ndash Smaller than Maximum Segment Size (MSS) bull hellipand some other packet is already in flight

ndash  ie still awaiting the ACKs for previous packets bull That is send at most one small packet per RTT

ndash hellip by waiting until all outstanding ACKs have arrived

bull  Influence on performance ndash Interactive applications enables batching of bytes ndash Bulk transfer no change transmits in MSS-sized packets

anyway

vs

ACK

Delayed ACK - Motivation bull  TCP traffic is often bidirectional

ndash Data traveling in both directions ndash ACKs traveling in both directions

bull  ACK packets have high overhead ndash  40 bytes for the IP header and TCP header ndash hellip and zero data traffic

bull  Piggybacking is appealing ndash Host B can send an ACK to host A ndash hellip as part of a data packet from B to A

TCP Header Allows Piggybacking

Source port Destination port

Sequence number

Acknowledgment

Advertised window HdrLen Flags 0

Checksum Urgent pointer

Options (variable)

Data

Flags SYN FIN RST PSH URG ACK

Example of Piggybacking

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

B has data to send

A has data to send

B doesnrsquot have data to send

Increasing Likelihood of Piggybacking

bull  Increase piggybacking ndash  TCP allows the receiver to

wait to send the ACK ndash  hellip in the hope that the host

will have data to send bull  Example sshrlogintelnet

ndash  Host A types characters at a UNIX prompt

ndash  Host B receives the character and executes a command

ndash  hellip and then data are generated ndash  Would be nice if B could send

the ACK with the new data

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

Works when packet from A causes data to be sent from B

waste

Delayed ACK bull  Delay sending an ACK

ndash  Upon receiving a packet the host B sets a timer ndash  If Brsquos application generates data go ahead and send

bull  And piggyback the ACK bit

ndash  If the timer expires send a (non-piggybacked) ACK

bull  Limiting the wait ndash  Timer of 200 msec or 500 msec ndash  Results in an ACK for every other full-sized packet

TCP Throughput and Fairness

TCP Throughput bull  Whatrsquos the average throughout of TCP as a

function of window size and RTT ndash  Assume long-lived TCP flow ndash  Ignore slow start

bull  Let W be the window size when loss occurs bull  When window is W throughput is WRTT bull  Just after loss window drops to W2 throughput

to W2RTT bull  Average throughout 075 WRTT

Problems with Fast Links An example to illustrate problems bull  Consider the impact of high speed links

ndash  1500 byte segments ndash  100ms RTT ndash  10 Gbs throughput

bull  What is the required window size ndash  Throughput = 75 WRTT

bull  (probably a good formula to remember)

ndash  Requires window size W = 83333 in-flight segments

Example (Cont)

bull  10 Gbs throughput requires window size W = 83333 in-flight segments

bull  TCP assumes every loss is due to congestion ndash  Generally safe assumption for reasonable window size

bull  (Magic) Formula to relate loss rate to throughput Throughput of 10 Gbs with MSS of 1500 bytes gives ndash  13 L = 210-10

ie can only lose one in 5000000000 segments bull  We need new versions of TCP for high-speed nets (topic

for later discussion)

LRTTMSSsdot221Throughput =

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

Simple scenario assume same MSS and RTT

Is TCP Fair Two competing sessions bull  Additive increase gives slope of 1 as throughout increases bull  multiplicative decrease drops throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Conn

ecti

on 2

thr

ough

put

loss decrease window by factor of 2 congestion avoidance additive increase

loss decrease window by factor of 2 congestion avoidance additive increase

More on Fairness Fairness and UDP bull  Multimedia apps often do

not use TCP ndash  do not want rate throttled by

congestion control bull  Instead use UDP

ndash  pump audiovideo at constant rate tolerate packet loss

bull  Research area TCP friendly unreliable transport

Fairness and parallel TCP connections

bull  nothing prevents app from opening parallel connections between 2 hosts

bull  Web browsers do this bull  Example link of rate R

supporting 9 connections ndash  new app asks for 1 TCP gets rate

R10 ndash  new app asks for 11 TCPs gets

11R20 (over half the bandwidth)

Queuing Mechanisms

Random Early Detection (RED) Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

bull  TCP depends on packet loss to detect congestion ndash  In fact TCP drives the network into packet loss ndash  hellip by continuing to increase the sending rate

bull  Drop-tail queuing leads to bursty loss ndash  When a link becomes congestedhellip ndash  hellip many arriving packets encounter a full queue ndash  And as a result many flows divide sending rate in half ndash  hellip and many individual flows lose multiple packets

Slow Feedback from Drop Tail bull  Feedback comes when buffer is completely full

ndash  hellip even though the buffer has been filling for a while bull  Plus the filling buffer is increasing RTT

ndash  hellip and the variance in the RTT bull  Might be better to give early feedback

ndash  Get one or two flows to slow down not all of them ndash  Get these flows to slow down before it is too late

Random Early Detection (RED) bull  Basic idea of RED

ndash  Router notices that the queue is getting backlogged ndash  hellip and randomly drops packets to signal congestion

bull  Packet drop probability ndash  Drop probability increases as queue length increases ndash  If buffer is below some level donrsquot drop anything ndash  hellip otherwise set drop probability as function of queue

Average Queue Length

Prob

abili

ty

Properties of RED bull  Drops packets before queue is full

ndash  In the hope of reducing the rates of some flows bull  Drops packet in proportion to each flowrsquos rate

ndash  High-rate flows have more packets ndash  hellip and hence a higher chance of being selected

bull  Drops are spaced out in time ndash  Which should help desynchronize the TCP senders

bull  Tolerant of burstiness in the traffic ndash  By basing the decisions on average queue length

Problems With RED bull  Hard to get the tunable parameters just right

ndash  How early to start dropping packets ndash  What slope for the increase in drop probability ndash  What time scale for averaging the queue length

bull  Sometimes RED helps but sometimes not ndash  If the parameters arenrsquot set right RED doesnrsquot help ndash  And it is hard to know how to set the parameters

bull  RED is implemented in practice ndash  But often not used due to the challenges of tuning right

bull  Many variations ndash  With cute names like ldquoBluerdquo and ldquoFREDrdquohellip J

Explicit Congestion Notification bull  Early dropping of packets

ndash  Good gives early feedback ndash  Bad has to drop the packet to give the feedback

bull  Explicit Congestion Notification ndash  Router marks the packet with an ECN bit ndash  hellip and sending host interprets as a sign of congestion

bull  Surmounting the challenges ndash  Must be supported by the end hosts and the routers ndash  Requires two bits in the IP header (one for the ECN

mark and one to indicate the ECN capability) ndash  Solution borrow two of the Type-Of-Service bits in the

IPv4 packet header

Conclusions bull Congestion is inevitable

ndash Internet does not reserve resources in advance ndash TCP actively tries to push the envelope

bull Congestion can be handled ndash Additive increase multiplicative decrease ndash Slow start and slow-start restart

bull Active Queue Management can help ndash Random Early Detection (RED) ndash Explicit Congestion Notification (ECN)

t

Window

Page 19: TCP Congestion Control - cs.colostate.edu

Two Kinds of Loss in TCP bull  Triple duplicate ACK

ndash  Packet n is lost but packets n+1 n+2 etc arrive ndash  Receiver sends duplicate acknowledgments ndash  hellip and the sender retransmits packet n quickly ndash  Do a multiplicative decrease and keep going (no slow-

start) bull  Timeout

ndash  Packet n is lost and detected via a timeout ndash  Could be because all packets in flight were lost ndash  After the timeout blasting away for the entire CWND ndash  hellip would trigger a very large burst in traffic ndash  So better to start over with a very low CWND

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of

previous cwnd

timeout threshold

Repeating Slow Start After Idle Period

bull  Suppose a TCP connection goes idle for a while ndash  Eg Telnet session where you donrsquot type for an hour

bull  Eventually the network conditions change ndash  Maybe many more flows are traversing the link ndash  Eg maybe everybody has come back from lunch

bull  Dangerous to start transmitting at the old rate ndash  Previously-idle TCP sender might blast the network ndash  hellip causing excessive congestion and packet loss

bull  So some TCP implementations repeat slow start ndash  Slow-start restart after an idle period

Summary TCP Congestion Control bull  When CongWin is below Threshold sender in

slow-start phase window grows exponentially

bull  When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull  When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull  When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Event State TCP Sender Action Commentary ACK receipt for previously unACKed data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unACKed data

Congestion Avoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Loss event detected by triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = Threshold Set state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSS Set state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK

SS or CA Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Other TCP Mechanisms

Naglersquos Algorithm and Delayed ACK

Motivation for Naglersquos Algorithm

bull  Interactive applications ndash  SSHtelnetrlogin ndash  Generate many small packets (eg keystrokes)

bull  Small packets are wasteful ndash  Mostly header (eg 40 bytes of header 1 of data)

bull  Appealing to reduce the number of packets ndash  Could force every packet to have some minimum size ndash  hellip but what if the person doesnrsquot type more

characters bull  Need to balance competing trade-offs

ndash  Send larger packets to increase efficiency ndash  hellip but not at the expense of delay

Naglersquos Algorithm bull Wait if the amount of data is small

ndash Smaller than Maximum Segment Size (MSS) bull hellipand some other packet is already in flight

ndash  ie still awaiting the ACKs for previous packets bull That is send at most one small packet per RTT

ndash hellip by waiting until all outstanding ACKs have arrived

bull  Influence on performance ndash Interactive applications enables batching of bytes ndash Bulk transfer no change transmits in MSS-sized packets

anyway

vs

ACK

Delayed ACK - Motivation bull  TCP traffic is often bidirectional

ndash Data traveling in both directions ndash ACKs traveling in both directions

bull  ACK packets have high overhead ndash  40 bytes for the IP header and TCP header ndash hellip and zero data traffic

bull  Piggybacking is appealing ndash Host B can send an ACK to host A ndash hellip as part of a data packet from B to A

TCP Header Allows Piggybacking

Source port Destination port

Sequence number

Acknowledgment

Advertised window HdrLen Flags 0

Checksum Urgent pointer

Options (variable)

Data

Flags SYN FIN RST PSH URG ACK

Example of Piggybacking

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

B has data to send

A has data to send

B doesnrsquot have data to send

Increasing Likelihood of Piggybacking

bull  Increase piggybacking ndash  TCP allows the receiver to

wait to send the ACK ndash  hellip in the hope that the host

will have data to send bull  Example sshrlogintelnet

ndash  Host A types characters at a UNIX prompt

ndash  Host B receives the character and executes a command

ndash  hellip and then data are generated ndash  Would be nice if B could send

the ACK with the new data

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

Works when packet from A causes data to be sent from B

waste

Delayed ACK bull  Delay sending an ACK

ndash  Upon receiving a packet the host B sets a timer ndash  If Brsquos application generates data go ahead and send

bull  And piggyback the ACK bit

ndash  If the timer expires send a (non-piggybacked) ACK

bull  Limiting the wait ndash  Timer of 200 msec or 500 msec ndash  Results in an ACK for every other full-sized packet

TCP Throughput and Fairness

TCP Throughput bull  Whatrsquos the average throughout of TCP as a

function of window size and RTT ndash  Assume long-lived TCP flow ndash  Ignore slow start

bull  Let W be the window size when loss occurs bull  When window is W throughput is WRTT bull  Just after loss window drops to W2 throughput

to W2RTT bull  Average throughout 075 WRTT

Problems with Fast Links An example to illustrate problems bull  Consider the impact of high speed links

ndash  1500 byte segments ndash  100ms RTT ndash  10 Gbs throughput

bull  What is the required window size ndash  Throughput = 75 WRTT

bull  (probably a good formula to remember)

ndash  Requires window size W = 83333 in-flight segments

Example (Cont)

bull  10 Gbs throughput requires window size W = 83333 in-flight segments

bull  TCP assumes every loss is due to congestion ndash  Generally safe assumption for reasonable window size

bull  (Magic) Formula to relate loss rate to throughput Throughput of 10 Gbs with MSS of 1500 bytes gives ndash  13 L = 210-10

ie can only lose one in 5000000000 segments bull  We need new versions of TCP for high-speed nets (topic

for later discussion)

LRTTMSSsdot221Throughput =

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

Simple scenario assume same MSS and RTT

Is TCP Fair Two competing sessions bull  Additive increase gives slope of 1 as throughout increases bull  multiplicative decrease drops throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Conn

ecti

on 2

thr

ough

put

loss decrease window by factor of 2 congestion avoidance additive increase

loss decrease window by factor of 2 congestion avoidance additive increase

More on Fairness Fairness and UDP bull  Multimedia apps often do

not use TCP ndash  do not want rate throttled by

congestion control bull  Instead use UDP

ndash  pump audiovideo at constant rate tolerate packet loss

bull  Research area TCP friendly unreliable transport

Fairness and parallel TCP connections

bull  nothing prevents app from opening parallel connections between 2 hosts

bull  Web browsers do this bull  Example link of rate R

supporting 9 connections ndash  new app asks for 1 TCP gets rate

R10 ndash  new app asks for 11 TCPs gets

11R20 (over half the bandwidth)

Queuing Mechanisms

Random Early Detection (RED) Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

bull  TCP depends on packet loss to detect congestion ndash  In fact TCP drives the network into packet loss ndash  hellip by continuing to increase the sending rate

bull  Drop-tail queuing leads to bursty loss ndash  When a link becomes congestedhellip ndash  hellip many arriving packets encounter a full queue ndash  And as a result many flows divide sending rate in half ndash  hellip and many individual flows lose multiple packets

Slow Feedback from Drop Tail bull  Feedback comes when buffer is completely full

ndash  hellip even though the buffer has been filling for a while bull  Plus the filling buffer is increasing RTT

ndash  hellip and the variance in the RTT bull  Might be better to give early feedback

ndash  Get one or two flows to slow down not all of them ndash  Get these flows to slow down before it is too late

Random Early Detection (RED) bull  Basic idea of RED

ndash  Router notices that the queue is getting backlogged ndash  hellip and randomly drops packets to signal congestion

bull  Packet drop probability ndash  Drop probability increases as queue length increases ndash  If buffer is below some level donrsquot drop anything ndash  hellip otherwise set drop probability as function of queue

Average Queue Length

Prob

abili

ty

Properties of RED bull  Drops packets before queue is full

ndash  In the hope of reducing the rates of some flows bull  Drops packet in proportion to each flowrsquos rate

ndash  High-rate flows have more packets ndash  hellip and hence a higher chance of being selected

bull  Drops are spaced out in time ndash  Which should help desynchronize the TCP senders

bull  Tolerant of burstiness in the traffic ndash  By basing the decisions on average queue length

Problems With RED bull  Hard to get the tunable parameters just right

ndash  How early to start dropping packets ndash  What slope for the increase in drop probability ndash  What time scale for averaging the queue length

bull  Sometimes RED helps but sometimes not ndash  If the parameters arenrsquot set right RED doesnrsquot help ndash  And it is hard to know how to set the parameters

bull  RED is implemented in practice ndash  But often not used due to the challenges of tuning right

bull  Many variations ndash  With cute names like ldquoBluerdquo and ldquoFREDrdquohellip J

Explicit Congestion Notification bull  Early dropping of packets

ndash  Good gives early feedback ndash  Bad has to drop the packet to give the feedback

bull  Explicit Congestion Notification ndash  Router marks the packet with an ECN bit ndash  hellip and sending host interprets as a sign of congestion

bull  Surmounting the challenges ndash  Must be supported by the end hosts and the routers ndash  Requires two bits in the IP header (one for the ECN

mark and one to indicate the ECN capability) ndash  Solution borrow two of the Type-Of-Service bits in the

IPv4 packet header

Conclusions bull Congestion is inevitable

ndash Internet does not reserve resources in advance ndash TCP actively tries to push the envelope

bull Congestion can be handled ndash Additive increase multiplicative decrease ndash Slow start and slow-start restart

bull Active Queue Management can help ndash Random Early Detection (RED) ndash Explicit Congestion Notification (ECN)

t

Window

Page 20: TCP Congestion Control - cs.colostate.edu

Repeating Slow Start After Timeout

t

Window

Slow-start restart Go back to CWND of 1 but take advantage of knowing the previous value of CWND

Slow start in operation until it reaches half of

previous cwnd

timeout threshold

Repeating Slow Start After Idle Period

bull  Suppose a TCP connection goes idle for a while ndash  Eg Telnet session where you donrsquot type for an hour

bull  Eventually the network conditions change ndash  Maybe many more flows are traversing the link ndash  Eg maybe everybody has come back from lunch

bull  Dangerous to start transmitting at the old rate ndash  Previously-idle TCP sender might blast the network ndash  hellip causing excessive congestion and packet loss

bull  So some TCP implementations repeat slow start ndash  Slow-start restart after an idle period

Summary TCP Congestion Control bull  When CongWin is below Threshold sender in

slow-start phase window grows exponentially

bull  When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull  When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull  When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Event State TCP Sender Action Commentary ACK receipt for previously unACKed data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unACKed data

Congestion Avoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Loss event detected by triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = Threshold Set state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSS Set state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK

SS or CA Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Other TCP Mechanisms

Naglersquos Algorithm and Delayed ACK

Motivation for Naglersquos Algorithm

bull  Interactive applications ndash  SSHtelnetrlogin ndash  Generate many small packets (eg keystrokes)

bull  Small packets are wasteful ndash  Mostly header (eg 40 bytes of header 1 of data)

bull  Appealing to reduce the number of packets ndash  Could force every packet to have some minimum size ndash  hellip but what if the person doesnrsquot type more

characters bull  Need to balance competing trade-offs

ndash  Send larger packets to increase efficiency ndash  hellip but not at the expense of delay

Naglersquos Algorithm bull Wait if the amount of data is small

ndash Smaller than Maximum Segment Size (MSS) bull hellipand some other packet is already in flight

ndash  ie still awaiting the ACKs for previous packets bull That is send at most one small packet per RTT

ndash hellip by waiting until all outstanding ACKs have arrived

bull  Influence on performance ndash Interactive applications enables batching of bytes ndash Bulk transfer no change transmits in MSS-sized packets

anyway

vs

ACK

Delayed ACK - Motivation bull  TCP traffic is often bidirectional

ndash Data traveling in both directions ndash ACKs traveling in both directions

bull  ACK packets have high overhead ndash  40 bytes for the IP header and TCP header ndash hellip and zero data traffic

bull  Piggybacking is appealing ndash Host B can send an ACK to host A ndash hellip as part of a data packet from B to A

TCP Header Allows Piggybacking

Source port Destination port

Sequence number

Acknowledgment

Advertised window HdrLen Flags 0

Checksum Urgent pointer

Options (variable)

Data

Flags SYN FIN RST PSH URG ACK

Example of Piggybacking

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

B has data to send

A has data to send

B doesnrsquot have data to send

Increasing Likelihood of Piggybacking

bull  Increase piggybacking ndash  TCP allows the receiver to

wait to send the ACK ndash  hellip in the hope that the host

will have data to send bull  Example sshrlogintelnet

ndash  Host A types characters at a UNIX prompt

ndash  Host B receives the character and executes a command

ndash  hellip and then data are generated ndash  Would be nice if B could send

the ACK with the new data

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

Works when packet from A causes data to be sent from B

waste

Delayed ACK bull  Delay sending an ACK

ndash  Upon receiving a packet the host B sets a timer ndash  If Brsquos application generates data go ahead and send

bull  And piggyback the ACK bit

ndash  If the timer expires send a (non-piggybacked) ACK

bull  Limiting the wait ndash  Timer of 200 msec or 500 msec ndash  Results in an ACK for every other full-sized packet

TCP Throughput and Fairness

TCP Throughput bull  Whatrsquos the average throughout of TCP as a

function of window size and RTT ndash  Assume long-lived TCP flow ndash  Ignore slow start

bull  Let W be the window size when loss occurs bull  When window is W throughput is WRTT bull  Just after loss window drops to W2 throughput

to W2RTT bull  Average throughout 075 WRTT

Problems with Fast Links An example to illustrate problems bull  Consider the impact of high speed links

ndash  1500 byte segments ndash  100ms RTT ndash  10 Gbs throughput

bull  What is the required window size ndash  Throughput = 75 WRTT

bull  (probably a good formula to remember)

ndash  Requires window size W = 83333 in-flight segments

Example (Cont)

bull  10 Gbs throughput requires window size W = 83333 in-flight segments

bull  TCP assumes every loss is due to congestion ndash  Generally safe assumption for reasonable window size

bull  (Magic) Formula to relate loss rate to throughput Throughput of 10 Gbs with MSS of 1500 bytes gives ndash  13 L = 210-10

ie can only lose one in 5000000000 segments bull  We need new versions of TCP for high-speed nets (topic

for later discussion)

LRTTMSSsdot221Throughput =

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

Simple scenario assume same MSS and RTT

Is TCP Fair Two competing sessions bull  Additive increase gives slope of 1 as throughout increases bull  multiplicative decrease drops throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Conn

ecti

on 2

thr

ough

put

loss decrease window by factor of 2 congestion avoidance additive increase

loss decrease window by factor of 2 congestion avoidance additive increase

More on Fairness Fairness and UDP bull  Multimedia apps often do

not use TCP ndash  do not want rate throttled by

congestion control bull  Instead use UDP

ndash  pump audiovideo at constant rate tolerate packet loss

bull  Research area TCP friendly unreliable transport

Fairness and parallel TCP connections

bull  nothing prevents app from opening parallel connections between 2 hosts

bull  Web browsers do this bull  Example link of rate R

supporting 9 connections ndash  new app asks for 1 TCP gets rate

R10 ndash  new app asks for 11 TCPs gets

11R20 (over half the bandwidth)

Queuing Mechanisms

Random Early Detection (RED) Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

bull  TCP depends on packet loss to detect congestion ndash  In fact TCP drives the network into packet loss ndash  hellip by continuing to increase the sending rate

bull  Drop-tail queuing leads to bursty loss ndash  When a link becomes congestedhellip ndash  hellip many arriving packets encounter a full queue ndash  And as a result many flows divide sending rate in half ndash  hellip and many individual flows lose multiple packets

Slow Feedback from Drop Tail bull  Feedback comes when buffer is completely full

ndash  hellip even though the buffer has been filling for a while bull  Plus the filling buffer is increasing RTT

ndash  hellip and the variance in the RTT bull  Might be better to give early feedback

ndash  Get one or two flows to slow down not all of them ndash  Get these flows to slow down before it is too late

Random Early Detection (RED) bull  Basic idea of RED

ndash  Router notices that the queue is getting backlogged ndash  hellip and randomly drops packets to signal congestion

bull  Packet drop probability ndash  Drop probability increases as queue length increases ndash  If buffer is below some level donrsquot drop anything ndash  hellip otherwise set drop probability as function of queue

Average Queue Length

Prob

abili

ty

Properties of RED bull  Drops packets before queue is full

ndash  In the hope of reducing the rates of some flows bull  Drops packet in proportion to each flowrsquos rate

ndash  High-rate flows have more packets ndash  hellip and hence a higher chance of being selected

bull  Drops are spaced out in time ndash  Which should help desynchronize the TCP senders

bull  Tolerant of burstiness in the traffic ndash  By basing the decisions on average queue length

Problems With RED bull  Hard to get the tunable parameters just right

ndash  How early to start dropping packets ndash  What slope for the increase in drop probability ndash  What time scale for averaging the queue length

bull  Sometimes RED helps but sometimes not ndash  If the parameters arenrsquot set right RED doesnrsquot help ndash  And it is hard to know how to set the parameters

bull  RED is implemented in practice ndash  But often not used due to the challenges of tuning right

bull  Many variations ndash  With cute names like ldquoBluerdquo and ldquoFREDrdquohellip J

Explicit Congestion Notification bull  Early dropping of packets

ndash  Good gives early feedback ndash  Bad has to drop the packet to give the feedback

bull  Explicit Congestion Notification ndash  Router marks the packet with an ECN bit ndash  hellip and sending host interprets as a sign of congestion

bull  Surmounting the challenges ndash  Must be supported by the end hosts and the routers ndash  Requires two bits in the IP header (one for the ECN

mark and one to indicate the ECN capability) ndash  Solution borrow two of the Type-Of-Service bits in the

IPv4 packet header

Conclusions bull Congestion is inevitable

ndash Internet does not reserve resources in advance ndash TCP actively tries to push the envelope

bull Congestion can be handled ndash Additive increase multiplicative decrease ndash Slow start and slow-start restart

bull Active Queue Management can help ndash Random Early Detection (RED) ndash Explicit Congestion Notification (ECN)

t

Window

Page 21: TCP Congestion Control - cs.colostate.edu

Repeating Slow Start After Idle Period

bull  Suppose a TCP connection goes idle for a while ndash  Eg Telnet session where you donrsquot type for an hour

bull  Eventually the network conditions change ndash  Maybe many more flows are traversing the link ndash  Eg maybe everybody has come back from lunch

bull  Dangerous to start transmitting at the old rate ndash  Previously-idle TCP sender might blast the network ndash  hellip causing excessive congestion and packet loss

bull  So some TCP implementations repeat slow start ndash  Slow-start restart after an idle period

Summary TCP Congestion Control bull  When CongWin is below Threshold sender in

slow-start phase window grows exponentially

bull  When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull  When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull  When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Event State TCP Sender Action Commentary ACK receipt for previously unACKed data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unACKed data

Congestion Avoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Loss event detected by triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = Threshold Set state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSS Set state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK

SS or CA Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Other TCP Mechanisms

Naglersquos Algorithm and Delayed ACK

Motivation for Naglersquos Algorithm

bull  Interactive applications ndash  SSHtelnetrlogin ndash  Generate many small packets (eg keystrokes)

bull  Small packets are wasteful ndash  Mostly header (eg 40 bytes of header 1 of data)

bull  Appealing to reduce the number of packets ndash  Could force every packet to have some minimum size ndash  hellip but what if the person doesnrsquot type more

characters bull  Need to balance competing trade-offs

ndash  Send larger packets to increase efficiency ndash  hellip but not at the expense of delay

Nagle's Algorithm

• Wait if the amount of data is small
  – Smaller than the Maximum Segment Size (MSS)
• … and some other packet is already in flight
  – i.e., still awaiting the ACKs for previous packets
• That is, send at most one small packet per RTT
  – … by waiting until all outstanding ACKs have arrived
• Influence on performance (see the sketch after this slide)
  – Interactive applications: enables batching of bytes
  – Bulk transfer: no change, transmits in MSS-sized packets anyway
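Nagle's rule fits in a few lines. A hedged sketch (function and parameter names are illustrative only):

    def nagle_ok_to_send(pending_bytes, mss, unacked_in_flight):
        """May the sender transmit now under Nagle's algorithm?

        A full-sized segment may always go out; a small segment may go
        out only when nothing is in flight, which caps the connection
        at one small packet per RTT.
        """
        if pending_bytes >= mss:
            return True                  # full segment: send immediately
        return not unacked_in_flight     # small segment: wait for all ACKs

For a bulk transfer the first branch almost always fires, which is why Nagle's algorithm leaves such transfers unchanged.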


Delayed ACK - Motivation

• TCP traffic is often bidirectional
  – Data traveling in both directions
  – ACKs traveling in both directions
• ACK packets have high overhead
  – 40 bytes for the IP header and TCP header
  – … and zero bytes of data
• Piggybacking is appealing
  – Host B can send an ACK to host A
  – … as part of a data packet from B to A

TCP Header Allows Piggybacking

    +----------------------+----------------------+
    |     Source port      |   Destination port   |
    +----------------------+----------------------+
    |               Sequence number               |
    +---------------------------------------------+
    |               Acknowledgment                |
    +--------+-----+-------+----------------------+
    | HdrLen |  0  | Flags |  Advertised window   |
    +--------+-----+-------+----------------------+
    |       Checksum       |    Urgent pointer    |
    +----------------------+----------------------+
    |             Options (variable)              |
    +---------------------------------------------+
    |                    Data                     |
    +---------------------------------------------+

    Flags: SYN, FIN, RST, PSH, URG, ACK

Example of Piggybacking

[Figure: timeline between hosts A and B. A sends Data; while B has data to send, B replies with Data + ACK, and A acknowledges with its own Data + ACK. Once B doesn't have data to send, B must fall back to a bare ACK.]

Increasing Likelihood of Piggybacking

• Increase piggybacking
  – TCP allows the receiver to wait to send the ACK
  – … in the hope that the host will soon have data to send
• Example: ssh/rlogin/telnet
  – Host A types characters at a UNIX prompt
  – Host B receives the characters and executes a command
  – … and then output data are generated
  – Would be nice if B could send the ACK with that new data

[Figure: the same A–B timeline as above. Waiting works when a packet from A causes data to be sent from B; when B has nothing to send, the wait is wasted.]

Delayed ACK

• Delay sending an ACK
  – Upon receiving a packet, host B sets a timer
  – If B's application generates data, go ahead and send it
    • … and piggyback the ACK bit
  – If the timer expires, send a (non-piggybacked) ACK
• Limiting the wait (see the sketch below)
  – Timer of 200 msec or 500 msec
  – Results in an ACK for every other full-sized packet
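A sketch of the delayed-ACK policy just described, using Python's threading.Timer. The 200 ms value and the "ACK every other full-sized segment" rule come from the slide; everything else (names, callback shape) is an assumption:

    import threading

    class DelayedAck:
        def __init__(self, send_segment, delay=0.200):
            self.send_segment = send_segment   # callback: send_segment(data, ack)
            self.delay = delay                 # 200 ms, per the slide
            self.timer = None
            self.unacked = 0

        def on_receive_full_segment(self):
            self.unacked += 1
            if self.unacked >= 2:              # ACK every other full-sized packet
                self._ack_now()
            elif self.timer is None:           # otherwise, start the ACK timer
                self.timer = threading.Timer(self.delay, self._ack_now)
                self.timer.start()

        def on_app_data(self, data):
            # The application produced data: piggyback the ACK on it.
            if self.timer is not None:
                self.timer.cancel()
                self.timer = None
            self.unacked = 0
            self.send_segment(data, ack=True)

        def _ack_now(self):
            self.timer = None
            self.unacked = 0
            self.send_segment(b"", ack=True)   # bare (non-piggybacked) ACK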

TCP Throughput and Fairness

TCP Throughput

• What's the average throughput of TCP as a function of window size and RTT?
  – Assume a long-lived TCP flow
  – Ignore slow start
• Let W be the window size when loss occurs
• When the window is W, throughput is W/RTT
• Just after a loss, the window drops to W/2, and throughput to W/(2·RTT)
• Average throughput: 0.75 W/RTT (the window ramps linearly from W/2 back up to W, and the average of W/2 and W is 3W/4)

Problems with Fast Links

An example to illustrate the problems:
• Consider the impact of high-speed links
  – 1500-byte segments
  – 100 ms RTT
  – 10 Gb/s throughput
• What is the required window size?
  – Throughput = 0.75 · W/RTT (probably a good formula to remember)
  – Requires a window size of W = 83,333 in-flight segments (checked in the sketch below)
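The window figure is easy to check. Note that the slide's 83,333 follows from the simpler relation Throughput = W/RTT; with the 0.75 average-throughput factor the requirement would be a third larger (a quick check, not part of the original slides):

    throughput_bps = 10e9        # 10 Gb/s
    rtt_s = 0.100                # 100 ms
    mss_bytes = 1500

    bits_in_flight = throughput_bps * rtt_s      # 1e9 bits per RTT
    w = bits_in_flight / (mss_bytes * 8)         # segments in flight
    print(round(w))              # 83333, matching the slide
    print(round(w * 4 / 3))      # ~111111 if using Throughput = 0.75*W/RTT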

Example (Cont.)

• 10 Gb/s throughput requires a window size of W = 83,333 in-flight segments
• TCP assumes every loss is due to congestion
  – Generally a safe assumption for reasonable window sizes
• A (magic) formula relates the loss rate L to throughput:

      Throughput = (1.22 × MSS) / (RTT × √L)

  – A throughput of 10 Gb/s with an MSS of 1500 bytes requires L = 2·10^-10
  – i.e., TCP can afford to lose only about one in 5,000,000,000 segments
• We need new versions of TCP for high-speed networks (topic for later discussion)
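Plugging the example's numbers into the formula confirms the quoted loss rate (a quick numeric check, not part of the original slides):

    mss_bits = 1500 * 8
    rtt_s = 0.100
    target_bps = 10e9

    # Throughput = 1.22*MSS/(RTT*sqrt(L))  =>  L = (1.22*MSS/(RTT*Throughput))**2
    loss_rate = (1.22 * mss_bits / (rtt_s * target_bps)) ** 2
    print(loss_rate)      # ~2.1e-10, i.e. L = 2*10^-10 as on the slide
    print(1 / loss_rate)  # ~5e9: roughly one loss every 5,000,000,000 segments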

TCP Fairness

Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K.

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R.]

Simple scenario: assume the same MSS and RTT.

Is TCP Fair?

Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease drops throughput proportionally

[Figure: Connection 1 throughput (x-axis) vs. Connection 2 throughput (y-axis), each bounded by R, with the equal-bandwidth-share line. Repeated cycles of "loss: decrease window by a factor of 2" followed by "congestion avoidance: additive increase" move the operating point toward the equal share — reproduced in the simulation below.]
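The convergence argument in the figure can be reproduced with a few lines of simulation. A toy model (rates in arbitrary units; alpha and the loss rule are assumptions, not from the slides):

    def aimd_two_flows(x1, x2, capacity, alpha=1.0, steps=200):
        """Two AIMD flows on one bottleneck: additive increase while the
        link has room, both halve when it saturates. The gap between the
        two rates halves on every loss event, so the rates converge
        toward the equal-share line."""
        for _ in range(steps):
            if x1 + x2 >= capacity:              # overload: both flows see loss
                x1, x2 = x1 / 2, x2 / 2          # multiplicative decrease
            else:
                x1, x2 = x1 + alpha, x2 + alpha  # additive increase
        return x1, x2

    # Even from a very unequal start, the two rates end up nearly equal:
    print(aimd_two_flows(90.0, 10.0, capacity=100.0))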

More on Fairness

Fairness and UDP:
• Multimedia apps often do not use TCP
  – They do not want their rate throttled by congestion control
• Instead they use UDP
  – Pump audio/video at a constant rate; tolerate packet loss
• Research area: TCP-friendly unreliable transport

Fairness and parallel TCP connections:
• Nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: a link of rate R supporting 9 existing connections (arithmetic below)
  – A new app asking for 1 TCP gets rate R/10
  – A new app asking for 11 TCPs gets 11R/20 (over half the bandwidth)
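The arithmetic, spelled out (assuming the bottleneck is shared equally per connection):

    R = 1.0
    existing = 9

    print(R * 1 / (existing + 1))    # new app with 1 connection:   R/10
    print(R * 11 / (existing + 11))  # new app with 11 connections: 11R/20 = 0.55R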

Queuing Mechanisms

Random Early Detection (RED) and Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

• TCP depends on packet loss to detect congestion
  – In fact, TCP drives the network into packet loss
  – … by continuing to increase the sending rate
• Drop-tail queuing leads to bursty loss
  – When a link becomes congested…
  – … many arriving packets encounter a full queue
  – And as a result, many flows divide their sending rates in half
  – … and many individual flows lose multiple packets

Slow Feedback from Drop Tail

• Feedback comes only when the buffer is completely full
  – … even though the buffer has been filling for a while
• Plus, the filling buffer is increasing the RTT
  – … and the variance in the RTT
• Might be better to give early feedback
  – Get one or two flows to slow down, not all of them
  – Get these flows to slow down before it is too late

Random Early Detection (RED)

• Basic idea of RED
  – The router notices that the queue is getting backlogged
  – … and randomly drops packets to signal congestion
• Packet drop probability (sketched below)
  – Drop probability increases as the average queue length increases
  – If the buffer is below some level, don't drop anything
  – … otherwise, set the drop probability as a function of the queue length

[Figure: drop probability vs. average queue length — zero up to a threshold, then increasing.]
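The curve in the figure is a piecewise-linear function. A sketch with assumed thresholds (min_th, max_th, and max_p are exactly the tunables that make RED hard to configure):

    def red_drop_prob(avg_qlen, min_th, max_th, max_p):
        """0 below min_th, rising linearly to max_p at max_th, 1 beyond."""
        if avg_qlen < min_th:
            return 0.0
        if avg_qlen >= max_th:
            return 1.0
        return max_p * (avg_qlen - min_th) / (max_th - min_th)

    # avg_qlen is an exponentially weighted moving average of the
    # instantaneous queue, e.g. avg = (1 - w)*avg + w*qlen, which is
    # what makes RED tolerant of short bursts.
    for q in (3, 10, 20, 35):
        print(q, red_drop_prob(q, min_th=5, max_th=30, max_p=0.1))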

Properties of RED

• Drops packets before the queue is full
  – In the hope of reducing the rates of some flows
• Drops packets in proportion to each flow's rate
  – High-rate flows have more packets
  – … and hence a higher chance of being selected
• Drops are spaced out in time
  – Which should help desynchronize the TCP senders
• Tolerant of burstiness in the traffic
  – By basing the decisions on the average queue length

Problems With RED

• Hard to get the tunable parameters just right
  – How early to start dropping packets?
  – What slope for the increase in drop probability?
  – What time scale for averaging the queue length?
• Sometimes RED helps, but sometimes not
  – If the parameters aren't set right, RED doesn't help
  – And it is hard to know how to set the parameters
• RED is implemented in practice
  – But often not used, due to the challenges of tuning it right
• Many variations
  – With cute names like "Blue" and "FRED"…

Explicit Congestion Notification

• Early dropping of packets
  – Good: gives early feedback
  – Bad: has to drop the packet to give the feedback
• Explicit Congestion Notification
  – The router marks the packet with an ECN bit
  – … and the sending host interprets it as a sign of congestion
• Surmounting the challenges
  – Must be supported by the end hosts and the routers
  – Requires two bits in the IP header (one for the ECN mark, and one to indicate ECN capability)
  – Solution: borrow two of the Type-Of-Service bits in the IPv4 packet header (see the note below)
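For reference (a detail below the slide's level of abstraction): RFC 3168 standardized those two borrowed bits as the ECN field, with four codepoints. A sketch of a router's mark-instead-of-drop decision:

    # ECN codepoints in the former Type-Of-Service byte (RFC 3168)
    NOT_ECT = 0b00   # sender is not ECN-capable
    ECT_1   = 0b01   # ECN-capable transport
    ECT_0   = 0b10   # ECN-capable transport
    CE      = 0b11   # "congestion experienced", set by a congested router

    def forward(ecn_bits, congested):
        """Mark ECN-capable packets instead of dropping them."""
        if congested and ecn_bits in (ECT_0, ECT_1):
            return CE        # give feedback without losing the packet
        return ecn_bits      # Not-ECT packets would be dropped instead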

Conclusions

• Congestion is inevitable
  – The Internet does not reserve resources in advance
  – TCP actively tries to push the envelope
• Congestion can be handled
  – Additive increase, multiplicative decrease
  – Slow start, and slow-start restart
• Active Queue Management can help
  – Random Early Detection (RED)
  – Explicit Congestion Notification (ECN)


Page 22: TCP Congestion Control - cs.colostate.edu

Summary TCP Congestion Control bull  When CongWin is below Threshold sender in

slow-start phase window grows exponentially

bull  When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull  When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull  When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Event State TCP Sender Action Commentary ACK receipt for previously unACKed data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unACKed data

Congestion Avoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Loss event detected by triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = Threshold Set state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSS Set state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK

SS or CA Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Other TCP Mechanisms

Naglersquos Algorithm and Delayed ACK

Motivation for Naglersquos Algorithm

bull  Interactive applications ndash  SSHtelnetrlogin ndash  Generate many small packets (eg keystrokes)

bull  Small packets are wasteful ndash  Mostly header (eg 40 bytes of header 1 of data)

bull  Appealing to reduce the number of packets ndash  Could force every packet to have some minimum size ndash  hellip but what if the person doesnrsquot type more

characters bull  Need to balance competing trade-offs

ndash  Send larger packets to increase efficiency ndash  hellip but not at the expense of delay

Naglersquos Algorithm bull Wait if the amount of data is small

ndash Smaller than Maximum Segment Size (MSS) bull hellipand some other packet is already in flight

ndash  ie still awaiting the ACKs for previous packets bull That is send at most one small packet per RTT

ndash hellip by waiting until all outstanding ACKs have arrived

bull  Influence on performance ndash Interactive applications enables batching of bytes ndash Bulk transfer no change transmits in MSS-sized packets

anyway

vs

ACK

Delayed ACK - Motivation bull  TCP traffic is often bidirectional

ndash Data traveling in both directions ndash ACKs traveling in both directions

bull  ACK packets have high overhead ndash  40 bytes for the IP header and TCP header ndash hellip and zero data traffic

bull  Piggybacking is appealing ndash Host B can send an ACK to host A ndash hellip as part of a data packet from B to A

TCP Header Allows Piggybacking

Source port Destination port

Sequence number

Acknowledgment

Advertised window HdrLen Flags 0

Checksum Urgent pointer

Options (variable)

Data

Flags SYN FIN RST PSH URG ACK

Example of Piggybacking

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

B has data to send

A has data to send

B doesnrsquot have data to send

Increasing Likelihood of Piggybacking

bull  Increase piggybacking ndash  TCP allows the receiver to

wait to send the ACK ndash  hellip in the hope that the host

will have data to send bull  Example sshrlogintelnet

ndash  Host A types characters at a UNIX prompt

ndash  Host B receives the character and executes a command

ndash  hellip and then data are generated ndash  Would be nice if B could send

the ACK with the new data

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

Works when packet from A causes data to be sent from B

waste

Delayed ACK bull  Delay sending an ACK

ndash  Upon receiving a packet the host B sets a timer ndash  If Brsquos application generates data go ahead and send

bull  And piggyback the ACK bit

ndash  If the timer expires send a (non-piggybacked) ACK

bull  Limiting the wait ndash  Timer of 200 msec or 500 msec ndash  Results in an ACK for every other full-sized packet

TCP Throughput and Fairness

TCP Throughput bull  Whatrsquos the average throughout of TCP as a

function of window size and RTT ndash  Assume long-lived TCP flow ndash  Ignore slow start

bull  Let W be the window size when loss occurs bull  When window is W throughput is WRTT bull  Just after loss window drops to W2 throughput

to W2RTT bull  Average throughout 075 WRTT

Problems with Fast Links An example to illustrate problems bull  Consider the impact of high speed links

ndash  1500 byte segments ndash  100ms RTT ndash  10 Gbs throughput

bull  What is the required window size ndash  Throughput = 75 WRTT

bull  (probably a good formula to remember)

ndash  Requires window size W = 83333 in-flight segments

Example (Cont)

bull  10 Gbs throughput requires window size W = 83333 in-flight segments

bull  TCP assumes every loss is due to congestion ndash  Generally safe assumption for reasonable window size

bull  (Magic) Formula to relate loss rate to throughput Throughput of 10 Gbs with MSS of 1500 bytes gives ndash  13 L = 210-10

ie can only lose one in 5000000000 segments bull  We need new versions of TCP for high-speed nets (topic

for later discussion)

LRTTMSSsdot221Throughput =

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

Simple scenario assume same MSS and RTT

Is TCP Fair Two competing sessions bull  Additive increase gives slope of 1 as throughout increases bull  multiplicative decrease drops throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Conn

ecti

on 2

thr

ough

put

loss decrease window by factor of 2 congestion avoidance additive increase

loss decrease window by factor of 2 congestion avoidance additive increase

More on Fairness Fairness and UDP bull  Multimedia apps often do

not use TCP ndash  do not want rate throttled by

congestion control bull  Instead use UDP

ndash  pump audiovideo at constant rate tolerate packet loss

bull  Research area TCP friendly unreliable transport

Fairness and parallel TCP connections

bull  nothing prevents app from opening parallel connections between 2 hosts

bull  Web browsers do this bull  Example link of rate R

supporting 9 connections ndash  new app asks for 1 TCP gets rate

R10 ndash  new app asks for 11 TCPs gets

11R20 (over half the bandwidth)

Queuing Mechanisms

Random Early Detection (RED) Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

bull  TCP depends on packet loss to detect congestion ndash  In fact TCP drives the network into packet loss ndash  hellip by continuing to increase the sending rate

bull  Drop-tail queuing leads to bursty loss ndash  When a link becomes congestedhellip ndash  hellip many arriving packets encounter a full queue ndash  And as a result many flows divide sending rate in half ndash  hellip and many individual flows lose multiple packets

Slow Feedback from Drop Tail bull  Feedback comes when buffer is completely full

ndash  hellip even though the buffer has been filling for a while bull  Plus the filling buffer is increasing RTT

ndash  hellip and the variance in the RTT bull  Might be better to give early feedback

ndash  Get one or two flows to slow down not all of them ndash  Get these flows to slow down before it is too late

Random Early Detection (RED) bull  Basic idea of RED

ndash  Router notices that the queue is getting backlogged ndash  hellip and randomly drops packets to signal congestion

bull  Packet drop probability ndash  Drop probability increases as queue length increases ndash  If buffer is below some level donrsquot drop anything ndash  hellip otherwise set drop probability as function of queue

Average Queue Length

Prob

abili

ty

Properties of RED bull  Drops packets before queue is full

ndash  In the hope of reducing the rates of some flows bull  Drops packet in proportion to each flowrsquos rate

ndash  High-rate flows have more packets ndash  hellip and hence a higher chance of being selected

bull  Drops are spaced out in time ndash  Which should help desynchronize the TCP senders

bull  Tolerant of burstiness in the traffic ndash  By basing the decisions on average queue length

Problems With RED bull  Hard to get the tunable parameters just right

ndash  How early to start dropping packets ndash  What slope for the increase in drop probability ndash  What time scale for averaging the queue length

bull  Sometimes RED helps but sometimes not ndash  If the parameters arenrsquot set right RED doesnrsquot help ndash  And it is hard to know how to set the parameters

bull  RED is implemented in practice ndash  But often not used due to the challenges of tuning right

bull  Many variations ndash  With cute names like ldquoBluerdquo and ldquoFREDrdquohellip J

Explicit Congestion Notification bull  Early dropping of packets

ndash  Good gives early feedback ndash  Bad has to drop the packet to give the feedback

bull  Explicit Congestion Notification ndash  Router marks the packet with an ECN bit ndash  hellip and sending host interprets as a sign of congestion

bull  Surmounting the challenges ndash  Must be supported by the end hosts and the routers ndash  Requires two bits in the IP header (one for the ECN

mark and one to indicate the ECN capability) ndash  Solution borrow two of the Type-Of-Service bits in the

IPv4 packet header

Conclusions bull Congestion is inevitable

ndash Internet does not reserve resources in advance ndash TCP actively tries to push the envelope

bull Congestion can be handled ndash Additive increase multiplicative decrease ndash Slow start and slow-start restart

bull Active Queue Management can help ndash Random Early Detection (RED) ndash Explicit Congestion Notification (ECN)

t

Window

Page 23: TCP Congestion Control - cs.colostate.edu

Event State TCP Sender Action Commentary ACK receipt for previously unACKed data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unACKed data

Congestion Avoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Loss event detected by triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = Threshold Set state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSS Set state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK

SS or CA Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Other TCP Mechanisms

Naglersquos Algorithm and Delayed ACK

Motivation for Naglersquos Algorithm

bull  Interactive applications ndash  SSHtelnetrlogin ndash  Generate many small packets (eg keystrokes)

bull  Small packets are wasteful ndash  Mostly header (eg 40 bytes of header 1 of data)

bull  Appealing to reduce the number of packets ndash  Could force every packet to have some minimum size ndash  hellip but what if the person doesnrsquot type more

characters bull  Need to balance competing trade-offs

ndash  Send larger packets to increase efficiency ndash  hellip but not at the expense of delay

Naglersquos Algorithm bull Wait if the amount of data is small

ndash Smaller than Maximum Segment Size (MSS) bull hellipand some other packet is already in flight

ndash  ie still awaiting the ACKs for previous packets bull That is send at most one small packet per RTT

ndash hellip by waiting until all outstanding ACKs have arrived

bull  Influence on performance ndash Interactive applications enables batching of bytes ndash Bulk transfer no change transmits in MSS-sized packets

anyway

vs

ACK

Delayed ACK - Motivation bull  TCP traffic is often bidirectional

ndash Data traveling in both directions ndash ACKs traveling in both directions

bull  ACK packets have high overhead ndash  40 bytes for the IP header and TCP header ndash hellip and zero data traffic

bull  Piggybacking is appealing ndash Host B can send an ACK to host A ndash hellip as part of a data packet from B to A

TCP Header Allows Piggybacking

Source port Destination port

Sequence number

Acknowledgment

Advertised window HdrLen Flags 0

Checksum Urgent pointer

Options (variable)

Data

Flags SYN FIN RST PSH URG ACK

Example of Piggybacking

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

B has data to send

A has data to send

B doesnrsquot have data to send

Increasing Likelihood of Piggybacking

bull  Increase piggybacking ndash  TCP allows the receiver to

wait to send the ACK ndash  hellip in the hope that the host

will have data to send bull  Example sshrlogintelnet

ndash  Host A types characters at a UNIX prompt

ndash  Host B receives the character and executes a command

ndash  hellip and then data are generated ndash  Would be nice if B could send

the ACK with the new data

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

Works when packet from A causes data to be sent from B

waste

Delayed ACK bull  Delay sending an ACK

ndash  Upon receiving a packet the host B sets a timer ndash  If Brsquos application generates data go ahead and send

bull  And piggyback the ACK bit

ndash  If the timer expires send a (non-piggybacked) ACK

bull  Limiting the wait ndash  Timer of 200 msec or 500 msec ndash  Results in an ACK for every other full-sized packet

TCP Throughput and Fairness

TCP Throughput bull  Whatrsquos the average throughout of TCP as a

function of window size and RTT ndash  Assume long-lived TCP flow ndash  Ignore slow start

bull  Let W be the window size when loss occurs bull  When window is W throughput is WRTT bull  Just after loss window drops to W2 throughput

to W2RTT bull  Average throughout 075 WRTT

Problems with Fast Links An example to illustrate problems bull  Consider the impact of high speed links

ndash  1500 byte segments ndash  100ms RTT ndash  10 Gbs throughput

bull  What is the required window size ndash  Throughput = 75 WRTT

bull  (probably a good formula to remember)

ndash  Requires window size W = 83333 in-flight segments

Example (Cont)

bull  10 Gbs throughput requires window size W = 83333 in-flight segments

bull  TCP assumes every loss is due to congestion ndash  Generally safe assumption for reasonable window size

bull  (Magic) Formula to relate loss rate to throughput Throughput of 10 Gbs with MSS of 1500 bytes gives ndash  13 L = 210-10

ie can only lose one in 5000000000 segments bull  We need new versions of TCP for high-speed nets (topic

for later discussion)

LRTTMSSsdot221Throughput =

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

Simple scenario assume same MSS and RTT

Is TCP Fair Two competing sessions bull  Additive increase gives slope of 1 as throughout increases bull  multiplicative decrease drops throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Conn

ecti

on 2

thr

ough

put

loss decrease window by factor of 2 congestion avoidance additive increase

loss decrease window by factor of 2 congestion avoidance additive increase

More on Fairness Fairness and UDP bull  Multimedia apps often do

not use TCP ndash  do not want rate throttled by

congestion control bull  Instead use UDP

ndash  pump audiovideo at constant rate tolerate packet loss

bull  Research area TCP friendly unreliable transport

Fairness and parallel TCP connections

bull  nothing prevents app from opening parallel connections between 2 hosts

bull  Web browsers do this bull  Example link of rate R

supporting 9 connections ndash  new app asks for 1 TCP gets rate

R10 ndash  new app asks for 11 TCPs gets

11R20 (over half the bandwidth)

Queuing Mechanisms

Random Early Detection (RED) Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

bull  TCP depends on packet loss to detect congestion ndash  In fact TCP drives the network into packet loss ndash  hellip by continuing to increase the sending rate

bull  Drop-tail queuing leads to bursty loss ndash  When a link becomes congestedhellip ndash  hellip many arriving packets encounter a full queue ndash  And as a result many flows divide sending rate in half ndash  hellip and many individual flows lose multiple packets

Slow Feedback from Drop Tail bull  Feedback comes when buffer is completely full

ndash  hellip even though the buffer has been filling for a while bull  Plus the filling buffer is increasing RTT

ndash  hellip and the variance in the RTT bull  Might be better to give early feedback

ndash  Get one or two flows to slow down not all of them ndash  Get these flows to slow down before it is too late

Random Early Detection (RED) bull  Basic idea of RED

ndash  Router notices that the queue is getting backlogged ndash  hellip and randomly drops packets to signal congestion

bull  Packet drop probability ndash  Drop probability increases as queue length increases ndash  If buffer is below some level donrsquot drop anything ndash  hellip otherwise set drop probability as function of queue

Average Queue Length

Prob

abili

ty

Properties of RED bull  Drops packets before queue is full

ndash  In the hope of reducing the rates of some flows bull  Drops packet in proportion to each flowrsquos rate

ndash  High-rate flows have more packets ndash  hellip and hence a higher chance of being selected

bull  Drops are spaced out in time ndash  Which should help desynchronize the TCP senders

bull  Tolerant of burstiness in the traffic ndash  By basing the decisions on average queue length

Problems With RED bull  Hard to get the tunable parameters just right

ndash  How early to start dropping packets ndash  What slope for the increase in drop probability ndash  What time scale for averaging the queue length

bull  Sometimes RED helps but sometimes not ndash  If the parameters arenrsquot set right RED doesnrsquot help ndash  And it is hard to know how to set the parameters

bull  RED is implemented in practice ndash  But often not used due to the challenges of tuning right

bull  Many variations ndash  With cute names like ldquoBluerdquo and ldquoFREDrdquohellip J

Explicit Congestion Notification bull  Early dropping of packets

ndash  Good gives early feedback ndash  Bad has to drop the packet to give the feedback

bull  Explicit Congestion Notification ndash  Router marks the packet with an ECN bit ndash  hellip and sending host interprets as a sign of congestion

bull  Surmounting the challenges ndash  Must be supported by the end hosts and the routers ndash  Requires two bits in the IP header (one for the ECN

mark and one to indicate the ECN capability) ndash  Solution borrow two of the Type-Of-Service bits in the

IPv4 packet header

Conclusions bull Congestion is inevitable

ndash Internet does not reserve resources in advance ndash TCP actively tries to push the envelope

bull Congestion can be handled ndash Additive increase multiplicative decrease ndash Slow start and slow-start restart

bull Active Queue Management can help ndash Random Early Detection (RED) ndash Explicit Congestion Notification (ECN)

t

Window

Page 24: TCP Congestion Control - cs.colostate.edu

Other TCP Mechanisms

Naglersquos Algorithm and Delayed ACK

Motivation for Naglersquos Algorithm

bull  Interactive applications ndash  SSHtelnetrlogin ndash  Generate many small packets (eg keystrokes)

bull  Small packets are wasteful ndash  Mostly header (eg 40 bytes of header 1 of data)

bull  Appealing to reduce the number of packets ndash  Could force every packet to have some minimum size ndash  hellip but what if the person doesnrsquot type more

characters bull  Need to balance competing trade-offs

ndash  Send larger packets to increase efficiency ndash  hellip but not at the expense of delay

Naglersquos Algorithm bull Wait if the amount of data is small

ndash Smaller than Maximum Segment Size (MSS) bull hellipand some other packet is already in flight

ndash  ie still awaiting the ACKs for previous packets bull That is send at most one small packet per RTT

ndash hellip by waiting until all outstanding ACKs have arrived

bull  Influence on performance ndash Interactive applications enables batching of bytes ndash Bulk transfer no change transmits in MSS-sized packets

anyway

vs

ACK

Delayed ACK - Motivation bull  TCP traffic is often bidirectional

ndash Data traveling in both directions ndash ACKs traveling in both directions

bull  ACK packets have high overhead ndash  40 bytes for the IP header and TCP header ndash hellip and zero data traffic

bull  Piggybacking is appealing ndash Host B can send an ACK to host A ndash hellip as part of a data packet from B to A

TCP Header Allows Piggybacking

Source port Destination port

Sequence number

Acknowledgment

Advertised window HdrLen Flags 0

Checksum Urgent pointer

Options (variable)

Data

Flags SYN FIN RST PSH URG ACK

Example of Piggybacking

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

B has data to send

A has data to send

B doesnrsquot have data to send

Increasing Likelihood of Piggybacking

bull  Increase piggybacking ndash  TCP allows the receiver to

wait to send the ACK ndash  hellip in the hope that the host

will have data to send bull  Example sshrlogintelnet

ndash  Host A types characters at a UNIX prompt

ndash  Host B receives the character and executes a command

ndash  hellip and then data are generated ndash  Would be nice if B could send

the ACK with the new data

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

Works when packet from A causes data to be sent from B

waste

Delayed ACK bull  Delay sending an ACK

ndash  Upon receiving a packet the host B sets a timer ndash  If Brsquos application generates data go ahead and send

bull  And piggyback the ACK bit

ndash  If the timer expires send a (non-piggybacked) ACK

bull  Limiting the wait ndash  Timer of 200 msec or 500 msec ndash  Results in an ACK for every other full-sized packet

TCP Throughput and Fairness

TCP Throughput bull  Whatrsquos the average throughout of TCP as a

function of window size and RTT ndash  Assume long-lived TCP flow ndash  Ignore slow start

bull  Let W be the window size when loss occurs bull  When window is W throughput is WRTT bull  Just after loss window drops to W2 throughput

to W2RTT bull  Average throughout 075 WRTT

Problems with Fast Links An example to illustrate problems bull  Consider the impact of high speed links

ndash  1500 byte segments ndash  100ms RTT ndash  10 Gbs throughput

bull  What is the required window size ndash  Throughput = 75 WRTT

bull  (probably a good formula to remember)

ndash  Requires window size W = 83333 in-flight segments

Example (Cont)

bull  10 Gbs throughput requires window size W = 83333 in-flight segments

bull  TCP assumes every loss is due to congestion ndash  Generally safe assumption for reasonable window size

bull  (Magic) Formula to relate loss rate to throughput Throughput of 10 Gbs with MSS of 1500 bytes gives ndash  13 L = 210-10

ie can only lose one in 5000000000 segments bull  We need new versions of TCP for high-speed nets (topic

for later discussion)

LRTTMSSsdot221Throughput =

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

Simple scenario assume same MSS and RTT

Is TCP Fair Two competing sessions bull  Additive increase gives slope of 1 as throughout increases bull  multiplicative decrease drops throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Conn

ecti

on 2

thr

ough

put

loss decrease window by factor of 2 congestion avoidance additive increase

loss decrease window by factor of 2 congestion avoidance additive increase

More on Fairness Fairness and UDP bull  Multimedia apps often do

not use TCP ndash  do not want rate throttled by

congestion control bull  Instead use UDP

ndash  pump audiovideo at constant rate tolerate packet loss

bull  Research area TCP friendly unreliable transport

Fairness and parallel TCP connections

bull  nothing prevents app from opening parallel connections between 2 hosts

bull  Web browsers do this bull  Example link of rate R

supporting 9 connections ndash  new app asks for 1 TCP gets rate

R10 ndash  new app asks for 11 TCPs gets

11R20 (over half the bandwidth)

Queuing Mechanisms

Random Early Detection (RED) Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

bull  TCP depends on packet loss to detect congestion ndash  In fact TCP drives the network into packet loss ndash  hellip by continuing to increase the sending rate

bull  Drop-tail queuing leads to bursty loss ndash  When a link becomes congestedhellip ndash  hellip many arriving packets encounter a full queue ndash  And as a result many flows divide sending rate in half ndash  hellip and many individual flows lose multiple packets

Slow Feedback from Drop Tail bull  Feedback comes when buffer is completely full

ndash  hellip even though the buffer has been filling for a while bull  Plus the filling buffer is increasing RTT

ndash  hellip and the variance in the RTT bull  Might be better to give early feedback

ndash  Get one or two flows to slow down not all of them ndash  Get these flows to slow down before it is too late

Random Early Detection (RED) bull  Basic idea of RED

ndash  Router notices that the queue is getting backlogged ndash  hellip and randomly drops packets to signal congestion

bull  Packet drop probability ndash  Drop probability increases as queue length increases ndash  If buffer is below some level donrsquot drop anything ndash  hellip otherwise set drop probability as function of queue

Average Queue Length

Prob

abili

ty

Properties of RED bull  Drops packets before queue is full

ndash  In the hope of reducing the rates of some flows bull  Drops packet in proportion to each flowrsquos rate

ndash  High-rate flows have more packets ndash  hellip and hence a higher chance of being selected

bull  Drops are spaced out in time ndash  Which should help desynchronize the TCP senders

bull  Tolerant of burstiness in the traffic ndash  By basing the decisions on average queue length

Problems With RED bull  Hard to get the tunable parameters just right

ndash  How early to start dropping packets ndash  What slope for the increase in drop probability ndash  What time scale for averaging the queue length

bull  Sometimes RED helps but sometimes not ndash  If the parameters arenrsquot set right RED doesnrsquot help ndash  And it is hard to know how to set the parameters

bull  RED is implemented in practice ndash  But often not used due to the challenges of tuning right

bull  Many variations ndash  With cute names like ldquoBluerdquo and ldquoFREDrdquohellip J

Explicit Congestion Notification bull  Early dropping of packets

ndash  Good gives early feedback ndash  Bad has to drop the packet to give the feedback

bull  Explicit Congestion Notification ndash  Router marks the packet with an ECN bit ndash  hellip and sending host interprets as a sign of congestion

bull  Surmounting the challenges ndash  Must be supported by the end hosts and the routers ndash  Requires two bits in the IP header (one for the ECN

mark and one to indicate the ECN capability) ndash  Solution borrow two of the Type-Of-Service bits in the

IPv4 packet header

Conclusions bull Congestion is inevitable

ndash Internet does not reserve resources in advance ndash TCP actively tries to push the envelope

bull Congestion can be handled ndash Additive increase multiplicative decrease ndash Slow start and slow-start restart

bull Active Queue Management can help ndash Random Early Detection (RED) ndash Explicit Congestion Notification (ECN)

t

Window

Page 25: TCP Congestion Control - cs.colostate.edu

Motivation for Naglersquos Algorithm

bull  Interactive applications ndash  SSHtelnetrlogin ndash  Generate many small packets (eg keystrokes)

bull  Small packets are wasteful ndash  Mostly header (eg 40 bytes of header 1 of data)

bull  Appealing to reduce the number of packets ndash  Could force every packet to have some minimum size ndash  hellip but what if the person doesnrsquot type more

characters bull  Need to balance competing trade-offs

ndash  Send larger packets to increase efficiency ndash  hellip but not at the expense of delay

Naglersquos Algorithm bull Wait if the amount of data is small

ndash Smaller than Maximum Segment Size (MSS) bull hellipand some other packet is already in flight

ndash  ie still awaiting the ACKs for previous packets bull That is send at most one small packet per RTT

ndash hellip by waiting until all outstanding ACKs have arrived

bull  Influence on performance ndash Interactive applications enables batching of bytes ndash Bulk transfer no change transmits in MSS-sized packets

anyway

vs

ACK

Delayed ACK - Motivation bull  TCP traffic is often bidirectional

ndash Data traveling in both directions ndash ACKs traveling in both directions

bull  ACK packets have high overhead ndash  40 bytes for the IP header and TCP header ndash hellip and zero data traffic

bull  Piggybacking is appealing ndash Host B can send an ACK to host A ndash hellip as part of a data packet from B to A

TCP Header Allows Piggybacking

Source port Destination port

Sequence number

Acknowledgment

Advertised window HdrLen Flags 0

Checksum Urgent pointer

Options (variable)

Data

Flags SYN FIN RST PSH URG ACK

Example of Piggybacking

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

B has data to send

A has data to send

B doesnrsquot have data to send

Increasing Likelihood of Piggybacking

bull  Increase piggybacking ndash  TCP allows the receiver to

wait to send the ACK ndash  hellip in the hope that the host

will have data to send bull  Example sshrlogintelnet

ndash  Host A types characters at a UNIX prompt

ndash  Host B receives the character and executes a command

ndash  hellip and then data are generated ndash  Would be nice if B could send

the ACK with the new data

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

Works when packet from A causes data to be sent from B

waste

Delayed ACK bull  Delay sending an ACK

ndash  Upon receiving a packet the host B sets a timer ndash  If Brsquos application generates data go ahead and send

bull  And piggyback the ACK bit

ndash  If the timer expires send a (non-piggybacked) ACK

bull  Limiting the wait ndash  Timer of 200 msec or 500 msec ndash  Results in an ACK for every other full-sized packet

TCP Throughput and Fairness

TCP Throughput bull  Whatrsquos the average throughout of TCP as a

function of window size and RTT ndash  Assume long-lived TCP flow ndash  Ignore slow start

bull  Let W be the window size when loss occurs bull  When window is W throughput is WRTT bull  Just after loss window drops to W2 throughput

to W2RTT bull  Average throughout 075 WRTT

Problems with Fast Links An example to illustrate problems bull  Consider the impact of high speed links

ndash  1500 byte segments ndash  100ms RTT ndash  10 Gbs throughput

bull  What is the required window size ndash  Throughput = 75 WRTT

bull  (probably a good formula to remember)

ndash  Requires window size W = 83333 in-flight segments

Example (Cont)

bull  10 Gbs throughput requires window size W = 83333 in-flight segments

bull  TCP assumes every loss is due to congestion ndash  Generally safe assumption for reasonable window size

bull  (Magic) Formula to relate loss rate to throughput Throughput of 10 Gbs with MSS of 1500 bytes gives ndash  13 L = 210-10

ie can only lose one in 5000000000 segments bull  We need new versions of TCP for high-speed nets (topic

for later discussion)

LRTTMSSsdot221Throughput =

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

Simple scenario assume same MSS and RTT

Is TCP Fair Two competing sessions bull  Additive increase gives slope of 1 as throughout increases bull  multiplicative decrease drops throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Conn

ecti

on 2

thr

ough

put

loss decrease window by factor of 2 congestion avoidance additive increase

loss decrease window by factor of 2 congestion avoidance additive increase

More on Fairness Fairness and UDP bull  Multimedia apps often do

not use TCP ndash  do not want rate throttled by

congestion control bull  Instead use UDP

ndash  pump audiovideo at constant rate tolerate packet loss

bull  Research area TCP friendly unreliable transport

Fairness and parallel TCP connections

bull  nothing prevents app from opening parallel connections between 2 hosts

bull  Web browsers do this bull  Example link of rate R

supporting 9 connections ndash  new app asks for 1 TCP gets rate

R10 ndash  new app asks for 11 TCPs gets

11R20 (over half the bandwidth)

Queuing Mechanisms

Random Early Detection (RED) Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

bull  TCP depends on packet loss to detect congestion ndash  In fact TCP drives the network into packet loss ndash  hellip by continuing to increase the sending rate

bull  Drop-tail queuing leads to bursty loss ndash  When a link becomes congestedhellip ndash  hellip many arriving packets encounter a full queue ndash  And as a result many flows divide sending rate in half ndash  hellip and many individual flows lose multiple packets

Slow Feedback from Drop Tail bull  Feedback comes when buffer is completely full

ndash  hellip even though the buffer has been filling for a while bull  Plus the filling buffer is increasing RTT

ndash  hellip and the variance in the RTT bull  Might be better to give early feedback

ndash  Get one or two flows to slow down not all of them ndash  Get these flows to slow down before it is too late

Random Early Detection (RED) bull  Basic idea of RED

ndash  Router notices that the queue is getting backlogged ndash  hellip and randomly drops packets to signal congestion

bull  Packet drop probability ndash  Drop probability increases as queue length increases ndash  If buffer is below some level donrsquot drop anything ndash  hellip otherwise set drop probability as function of queue

Average Queue Length

Prob

abili

ty

Properties of RED bull  Drops packets before queue is full

ndash  In the hope of reducing the rates of some flows bull  Drops packet in proportion to each flowrsquos rate

ndash  High-rate flows have more packets ndash  hellip and hence a higher chance of being selected

bull  Drops are spaced out in time ndash  Which should help desynchronize the TCP senders

bull  Tolerant of burstiness in the traffic ndash  By basing the decisions on average queue length

Problems With RED bull  Hard to get the tunable parameters just right

ndash  How early to start dropping packets ndash  What slope for the increase in drop probability ndash  What time scale for averaging the queue length

bull  Sometimes RED helps but sometimes not ndash  If the parameters arenrsquot set right RED doesnrsquot help ndash  And it is hard to know how to set the parameters

bull  RED is implemented in practice ndash  But often not used due to the challenges of tuning right

bull  Many variations ndash  With cute names like ldquoBluerdquo and ldquoFREDrdquohellip J

Explicit Congestion Notification bull  Early dropping of packets

ndash  Good gives early feedback ndash  Bad has to drop the packet to give the feedback

bull  Explicit Congestion Notification ndash  Router marks the packet with an ECN bit ndash  hellip and sending host interprets as a sign of congestion

bull  Surmounting the challenges ndash  Must be supported by the end hosts and the routers ndash  Requires two bits in the IP header (one for the ECN

mark and one to indicate the ECN capability) ndash  Solution borrow two of the Type-Of-Service bits in the

IPv4 packet header

Conclusions bull Congestion is inevitable

ndash Internet does not reserve resources in advance ndash TCP actively tries to push the envelope

bull Congestion can be handled ndash Additive increase multiplicative decrease ndash Slow start and slow-start restart

bull Active Queue Management can help ndash Random Early Detection (RED) ndash Explicit Congestion Notification (ECN)

t

Window

Page 26: TCP Congestion Control - cs.colostate.edu

Naglersquos Algorithm bull Wait if the amount of data is small

ndash Smaller than Maximum Segment Size (MSS) bull hellipand some other packet is already in flight

ndash  ie still awaiting the ACKs for previous packets bull That is send at most one small packet per RTT

ndash hellip by waiting until all outstanding ACKs have arrived

bull  Influence on performance ndash Interactive applications enables batching of bytes ndash Bulk transfer no change transmits in MSS-sized packets

anyway

vs

ACK

Delayed ACK - Motivation bull  TCP traffic is often bidirectional

ndash Data traveling in both directions ndash ACKs traveling in both directions

bull  ACK packets have high overhead ndash  40 bytes for the IP header and TCP header ndash hellip and zero data traffic

bull  Piggybacking is appealing ndash Host B can send an ACK to host A ndash hellip as part of a data packet from B to A

TCP Header Allows Piggybacking

Source port Destination port

Sequence number

Acknowledgment

Advertised window HdrLen Flags 0

Checksum Urgent pointer

Options (variable)

Data

Flags SYN FIN RST PSH URG ACK

Example of Piggybacking

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

B has data to send

A has data to send

B doesnrsquot have data to send

Increasing Likelihood of Piggybacking

bull  Increase piggybacking ndash  TCP allows the receiver to

wait to send the ACK ndash  hellip in the hope that the host

will have data to send bull  Example sshrlogintelnet

ndash  Host A types characters at a UNIX prompt

ndash  Host B receives the character and executes a command

ndash  hellip and then data are generated ndash  Would be nice if B could send

the ACK with the new data

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

Works when packet from A causes data to be sent from B

waste

Delayed ACK bull  Delay sending an ACK

ndash  Upon receiving a packet the host B sets a timer ndash  If Brsquos application generates data go ahead and send

bull  And piggyback the ACK bit

ndash  If the timer expires send a (non-piggybacked) ACK

bull  Limiting the wait ndash  Timer of 200 msec or 500 msec ndash  Results in an ACK for every other full-sized packet

TCP Throughput and Fairness

TCP Throughput bull  Whatrsquos the average throughout of TCP as a

function of window size and RTT ndash  Assume long-lived TCP flow ndash  Ignore slow start

bull  Let W be the window size when loss occurs bull  When window is W throughput is WRTT bull  Just after loss window drops to W2 throughput

to W2RTT bull  Average throughout 075 WRTT

Problems with Fast Links An example to illustrate problems bull  Consider the impact of high speed links

ndash  1500 byte segments ndash  100ms RTT ndash  10 Gbs throughput

bull  What is the required window size ndash  Throughput = 75 WRTT

bull  (probably a good formula to remember)

ndash  Requires window size W = 83333 in-flight segments

Example (Cont)

bull  10 Gbs throughput requires window size W = 83333 in-flight segments

bull  TCP assumes every loss is due to congestion ndash  Generally safe assumption for reasonable window size

bull  (Magic) Formula to relate loss rate to throughput Throughput of 10 Gbs with MSS of 1500 bytes gives ndash  13 L = 210-10

ie can only lose one in 5000000000 segments bull  We need new versions of TCP for high-speed nets (topic

for later discussion)

LRTTMSSsdot221Throughput =

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

Simple scenario assume same MSS and RTT

Is TCP Fair Two competing sessions bull  Additive increase gives slope of 1 as throughout increases bull  multiplicative decrease drops throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Conn

ecti

on 2

thr

ough

put

loss decrease window by factor of 2 congestion avoidance additive increase

loss decrease window by factor of 2 congestion avoidance additive increase

More on Fairness Fairness and UDP bull  Multimedia apps often do

not use TCP ndash  do not want rate throttled by

congestion control bull  Instead use UDP

ndash  pump audiovideo at constant rate tolerate packet loss

bull  Research area TCP friendly unreliable transport

Fairness and parallel TCP connections

bull  nothing prevents app from opening parallel connections between 2 hosts

bull  Web browsers do this bull  Example link of rate R

supporting 9 connections ndash  new app asks for 1 TCP gets rate

R10 ndash  new app asks for 11 TCPs gets

11R20 (over half the bandwidth)

Queuing Mechanisms

Random Early Detection (RED) Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

bull  TCP depends on packet loss to detect congestion ndash  In fact TCP drives the network into packet loss ndash  hellip by continuing to increase the sending rate

bull  Drop-tail queuing leads to bursty loss ndash  When a link becomes congestedhellip ndash  hellip many arriving packets encounter a full queue ndash  And as a result many flows divide sending rate in half ndash  hellip and many individual flows lose multiple packets

Slow Feedback from Drop Tail bull  Feedback comes when buffer is completely full

ndash  hellip even though the buffer has been filling for a while bull  Plus the filling buffer is increasing RTT

ndash  hellip and the variance in the RTT bull  Might be better to give early feedback

ndash  Get one or two flows to slow down not all of them ndash  Get these flows to slow down before it is too late

Random Early Detection (RED) bull  Basic idea of RED

ndash  Router notices that the queue is getting backlogged ndash  hellip and randomly drops packets to signal congestion

bull  Packet drop probability ndash  Drop probability increases as queue length increases ndash  If buffer is below some level donrsquot drop anything ndash  hellip otherwise set drop probability as function of queue

Average Queue Length

Prob

abili

ty

Properties of RED bull  Drops packets before queue is full

ndash  In the hope of reducing the rates of some flows bull  Drops packet in proportion to each flowrsquos rate

ndash  High-rate flows have more packets ndash  hellip and hence a higher chance of being selected

bull  Drops are spaced out in time ndash  Which should help desynchronize the TCP senders

bull  Tolerant of burstiness in the traffic ndash  By basing the decisions on average queue length

Problems With RED bull  Hard to get the tunable parameters just right

ndash  How early to start dropping packets ndash  What slope for the increase in drop probability ndash  What time scale for averaging the queue length

bull  Sometimes RED helps but sometimes not ndash  If the parameters arenrsquot set right RED doesnrsquot help ndash  And it is hard to know how to set the parameters

bull  RED is implemented in practice ndash  But often not used due to the challenges of tuning right

bull  Many variations ndash  With cute names like ldquoBluerdquo and ldquoFREDrdquohellip J

Explicit Congestion Notification bull  Early dropping of packets

ndash  Good gives early feedback ndash  Bad has to drop the packet to give the feedback

bull  Explicit Congestion Notification ndash  Router marks the packet with an ECN bit ndash  hellip and sending host interprets as a sign of congestion

bull  Surmounting the challenges ndash  Must be supported by the end hosts and the routers ndash  Requires two bits in the IP header (one for the ECN

mark and one to indicate the ECN capability) ndash  Solution borrow two of the Type-Of-Service bits in the

IPv4 packet header

Conclusions bull Congestion is inevitable

ndash Internet does not reserve resources in advance ndash TCP actively tries to push the envelope

bull Congestion can be handled ndash Additive increase multiplicative decrease ndash Slow start and slow-start restart

bull Active Queue Management can help ndash Random Early Detection (RED) ndash Explicit Congestion Notification (ECN)

t

Window

Page 27: TCP Congestion Control - cs.colostate.edu

Delayed ACK - Motivation bull  TCP traffic is often bidirectional

ndash Data traveling in both directions ndash ACKs traveling in both directions

bull  ACK packets have high overhead ndash  40 bytes for the IP header and TCP header ndash hellip and zero data traffic

bull  Piggybacking is appealing ndash Host B can send an ACK to host A ndash hellip as part of a data packet from B to A

TCP Header Allows Piggybacking

Source port Destination port

Sequence number

Acknowledgment

Advertised window HdrLen Flags 0

Checksum Urgent pointer

Options (variable)

Data

Flags SYN FIN RST PSH URG ACK

Example of Piggybacking

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

B has data to send

A has data to send

B doesnrsquot have data to send

Increasing Likelihood of Piggybacking

bull  Increase piggybacking ndash  TCP allows the receiver to

wait to send the ACK ndash  hellip in the hope that the host

will have data to send bull  Example sshrlogintelnet

ndash  Host A types characters at a UNIX prompt

ndash  Host B receives the character and executes a command

ndash  hellip and then data are generated ndash  Would be nice if B could send

the ACK with the new data

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

Works when packet from A causes data to be sent from B

waste

Delayed ACK bull  Delay sending an ACK

ndash  Upon receiving a packet the host B sets a timer ndash  If Brsquos application generates data go ahead and send

bull  And piggyback the ACK bit

ndash  If the timer expires send a (non-piggybacked) ACK

bull  Limiting the wait ndash  Timer of 200 msec or 500 msec ndash  Results in an ACK for every other full-sized packet

TCP Throughput and Fairness

TCP Throughput bull  Whatrsquos the average throughout of TCP as a

function of window size and RTT ndash  Assume long-lived TCP flow ndash  Ignore slow start

bull  Let W be the window size when loss occurs bull  When window is W throughput is WRTT bull  Just after loss window drops to W2 throughput

to W2RTT bull  Average throughout 075 WRTT

Problems with Fast Links An example to illustrate problems bull  Consider the impact of high speed links

ndash  1500 byte segments ndash  100ms RTT ndash  10 Gbs throughput

bull  What is the required window size ndash  Throughput = 75 WRTT

bull  (probably a good formula to remember)

ndash  Requires window size W = 83333 in-flight segments

Example (Cont)

bull  10 Gbs throughput requires window size W = 83333 in-flight segments

bull  TCP assumes every loss is due to congestion ndash  Generally safe assumption for reasonable window size

bull  (Magic) Formula to relate loss rate to throughput Throughput of 10 Gbs with MSS of 1500 bytes gives ndash  13 L = 210-10

ie can only lose one in 5000000000 segments bull  We need new versions of TCP for high-speed nets (topic

for later discussion)

LRTTMSSsdot221Throughput =

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

Simple scenario assume same MSS and RTT

Is TCP Fair Two competing sessions bull  Additive increase gives slope of 1 as throughout increases bull  multiplicative decrease drops throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Conn

ecti

on 2

thr

ough

put

loss decrease window by factor of 2 congestion avoidance additive increase

loss decrease window by factor of 2 congestion avoidance additive increase

More on Fairness

Fairness and UDP
•  Multimedia apps often do not use TCP
–  They do not want their rate throttled by congestion control
•  Instead they use UDP
–  Pump audio/video at a constant rate, tolerate packet loss
•  Research area: TCP-friendly unreliable transport

Fairness and parallel TCP connections
•  Nothing prevents an app from opening parallel connections between 2 hosts
•  Web browsers do this
•  Example: link of rate R supporting 9 existing connections
–  A new app that asks for 1 TCP connection gets rate R/10
–  A new app that asks for 11 TCP connections gets 11R/20 (over half the bandwidth, since it holds 11 of the 20 connections)
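
The slide's arithmetic, spelled out (illustrative Python; per-connection fairness means each of the N connections gets R/N):

R, existing = 1.0, 9

for new_conns in (1, 11):
    total = existing + new_conns
    print(new_conns, "->", new_conns * R / total)   # 0.1 (R/10) and 0.55 (11R/20)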

Queuing Mechanisms

Random Early Detection (RED)
Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

•  TCP depends on packet loss to detect congestion
–  In fact, TCP drives the network into packet loss
–  … by continuing to increase the sending rate
•  Drop-tail queuing leads to bursty loss
–  When a link becomes congested…
–  … many arriving packets encounter a full queue
–  As a result, many flows divide their sending rate in half
–  … and many individual flows lose multiple packets

Slow Feedback from Drop Tail

•  Feedback comes only when the buffer is completely full
–  … even though the buffer has been filling for a while
•  Plus, the filling buffer is increasing the RTT
–  … and the variance in the RTT
•  Might be better to give early feedback
–  Get one or two flows to slow down, not all of them
–  Get these flows to slow down before it is too late

Random Early Detection (RED)

•  Basic idea of RED
–  Router notices that the queue is getting backlogged
–  … and randomly drops packets to signal congestion
•  Packet drop probability
–  Drop probability increases as the average queue length increases
–  If the buffer is below some level, don't drop anything
–  … otherwise, set the drop probability as a function of the queue length

[Figure: drop probability as a function of average queue length: zero below a minimum threshold, then rising linearly.]
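
A minimal sketch of the RED drop decision (Python; the thresholds, max_p, and EWMA weight are illustrative parameters, precisely the knobs the "Problems With RED" slide below warns are hard to tune):

import random

class RedQueue:
    def __init__(self, min_th=5, max_th=15, max_p=0.10, w=0.002):
        self.min_th, self.max_th, self.max_p, self.w = min_th, max_th, max_p, w
        self.avg = 0.0        # EWMA of queue length: tolerates short bursts
        self.q = []

    def enqueue(self, pkt):
        self.avg = (1 - self.w) * self.avg + self.w * len(self.q)
        if self.avg < self.min_th:
            p = 0.0           # below the minimum threshold: never drop
        elif self.avg >= self.max_th:
            p = 1.0           # above the maximum threshold: always drop
        else:                 # in between: probability rises linearly
            p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
        if random.random() < p:
            return False      # early drop signals congestion to one sender
        self.q.append(pkt)
        return True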

Properties of RED

•  Drops packets before the queue is full
–  In the hope of reducing the rates of some flows
•  Drops packets in proportion to each flow's rate
–  High-rate flows have more packets in the queue
–  … and hence a higher chance of being selected
•  Drops are spaced out in time
–  Which should help desynchronize the TCP senders
•  Tolerant of burstiness in the traffic
–  By basing decisions on the average queue length

Problems With RED

•  Hard to get the tunable parameters just right
–  How early to start dropping packets?
–  What slope for the increase in drop probability?
–  What time scale for averaging the queue length?
•  Sometimes RED helps, but sometimes not
–  If the parameters aren't set right, RED doesn't help
–  And it is hard to know how to set the parameters
•  RED is implemented in practice
–  But often not used, due to the challenges of tuning it right
•  Many variations
–  With cute names like "Blue" and "FRED"…

Explicit Congestion Notification

•  Early dropping of packets
–  Good: gives early feedback
–  Bad: has to drop the packet to give the feedback
•  Explicit Congestion Notification
–  Router marks the packet with an ECN bit
–  … and the sending host interprets it as a sign of congestion
•  Surmounting the challenges
–  Must be supported by the end hosts and the routers
–  Requires two bits in the IP header (one for the ECN mark, and one to indicate ECN capability)
–  Solution: borrow two of the Type-Of-Service bits in the IPv4 packet header
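
Those two borrowed bits are the low two bits of the IPv4 TOS/DiffServ byte (RFC 3168). A small Python sketch of decoding them (the helper name is mine):

ECN = {
    0b00: "Not-ECT: sender is not ECN-capable",
    0b01: "ECT(1):  ECN-capable transport",
    0b10: "ECT(0):  ECN-capable transport",
    0b11: "CE:      congestion experienced (router set the mark)",
}

def ecn_codepoint(tos_byte):
    return tos_byte & 0b11          # ECN field = low two bits of old TOS

print(ECN[ecn_codepoint(0b00000011)])   # a CE-marked packet

Instead of dropping a packet it would have dropped, an ECN-aware router flips ECT to CE; the receiver echoes the mark back to the sender, which slows down without any loss.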

Conclusions

•  Congestion is inevitable
–  The Internet does not reserve resources in advance
–  TCP actively tries to push the envelope
•  Congestion can be handled
–  Additive increase, multiplicative decrease
–  Slow start and slow-start restart
•  Active Queue Management can help
–  Random Early Detection (RED)
–  Explicit Congestion Notification (ECN)

t

Window

Page 28: TCP Congestion Control - cs.colostate.edu

TCP Header Allows Piggybacking

Source port Destination port

Sequence number

Acknowledgment

Advertised window HdrLen Flags 0

Checksum Urgent pointer

Options (variable)

Data

Flags SYN FIN RST PSH URG ACK

Example of Piggybacking

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

B has data to send

A has data to send

B doesnrsquot have data to send

Increasing Likelihood of Piggybacking

bull  Increase piggybacking ndash  TCP allows the receiver to

wait to send the ACK ndash  hellip in the hope that the host

will have data to send bull  Example sshrlogintelnet

ndash  Host A types characters at a UNIX prompt

ndash  Host B receives the character and executes a command

ndash  hellip and then data are generated ndash  Would be nice if B could send

the ACK with the new data

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

Works when packet from A causes data to be sent from B

waste

Delayed ACK bull  Delay sending an ACK

ndash  Upon receiving a packet the host B sets a timer ndash  If Brsquos application generates data go ahead and send

bull  And piggyback the ACK bit

ndash  If the timer expires send a (non-piggybacked) ACK

bull  Limiting the wait ndash  Timer of 200 msec or 500 msec ndash  Results in an ACK for every other full-sized packet

TCP Throughput and Fairness

TCP Throughput bull  Whatrsquos the average throughout of TCP as a

function of window size and RTT ndash  Assume long-lived TCP flow ndash  Ignore slow start

bull  Let W be the window size when loss occurs bull  When window is W throughput is WRTT bull  Just after loss window drops to W2 throughput

to W2RTT bull  Average throughout 075 WRTT

Problems with Fast Links An example to illustrate problems bull  Consider the impact of high speed links

ndash  1500 byte segments ndash  100ms RTT ndash  10 Gbs throughput

bull  What is the required window size ndash  Throughput = 75 WRTT

bull  (probably a good formula to remember)

ndash  Requires window size W = 83333 in-flight segments

Example (Cont)

bull  10 Gbs throughput requires window size W = 83333 in-flight segments

bull  TCP assumes every loss is due to congestion ndash  Generally safe assumption for reasonable window size

bull  (Magic) Formula to relate loss rate to throughput Throughput of 10 Gbs with MSS of 1500 bytes gives ndash  13 L = 210-10

ie can only lose one in 5000000000 segments bull  We need new versions of TCP for high-speed nets (topic

for later discussion)

LRTTMSSsdot221Throughput =

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

Simple scenario assume same MSS and RTT

Is TCP Fair Two competing sessions bull  Additive increase gives slope of 1 as throughout increases bull  multiplicative decrease drops throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Conn

ecti

on 2

thr

ough

put

loss decrease window by factor of 2 congestion avoidance additive increase

loss decrease window by factor of 2 congestion avoidance additive increase

More on Fairness Fairness and UDP bull  Multimedia apps often do

not use TCP ndash  do not want rate throttled by

congestion control bull  Instead use UDP

ndash  pump audiovideo at constant rate tolerate packet loss

bull  Research area TCP friendly unreliable transport

Fairness and parallel TCP connections

bull  nothing prevents app from opening parallel connections between 2 hosts

bull  Web browsers do this bull  Example link of rate R

supporting 9 connections ndash  new app asks for 1 TCP gets rate

R10 ndash  new app asks for 11 TCPs gets

11R20 (over half the bandwidth)

Queuing Mechanisms

Random Early Detection (RED) Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

bull  TCP depends on packet loss to detect congestion ndash  In fact TCP drives the network into packet loss ndash  hellip by continuing to increase the sending rate

bull  Drop-tail queuing leads to bursty loss ndash  When a link becomes congestedhellip ndash  hellip many arriving packets encounter a full queue ndash  And as a result many flows divide sending rate in half ndash  hellip and many individual flows lose multiple packets

Slow Feedback from Drop Tail bull  Feedback comes when buffer is completely full

ndash  hellip even though the buffer has been filling for a while bull  Plus the filling buffer is increasing RTT

ndash  hellip and the variance in the RTT bull  Might be better to give early feedback

ndash  Get one or two flows to slow down not all of them ndash  Get these flows to slow down before it is too late

Random Early Detection (RED) bull  Basic idea of RED

ndash  Router notices that the queue is getting backlogged ndash  hellip and randomly drops packets to signal congestion

bull  Packet drop probability ndash  Drop probability increases as queue length increases ndash  If buffer is below some level donrsquot drop anything ndash  hellip otherwise set drop probability as function of queue

Average Queue Length

Prob

abili

ty

Properties of RED bull  Drops packets before queue is full

ndash  In the hope of reducing the rates of some flows bull  Drops packet in proportion to each flowrsquos rate

ndash  High-rate flows have more packets ndash  hellip and hence a higher chance of being selected

bull  Drops are spaced out in time ndash  Which should help desynchronize the TCP senders

bull  Tolerant of burstiness in the traffic ndash  By basing the decisions on average queue length

Problems With RED bull  Hard to get the tunable parameters just right

ndash  How early to start dropping packets ndash  What slope for the increase in drop probability ndash  What time scale for averaging the queue length

bull  Sometimes RED helps but sometimes not ndash  If the parameters arenrsquot set right RED doesnrsquot help ndash  And it is hard to know how to set the parameters

bull  RED is implemented in practice ndash  But often not used due to the challenges of tuning right

bull  Many variations ndash  With cute names like ldquoBluerdquo and ldquoFREDrdquohellip J

Explicit Congestion Notification bull  Early dropping of packets

ndash  Good gives early feedback ndash  Bad has to drop the packet to give the feedback

bull  Explicit Congestion Notification ndash  Router marks the packet with an ECN bit ndash  hellip and sending host interprets as a sign of congestion

bull  Surmounting the challenges ndash  Must be supported by the end hosts and the routers ndash  Requires two bits in the IP header (one for the ECN

mark and one to indicate the ECN capability) ndash  Solution borrow two of the Type-Of-Service bits in the

IPv4 packet header

Conclusions bull Congestion is inevitable

ndash Internet does not reserve resources in advance ndash TCP actively tries to push the envelope

bull Congestion can be handled ndash Additive increase multiplicative decrease ndash Slow start and slow-start restart

bull Active Queue Management can help ndash Random Early Detection (RED) ndash Explicit Congestion Notification (ECN)

t

Window

Page 29: TCP Congestion Control - cs.colostate.edu

Example of Piggybacking

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

B has data to send

A has data to send

B doesnrsquot have data to send

Increasing Likelihood of Piggybacking

bull  Increase piggybacking ndash  TCP allows the receiver to

wait to send the ACK ndash  hellip in the hope that the host

will have data to send bull  Example sshrlogintelnet

ndash  Host A types characters at a UNIX prompt

ndash  Host B receives the character and executes a command

ndash  hellip and then data are generated ndash  Would be nice if B could send

the ACK with the new data

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

Works when packet from A causes data to be sent from B

waste

Delayed ACK bull  Delay sending an ACK

ndash  Upon receiving a packet the host B sets a timer ndash  If Brsquos application generates data go ahead and send

bull  And piggyback the ACK bit

ndash  If the timer expires send a (non-piggybacked) ACK

bull  Limiting the wait ndash  Timer of 200 msec or 500 msec ndash  Results in an ACK for every other full-sized packet

TCP Throughput and Fairness

TCP Throughput bull  Whatrsquos the average throughout of TCP as a

function of window size and RTT ndash  Assume long-lived TCP flow ndash  Ignore slow start

bull  Let W be the window size when loss occurs bull  When window is W throughput is WRTT bull  Just after loss window drops to W2 throughput

to W2RTT bull  Average throughout 075 WRTT

Problems with Fast Links An example to illustrate problems bull  Consider the impact of high speed links

ndash  1500 byte segments ndash  100ms RTT ndash  10 Gbs throughput

bull  What is the required window size ndash  Throughput = 75 WRTT

bull  (probably a good formula to remember)

ndash  Requires window size W = 83333 in-flight segments

Example (Cont)

bull  10 Gbs throughput requires window size W = 83333 in-flight segments

bull  TCP assumes every loss is due to congestion ndash  Generally safe assumption for reasonable window size

bull  (Magic) Formula to relate loss rate to throughput Throughput of 10 Gbs with MSS of 1500 bytes gives ndash  13 L = 210-10

ie can only lose one in 5000000000 segments bull  We need new versions of TCP for high-speed nets (topic

for later discussion)

LRTTMSSsdot221Throughput =

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

Simple scenario assume same MSS and RTT

Is TCP Fair Two competing sessions bull  Additive increase gives slope of 1 as throughout increases bull  multiplicative decrease drops throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Conn

ecti

on 2

thr

ough

put

loss decrease window by factor of 2 congestion avoidance additive increase

loss decrease window by factor of 2 congestion avoidance additive increase

More on Fairness Fairness and UDP bull  Multimedia apps often do

not use TCP ndash  do not want rate throttled by

congestion control bull  Instead use UDP

ndash  pump audiovideo at constant rate tolerate packet loss

bull  Research area TCP friendly unreliable transport

Fairness and parallel TCP connections

bull  nothing prevents app from opening parallel connections between 2 hosts

bull  Web browsers do this bull  Example link of rate R

supporting 9 connections ndash  new app asks for 1 TCP gets rate

R10 ndash  new app asks for 11 TCPs gets

11R20 (over half the bandwidth)

Queuing Mechanisms

Random Early Detection (RED) Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

bull  TCP depends on packet loss to detect congestion ndash  In fact TCP drives the network into packet loss ndash  hellip by continuing to increase the sending rate

bull  Drop-tail queuing leads to bursty loss ndash  When a link becomes congestedhellip ndash  hellip many arriving packets encounter a full queue ndash  And as a result many flows divide sending rate in half ndash  hellip and many individual flows lose multiple packets

Slow Feedback from Drop Tail bull  Feedback comes when buffer is completely full

ndash  hellip even though the buffer has been filling for a while bull  Plus the filling buffer is increasing RTT

ndash  hellip and the variance in the RTT bull  Might be better to give early feedback

ndash  Get one or two flows to slow down not all of them ndash  Get these flows to slow down before it is too late

Random Early Detection (RED) bull  Basic idea of RED

ndash  Router notices that the queue is getting backlogged ndash  hellip and randomly drops packets to signal congestion

bull  Packet drop probability ndash  Drop probability increases as queue length increases ndash  If buffer is below some level donrsquot drop anything ndash  hellip otherwise set drop probability as function of queue

Average Queue Length

Prob

abili

ty

Properties of RED bull  Drops packets before queue is full

ndash  In the hope of reducing the rates of some flows bull  Drops packet in proportion to each flowrsquos rate

ndash  High-rate flows have more packets ndash  hellip and hence a higher chance of being selected

bull  Drops are spaced out in time ndash  Which should help desynchronize the TCP senders

bull  Tolerant of burstiness in the traffic ndash  By basing the decisions on average queue length

Problems With RED bull  Hard to get the tunable parameters just right

ndash  How early to start dropping packets ndash  What slope for the increase in drop probability ndash  What time scale for averaging the queue length

bull  Sometimes RED helps but sometimes not ndash  If the parameters arenrsquot set right RED doesnrsquot help ndash  And it is hard to know how to set the parameters

bull  RED is implemented in practice ndash  But often not used due to the challenges of tuning right

bull  Many variations ndash  With cute names like ldquoBluerdquo and ldquoFREDrdquohellip J

Explicit Congestion Notification bull  Early dropping of packets

ndash  Good gives early feedback ndash  Bad has to drop the packet to give the feedback

bull  Explicit Congestion Notification ndash  Router marks the packet with an ECN bit ndash  hellip and sending host interprets as a sign of congestion

bull  Surmounting the challenges ndash  Must be supported by the end hosts and the routers ndash  Requires two bits in the IP header (one for the ECN

mark and one to indicate the ECN capability) ndash  Solution borrow two of the Type-Of-Service bits in the

IPv4 packet header

Conclusions bull Congestion is inevitable

ndash Internet does not reserve resources in advance ndash TCP actively tries to push the envelope

bull Congestion can be handled ndash Additive increase multiplicative decrease ndash Slow start and slow-start restart

bull Active Queue Management can help ndash Random Early Detection (RED) ndash Explicit Congestion Notification (ECN)

t

Window

Page 30: TCP Congestion Control - cs.colostate.edu

Increasing Likelihood of Piggybacking

bull  Increase piggybacking ndash  TCP allows the receiver to

wait to send the ACK ndash  hellip in the hope that the host

will have data to send bull  Example sshrlogintelnet

ndash  Host A types characters at a UNIX prompt

ndash  Host B receives the character and executes a command

ndash  hellip and then data are generated ndash  Would be nice if B could send

the ACK with the new data

Data

Data+ACK

Data

A B

ACK

Data

Data + ACK

Works when packet from A causes data to be sent from B

waste

Delayed ACK bull  Delay sending an ACK

ndash  Upon receiving a packet the host B sets a timer ndash  If Brsquos application generates data go ahead and send

bull  And piggyback the ACK bit

ndash  If the timer expires send a (non-piggybacked) ACK

bull  Limiting the wait ndash  Timer of 200 msec or 500 msec ndash  Results in an ACK for every other full-sized packet

TCP Throughput and Fairness

TCP Throughput bull  Whatrsquos the average throughout of TCP as a

function of window size and RTT ndash  Assume long-lived TCP flow ndash  Ignore slow start

bull  Let W be the window size when loss occurs bull  When window is W throughput is WRTT bull  Just after loss window drops to W2 throughput

to W2RTT bull  Average throughout 075 WRTT

Problems with Fast Links An example to illustrate problems bull  Consider the impact of high speed links

ndash  1500 byte segments ndash  100ms RTT ndash  10 Gbs throughput

bull  What is the required window size ndash  Throughput = 75 WRTT

bull  (probably a good formula to remember)

ndash  Requires window size W = 83333 in-flight segments

Example (Cont)

bull  10 Gbs throughput requires window size W = 83333 in-flight segments

bull  TCP assumes every loss is due to congestion ndash  Generally safe assumption for reasonable window size

bull  (Magic) Formula to relate loss rate to throughput Throughput of 10 Gbs with MSS of 1500 bytes gives ndash  13 L = 210-10

ie can only lose one in 5000000000 segments bull  We need new versions of TCP for high-speed nets (topic

for later discussion)

LRTTMSSsdot221Throughput =

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

Simple scenario assume same MSS and RTT

Is TCP Fair Two competing sessions bull  Additive increase gives slope of 1 as throughout increases bull  multiplicative decrease drops throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Conn

ecti

on 2

thr

ough

put

loss decrease window by factor of 2 congestion avoidance additive increase

loss decrease window by factor of 2 congestion avoidance additive increase

More on Fairness Fairness and UDP bull  Multimedia apps often do

not use TCP ndash  do not want rate throttled by

congestion control bull  Instead use UDP

ndash  pump audiovideo at constant rate tolerate packet loss

bull  Research area TCP friendly unreliable transport

Fairness and parallel TCP connections

bull  nothing prevents app from opening parallel connections between 2 hosts

bull  Web browsers do this bull  Example link of rate R

supporting 9 connections ndash  new app asks for 1 TCP gets rate

R10 ndash  new app asks for 11 TCPs gets

11R20 (over half the bandwidth)

Queuing Mechanisms

Random Early Detection (RED) Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

bull  TCP depends on packet loss to detect congestion ndash  In fact TCP drives the network into packet loss ndash  hellip by continuing to increase the sending rate

bull  Drop-tail queuing leads to bursty loss ndash  When a link becomes congestedhellip ndash  hellip many arriving packets encounter a full queue ndash  And as a result many flows divide sending rate in half ndash  hellip and many individual flows lose multiple packets

Slow Feedback from Drop Tail bull  Feedback comes when buffer is completely full

ndash  hellip even though the buffer has been filling for a while bull  Plus the filling buffer is increasing RTT

ndash  hellip and the variance in the RTT bull  Might be better to give early feedback

ndash  Get one or two flows to slow down not all of them ndash  Get these flows to slow down before it is too late

Random Early Detection (RED) bull  Basic idea of RED

ndash  Router notices that the queue is getting backlogged ndash  hellip and randomly drops packets to signal congestion

bull  Packet drop probability ndash  Drop probability increases as queue length increases ndash  If buffer is below some level donrsquot drop anything ndash  hellip otherwise set drop probability as function of queue

Average Queue Length

Prob

abili

ty

Properties of RED bull  Drops packets before queue is full

ndash  In the hope of reducing the rates of some flows bull  Drops packet in proportion to each flowrsquos rate

ndash  High-rate flows have more packets ndash  hellip and hence a higher chance of being selected

bull  Drops are spaced out in time ndash  Which should help desynchronize the TCP senders

bull  Tolerant of burstiness in the traffic ndash  By basing the decisions on average queue length

Problems With RED bull  Hard to get the tunable parameters just right

ndash  How early to start dropping packets ndash  What slope for the increase in drop probability ndash  What time scale for averaging the queue length

bull  Sometimes RED helps but sometimes not ndash  If the parameters arenrsquot set right RED doesnrsquot help ndash  And it is hard to know how to set the parameters

bull  RED is implemented in practice ndash  But often not used due to the challenges of tuning right

bull  Many variations ndash  With cute names like ldquoBluerdquo and ldquoFREDrdquohellip J

Explicit Congestion Notification bull  Early dropping of packets

ndash  Good gives early feedback ndash  Bad has to drop the packet to give the feedback

bull  Explicit Congestion Notification ndash  Router marks the packet with an ECN bit ndash  hellip and sending host interprets as a sign of congestion

bull  Surmounting the challenges ndash  Must be supported by the end hosts and the routers ndash  Requires two bits in the IP header (one for the ECN

mark and one to indicate the ECN capability) ndash  Solution borrow two of the Type-Of-Service bits in the

IPv4 packet header

Conclusions bull Congestion is inevitable

ndash Internet does not reserve resources in advance ndash TCP actively tries to push the envelope

bull Congestion can be handled ndash Additive increase multiplicative decrease ndash Slow start and slow-start restart

bull Active Queue Management can help ndash Random Early Detection (RED) ndash Explicit Congestion Notification (ECN)

t

Window

Page 31: TCP Congestion Control - cs.colostate.edu

Delayed ACK bull  Delay sending an ACK

ndash  Upon receiving a packet the host B sets a timer ndash  If Brsquos application generates data go ahead and send

bull  And piggyback the ACK bit

ndash  If the timer expires send a (non-piggybacked) ACK

bull  Limiting the wait ndash  Timer of 200 msec or 500 msec ndash  Results in an ACK for every other full-sized packet

TCP Throughput and Fairness

TCP Throughput bull  Whatrsquos the average throughout of TCP as a

function of window size and RTT ndash  Assume long-lived TCP flow ndash  Ignore slow start

bull  Let W be the window size when loss occurs bull  When window is W throughput is WRTT bull  Just after loss window drops to W2 throughput

to W2RTT bull  Average throughout 075 WRTT

Problems with Fast Links An example to illustrate problems bull  Consider the impact of high speed links

ndash  1500 byte segments ndash  100ms RTT ndash  10 Gbs throughput

bull  What is the required window size ndash  Throughput = 75 WRTT

bull  (probably a good formula to remember)

ndash  Requires window size W = 83333 in-flight segments

Example (Cont)

bull  10 Gbs throughput requires window size W = 83333 in-flight segments

bull  TCP assumes every loss is due to congestion ndash  Generally safe assumption for reasonable window size

bull  (Magic) Formula to relate loss rate to throughput Throughput of 10 Gbs with MSS of 1500 bytes gives ndash  13 L = 210-10

ie can only lose one in 5000000000 segments bull  We need new versions of TCP for high-speed nets (topic

for later discussion)

LRTTMSSsdot221Throughput =

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

Simple scenario assume same MSS and RTT

Is TCP Fair Two competing sessions bull  Additive increase gives slope of 1 as throughout increases bull  multiplicative decrease drops throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Conn

ecti

on 2

thr

ough

put

loss decrease window by factor of 2 congestion avoidance additive increase

loss decrease window by factor of 2 congestion avoidance additive increase

More on Fairness Fairness and UDP bull  Multimedia apps often do

not use TCP ndash  do not want rate throttled by

congestion control bull  Instead use UDP

ndash  pump audiovideo at constant rate tolerate packet loss

bull  Research area TCP friendly unreliable transport

Fairness and parallel TCP connections

bull  nothing prevents app from opening parallel connections between 2 hosts

bull  Web browsers do this bull  Example link of rate R

supporting 9 connections ndash  new app asks for 1 TCP gets rate

R10 ndash  new app asks for 11 TCPs gets

11R20 (over half the bandwidth)

Queuing Mechanisms

Random Early Detection (RED) Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

bull  TCP depends on packet loss to detect congestion ndash  In fact TCP drives the network into packet loss ndash  hellip by continuing to increase the sending rate

bull  Drop-tail queuing leads to bursty loss ndash  When a link becomes congestedhellip ndash  hellip many arriving packets encounter a full queue ndash  And as a result many flows divide sending rate in half ndash  hellip and many individual flows lose multiple packets

Slow Feedback from Drop Tail bull  Feedback comes when buffer is completely full

ndash  hellip even though the buffer has been filling for a while bull  Plus the filling buffer is increasing RTT

ndash  hellip and the variance in the RTT bull  Might be better to give early feedback

ndash  Get one or two flows to slow down not all of them ndash  Get these flows to slow down before it is too late

Random Early Detection (RED) bull  Basic idea of RED

ndash  Router notices that the queue is getting backlogged ndash  hellip and randomly drops packets to signal congestion

bull  Packet drop probability ndash  Drop probability increases as queue length increases ndash  If buffer is below some level donrsquot drop anything ndash  hellip otherwise set drop probability as function of queue

Average Queue Length

Prob

abili

ty

Properties of RED bull  Drops packets before queue is full

ndash  In the hope of reducing the rates of some flows bull  Drops packet in proportion to each flowrsquos rate

ndash  High-rate flows have more packets ndash  hellip and hence a higher chance of being selected

bull  Drops are spaced out in time ndash  Which should help desynchronize the TCP senders

bull  Tolerant of burstiness in the traffic ndash  By basing the decisions on average queue length

Problems With RED bull  Hard to get the tunable parameters just right

ndash  How early to start dropping packets ndash  What slope for the increase in drop probability ndash  What time scale for averaging the queue length

bull  Sometimes RED helps but sometimes not ndash  If the parameters arenrsquot set right RED doesnrsquot help ndash  And it is hard to know how to set the parameters

bull  RED is implemented in practice ndash  But often not used due to the challenges of tuning right

bull  Many variations ndash  With cute names like ldquoBluerdquo and ldquoFREDrdquohellip J

Explicit Congestion Notification bull  Early dropping of packets

ndash  Good gives early feedback ndash  Bad has to drop the packet to give the feedback

bull  Explicit Congestion Notification ndash  Router marks the packet with an ECN bit ndash  hellip and sending host interprets as a sign of congestion

bull  Surmounting the challenges ndash  Must be supported by the end hosts and the routers ndash  Requires two bits in the IP header (one for the ECN

mark and one to indicate the ECN capability) ndash  Solution borrow two of the Type-Of-Service bits in the

IPv4 packet header

Conclusions bull Congestion is inevitable

ndash Internet does not reserve resources in advance ndash TCP actively tries to push the envelope

bull Congestion can be handled ndash Additive increase multiplicative decrease ndash Slow start and slow-start restart

bull Active Queue Management can help ndash Random Early Detection (RED) ndash Explicit Congestion Notification (ECN)

t

Window

Page 32: TCP Congestion Control - cs.colostate.edu

TCP Throughput and Fairness

TCP Throughput bull  Whatrsquos the average throughout of TCP as a

function of window size and RTT ndash  Assume long-lived TCP flow ndash  Ignore slow start

bull  Let W be the window size when loss occurs bull  When window is W throughput is WRTT bull  Just after loss window drops to W2 throughput

to W2RTT bull  Average throughout 075 WRTT

Problems with Fast Links An example to illustrate problems bull  Consider the impact of high speed links

ndash  1500 byte segments ndash  100ms RTT ndash  10 Gbs throughput

bull  What is the required window size ndash  Throughput = 75 WRTT

bull  (probably a good formula to remember)

ndash  Requires window size W = 83333 in-flight segments

Example (Cont)

bull  10 Gbs throughput requires window size W = 83333 in-flight segments

bull  TCP assumes every loss is due to congestion ndash  Generally safe assumption for reasonable window size

bull  (Magic) Formula to relate loss rate to throughput Throughput of 10 Gbs with MSS of 1500 bytes gives ndash  13 L = 210-10

ie can only lose one in 5000000000 segments bull  We need new versions of TCP for high-speed nets (topic

for later discussion)

LRTTMSSsdot221Throughput =

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

Simple scenario assume same MSS and RTT

Is TCP Fair Two competing sessions bull  Additive increase gives slope of 1 as throughout increases bull  multiplicative decrease drops throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Conn

ecti

on 2

thr

ough

put

loss decrease window by factor of 2 congestion avoidance additive increase

loss decrease window by factor of 2 congestion avoidance additive increase

More on Fairness Fairness and UDP bull  Multimedia apps often do

not use TCP ndash  do not want rate throttled by

congestion control bull  Instead use UDP

ndash  pump audiovideo at constant rate tolerate packet loss

bull  Research area TCP friendly unreliable transport

Fairness and parallel TCP connections

bull  nothing prevents app from opening parallel connections between 2 hosts

bull  Web browsers do this bull  Example link of rate R

supporting 9 connections ndash  new app asks for 1 TCP gets rate

R10 ndash  new app asks for 11 TCPs gets

11R20 (over half the bandwidth)

Queuing Mechanisms

Random Early Detection (RED) Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

bull  TCP depends on packet loss to detect congestion ndash  In fact TCP drives the network into packet loss ndash  hellip by continuing to increase the sending rate

bull  Drop-tail queuing leads to bursty loss ndash  When a link becomes congestedhellip ndash  hellip many arriving packets encounter a full queue ndash  And as a result many flows divide sending rate in half ndash  hellip and many individual flows lose multiple packets

Slow Feedback from Drop Tail bull  Feedback comes when buffer is completely full

ndash  hellip even though the buffer has been filling for a while bull  Plus the filling buffer is increasing RTT

ndash  hellip and the variance in the RTT bull  Might be better to give early feedback

ndash  Get one or two flows to slow down not all of them ndash  Get these flows to slow down before it is too late

Random Early Detection (RED) bull  Basic idea of RED

ndash  Router notices that the queue is getting backlogged ndash  hellip and randomly drops packets to signal congestion

bull  Packet drop probability ndash  Drop probability increases as queue length increases ndash  If buffer is below some level donrsquot drop anything ndash  hellip otherwise set drop probability as function of queue

Average Queue Length

Prob

abili

ty

Properties of RED bull  Drops packets before queue is full

ndash  In the hope of reducing the rates of some flows bull  Drops packet in proportion to each flowrsquos rate

ndash  High-rate flows have more packets ndash  hellip and hence a higher chance of being selected

bull  Drops are spaced out in time ndash  Which should help desynchronize the TCP senders

bull  Tolerant of burstiness in the traffic ndash  By basing the decisions on average queue length

Problems With RED bull  Hard to get the tunable parameters just right

ndash  How early to start dropping packets ndash  What slope for the increase in drop probability ndash  What time scale for averaging the queue length

bull  Sometimes RED helps but sometimes not ndash  If the parameters arenrsquot set right RED doesnrsquot help ndash  And it is hard to know how to set the parameters

bull  RED is implemented in practice ndash  But often not used due to the challenges of tuning right

bull  Many variations ndash  With cute names like ldquoBluerdquo and ldquoFREDrdquohellip J

Explicit Congestion Notification bull  Early dropping of packets

ndash  Good gives early feedback ndash  Bad has to drop the packet to give the feedback

bull  Explicit Congestion Notification ndash  Router marks the packet with an ECN bit ndash  hellip and sending host interprets as a sign of congestion

bull  Surmounting the challenges ndash  Must be supported by the end hosts and the routers ndash  Requires two bits in the IP header (one for the ECN

mark and one to indicate the ECN capability) ndash  Solution borrow two of the Type-Of-Service bits in the

IPv4 packet header

Conclusions bull Congestion is inevitable

ndash Internet does not reserve resources in advance ndash TCP actively tries to push the envelope

bull Congestion can be handled ndash Additive increase multiplicative decrease ndash Slow start and slow-start restart

bull Active Queue Management can help ndash Random Early Detection (RED) ndash Explicit Congestion Notification (ECN)

t

Window

Page 33: TCP Congestion Control - cs.colostate.edu

TCP Throughput bull  Whatrsquos the average throughout of TCP as a

function of window size and RTT ndash  Assume long-lived TCP flow ndash  Ignore slow start

bull  Let W be the window size when loss occurs bull  When window is W throughput is WRTT bull  Just after loss window drops to W2 throughput

to W2RTT bull  Average throughout 075 WRTT

Problems with Fast Links An example to illustrate problems bull  Consider the impact of high speed links

ndash  1500 byte segments ndash  100ms RTT ndash  10 Gbs throughput

bull  What is the required window size ndash  Throughput = 75 WRTT

bull  (probably a good formula to remember)

ndash  Requires window size W = 83333 in-flight segments

Example (Cont)

bull  10 Gbs throughput requires window size W = 83333 in-flight segments

bull  TCP assumes every loss is due to congestion ndash  Generally safe assumption for reasonable window size

bull  (Magic) Formula to relate loss rate to throughput Throughput of 10 Gbs with MSS of 1500 bytes gives ndash  13 L = 210-10

ie can only lose one in 5000000000 segments bull  We need new versions of TCP for high-speed nets (topic

for later discussion)

LRTTMSSsdot221Throughput =

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

Simple scenario assume same MSS and RTT

Is TCP Fair Two competing sessions bull  Additive increase gives slope of 1 as throughout increases bull  multiplicative decrease drops throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Conn

ecti

on 2

thr

ough

put

loss decrease window by factor of 2 congestion avoidance additive increase

loss decrease window by factor of 2 congestion avoidance additive increase

More on Fairness Fairness and UDP bull  Multimedia apps often do

not use TCP ndash  do not want rate throttled by

congestion control bull  Instead use UDP

ndash  pump audiovideo at constant rate tolerate packet loss

bull  Research area TCP friendly unreliable transport

Fairness and parallel TCP connections

bull  nothing prevents app from opening parallel connections between 2 hosts

bull  Web browsers do this bull  Example link of rate R

supporting 9 connections ndash  new app asks for 1 TCP gets rate

R10 ndash  new app asks for 11 TCPs gets

11R20 (over half the bandwidth)

Queuing Mechanisms

Random Early Detection (RED) Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

bull  TCP depends on packet loss to detect congestion ndash  In fact TCP drives the network into packet loss ndash  hellip by continuing to increase the sending rate

bull  Drop-tail queuing leads to bursty loss ndash  When a link becomes congestedhellip ndash  hellip many arriving packets encounter a full queue ndash  And as a result many flows divide sending rate in half ndash  hellip and many individual flows lose multiple packets

Slow Feedback from Drop Tail bull  Feedback comes when buffer is completely full

ndash  hellip even though the buffer has been filling for a while bull  Plus the filling buffer is increasing RTT

ndash  hellip and the variance in the RTT bull  Might be better to give early feedback

ndash  Get one or two flows to slow down not all of them ndash  Get these flows to slow down before it is too late

Random Early Detection (RED) bull  Basic idea of RED

ndash  Router notices that the queue is getting backlogged ndash  hellip and randomly drops packets to signal congestion

bull  Packet drop probability ndash  Drop probability increases as queue length increases ndash  If buffer is below some level donrsquot drop anything ndash  hellip otherwise set drop probability as function of queue

Average Queue Length

Prob

abili

ty

Properties of RED bull  Drops packets before queue is full

ndash  In the hope of reducing the rates of some flows bull  Drops packet in proportion to each flowrsquos rate

ndash  High-rate flows have more packets ndash  hellip and hence a higher chance of being selected

bull  Drops are spaced out in time ndash  Which should help desynchronize the TCP senders

bull  Tolerant of burstiness in the traffic ndash  By basing the decisions on average queue length

Problems With RED bull  Hard to get the tunable parameters just right

ndash  How early to start dropping packets ndash  What slope for the increase in drop probability ndash  What time scale for averaging the queue length

bull  Sometimes RED helps but sometimes not ndash  If the parameters arenrsquot set right RED doesnrsquot help ndash  And it is hard to know how to set the parameters

bull  RED is implemented in practice ndash  But often not used due to the challenges of tuning right

bull  Many variations ndash  With cute names like ldquoBluerdquo and ldquoFREDrdquohellip J

Explicit Congestion Notification bull  Early dropping of packets

ndash  Good gives early feedback ndash  Bad has to drop the packet to give the feedback

bull  Explicit Congestion Notification ndash  Router marks the packet with an ECN bit ndash  hellip and sending host interprets as a sign of congestion

bull  Surmounting the challenges ndash  Must be supported by the end hosts and the routers ndash  Requires two bits in the IP header (one for the ECN

mark and one to indicate the ECN capability) ndash  Solution borrow two of the Type-Of-Service bits in the

IPv4 packet header

Conclusions bull Congestion is inevitable

ndash Internet does not reserve resources in advance ndash TCP actively tries to push the envelope

bull Congestion can be handled ndash Additive increase multiplicative decrease ndash Slow start and slow-start restart

bull Active Queue Management can help ndash Random Early Detection (RED) ndash Explicit Congestion Notification (ECN)

t

Window

Page 34: TCP Congestion Control - cs.colostate.edu

Problems with Fast Links An example to illustrate problems bull  Consider the impact of high speed links

ndash  1500 byte segments ndash  100ms RTT ndash  10 Gbs throughput

bull  What is the required window size ndash  Throughput = 75 WRTT

bull  (probably a good formula to remember)

ndash  Requires window size W = 83333 in-flight segments

Example (Cont)

bull  10 Gbs throughput requires window size W = 83333 in-flight segments

bull  TCP assumes every loss is due to congestion ndash  Generally safe assumption for reasonable window size

bull  (Magic) Formula to relate loss rate to throughput Throughput of 10 Gbs with MSS of 1500 bytes gives ndash  13 L = 210-10

ie can only lose one in 5000000000 segments bull  We need new versions of TCP for high-speed nets (topic

for later discussion)

LRTTMSSsdot221Throughput =

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

Simple scenario assume same MSS and RTT

Is TCP Fair Two competing sessions bull  Additive increase gives slope of 1 as throughout increases bull  multiplicative decrease drops throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Conn

ecti

on 2

thr

ough

put

loss decrease window by factor of 2 congestion avoidance additive increase

loss decrease window by factor of 2 congestion avoidance additive increase

More on Fairness Fairness and UDP bull  Multimedia apps often do

not use TCP ndash  do not want rate throttled by

congestion control bull  Instead use UDP

ndash  pump audiovideo at constant rate tolerate packet loss

bull  Research area TCP friendly unreliable transport

Fairness and parallel TCP connections

bull  nothing prevents app from opening parallel connections between 2 hosts

bull  Web browsers do this bull  Example link of rate R

supporting 9 connections ndash  new app asks for 1 TCP gets rate

R10 ndash  new app asks for 11 TCPs gets

11R20 (over half the bandwidth)

Queuing Mechanisms

Random Early Detection (RED) Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

bull  TCP depends on packet loss to detect congestion ndash  In fact TCP drives the network into packet loss ndash  hellip by continuing to increase the sending rate

bull  Drop-tail queuing leads to bursty loss ndash  When a link becomes congestedhellip ndash  hellip many arriving packets encounter a full queue ndash  And as a result many flows divide sending rate in half ndash  hellip and many individual flows lose multiple packets

Slow Feedback from Drop Tail bull  Feedback comes when buffer is completely full

ndash  hellip even though the buffer has been filling for a while bull  Plus the filling buffer is increasing RTT

ndash  hellip and the variance in the RTT bull  Might be better to give early feedback

ndash  Get one or two flows to slow down not all of them ndash  Get these flows to slow down before it is too late

Random Early Detection (RED) bull  Basic idea of RED

ndash  Router notices that the queue is getting backlogged ndash  hellip and randomly drops packets to signal congestion

bull  Packet drop probability ndash  Drop probability increases as queue length increases ndash  If buffer is below some level donrsquot drop anything ndash  hellip otherwise set drop probability as function of queue

Average Queue Length

Prob

abili

ty

Properties of RED bull  Drops packets before queue is full

ndash  In the hope of reducing the rates of some flows bull  Drops packet in proportion to each flowrsquos rate

ndash  High-rate flows have more packets ndash  hellip and hence a higher chance of being selected

bull  Drops are spaced out in time ndash  Which should help desynchronize the TCP senders

bull  Tolerant of burstiness in the traffic ndash  By basing the decisions on average queue length

Problems With RED bull  Hard to get the tunable parameters just right

ndash  How early to start dropping packets ndash  What slope for the increase in drop probability ndash  What time scale for averaging the queue length

bull  Sometimes RED helps but sometimes not ndash  If the parameters arenrsquot set right RED doesnrsquot help ndash  And it is hard to know how to set the parameters

bull  RED is implemented in practice ndash  But often not used due to the challenges of tuning right

bull  Many variations ndash  With cute names like ldquoBluerdquo and ldquoFREDrdquohellip J

Explicit Congestion Notification bull  Early dropping of packets

ndash  Good gives early feedback ndash  Bad has to drop the packet to give the feedback

bull  Explicit Congestion Notification ndash  Router marks the packet with an ECN bit ndash  hellip and sending host interprets as a sign of congestion

bull  Surmounting the challenges ndash  Must be supported by the end hosts and the routers ndash  Requires two bits in the IP header (one for the ECN

mark and one to indicate the ECN capability) ndash  Solution borrow two of the Type-Of-Service bits in the

IPv4 packet header

Conclusions bull Congestion is inevitable

ndash Internet does not reserve resources in advance ndash TCP actively tries to push the envelope

bull Congestion can be handled ndash Additive increase multiplicative decrease ndash Slow start and slow-start restart

bull Active Queue Management can help ndash Random Early Detection (RED) ndash Explicit Congestion Notification (ECN)

t

Window

Page 35: TCP Congestion Control - cs.colostate.edu

Example (Cont)

bull  10 Gbs throughput requires window size W = 83333 in-flight segments

bull  TCP assumes every loss is due to congestion ndash  Generally safe assumption for reasonable window size

bull  (Magic) Formula to relate loss rate to throughput Throughput of 10 Gbs with MSS of 1500 bytes gives ndash  13 L = 210-10

ie can only lose one in 5000000000 segments bull  We need new versions of TCP for high-speed nets (topic

for later discussion)

LRTTMSSsdot221Throughput =

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

Simple scenario assume same MSS and RTT

Is TCP Fair Two competing sessions bull  Additive increase gives slope of 1 as throughout increases bull  multiplicative decrease drops throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Conn

ecti

on 2

thr

ough

put

loss decrease window by factor of 2 congestion avoidance additive increase

loss decrease window by factor of 2 congestion avoidance additive increase

More on Fairness Fairness and UDP bull  Multimedia apps often do

not use TCP ndash  do not want rate throttled by

congestion control bull  Instead use UDP

ndash  pump audiovideo at constant rate tolerate packet loss

bull  Research area TCP friendly unreliable transport

Fairness and parallel TCP connections

bull  nothing prevents app from opening parallel connections between 2 hosts

bull  Web browsers do this bull  Example link of rate R

supporting 9 connections ndash  new app asks for 1 TCP gets rate

R10 ndash  new app asks for 11 TCPs gets

11R20 (over half the bandwidth)

Queuing Mechanisms

Random Early Detection (RED) Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

bull  TCP depends on packet loss to detect congestion ndash  In fact TCP drives the network into packet loss ndash  hellip by continuing to increase the sending rate

bull  Drop-tail queuing leads to bursty loss ndash  When a link becomes congestedhellip ndash  hellip many arriving packets encounter a full queue ndash  And as a result many flows divide sending rate in half ndash  hellip and many individual flows lose multiple packets

Slow Feedback from Drop Tail bull  Feedback comes when buffer is completely full

ndash  hellip even though the buffer has been filling for a while bull  Plus the filling buffer is increasing RTT

ndash  hellip and the variance in the RTT bull  Might be better to give early feedback

ndash  Get one or two flows to slow down not all of them ndash  Get these flows to slow down before it is too late

Random Early Detection (RED) bull  Basic idea of RED

ndash  Router notices that the queue is getting backlogged ndash  hellip and randomly drops packets to signal congestion

bull  Packet drop probability ndash  Drop probability increases as queue length increases ndash  If buffer is below some level donrsquot drop anything ndash  hellip otherwise set drop probability as function of queue

Average Queue Length

Prob

abili

ty

Properties of RED bull  Drops packets before queue is full

ndash  In the hope of reducing the rates of some flows bull  Drops packet in proportion to each flowrsquos rate

ndash  High-rate flows have more packets ndash  hellip and hence a higher chance of being selected

bull  Drops are spaced out in time ndash  Which should help desynchronize the TCP senders

bull  Tolerant of burstiness in the traffic ndash  By basing the decisions on average queue length

Problems With RED bull  Hard to get the tunable parameters just right

ndash  How early to start dropping packets ndash  What slope for the increase in drop probability ndash  What time scale for averaging the queue length

bull  Sometimes RED helps but sometimes not ndash  If the parameters arenrsquot set right RED doesnrsquot help ndash  And it is hard to know how to set the parameters

bull  RED is implemented in practice ndash  But often not used due to the challenges of tuning right

bull  Many variations ndash  With cute names like ldquoBluerdquo and ldquoFREDrdquohellip J

Explicit Congestion Notification bull  Early dropping of packets

ndash  Good gives early feedback ndash  Bad has to drop the packet to give the feedback

bull  Explicit Congestion Notification ndash  Router marks the packet with an ECN bit ndash  hellip and sending host interprets as a sign of congestion

bull  Surmounting the challenges ndash  Must be supported by the end hosts and the routers ndash  Requires two bits in the IP header (one for the ECN

mark and one to indicate the ECN capability) ndash  Solution borrow two of the Type-Of-Service bits in the

IPv4 packet header

Conclusions bull Congestion is inevitable

ndash Internet does not reserve resources in advance ndash TCP actively tries to push the envelope

bull Congestion can be handled ndash Additive increase multiplicative decrease ndash Slow start and slow-start restart

bull Active Queue Management can help ndash Random Early Detection (RED) ndash Explicit Congestion Notification (ECN)

t

Window

Page 36: TCP Congestion Control - cs.colostate.edu

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

Simple scenario assume same MSS and RTT

Is TCP Fair Two competing sessions bull  Additive increase gives slope of 1 as throughout increases bull  multiplicative decrease drops throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Conn

ecti

on 2

thr

ough

put

loss decrease window by factor of 2 congestion avoidance additive increase

loss decrease window by factor of 2 congestion avoidance additive increase

More on Fairness Fairness and UDP bull  Multimedia apps often do

not use TCP ndash  do not want rate throttled by

congestion control bull  Instead use UDP

ndash  pump audiovideo at constant rate tolerate packet loss

bull  Research area TCP friendly unreliable transport

Fairness and parallel TCP connections

bull  nothing prevents app from opening parallel connections between 2 hosts

bull  Web browsers do this bull  Example link of rate R

supporting 9 connections ndash  new app asks for 1 TCP gets rate

R10 ndash  new app asks for 11 TCPs gets

11R20 (over half the bandwidth)

Queuing Mechanisms

Random Early Detection (RED) Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing

bull  TCP depends on packet loss to detect congestion ndash  In fact TCP drives the network into packet loss ndash  hellip by continuing to increase the sending rate

bull  Drop-tail queuing leads to bursty loss ndash  When a link becomes congestedhellip ndash  hellip many arriving packets encounter a full queue ndash  And as a result many flows divide sending rate in half ndash  hellip and many individual flows lose multiple packets

Slow Feedback from Drop Tail bull  Feedback comes when buffer is completely full

ndash  hellip even though the buffer has been filling for a while bull  Plus the filling buffer is increasing RTT

ndash  hellip and the variance in the RTT bull  Might be better to give early feedback

ndash  Get one or two flows to slow down not all of them ndash  Get these flows to slow down before it is too late

Random Early Detection (RED) bull  Basic idea of RED

ndash  Router notices that the queue is getting backlogged ndash  hellip and randomly drops packets to signal congestion

bull  Packet drop probability ndash  Drop probability increases as queue length increases ndash  If buffer is below some level donrsquot drop anything ndash  hellip otherwise set drop probability as function of queue

Average Queue Length

Prob

abili

ty

Properties of RED bull  Drops packets before queue is full

ndash  In the hope of reducing the rates of some flows bull  Drops packet in proportion to each flowrsquos rate

ndash  High-rate flows have more packets ndash  hellip and hence a higher chance of being selected

bull  Drops are spaced out in time ndash  Which should help desynchronize the TCP senders

bull  Tolerant of burstiness in the traffic ndash  By basing the decisions on average queue length

Problems With RED
•  Hard to get the tunable parameters just right
  –  How early to start dropping packets?
  –  What slope for the increase in drop probability?
  –  What time scale for averaging the queue length?
•  Sometimes RED helps, but sometimes not
  –  If the parameters aren't set right, RED doesn't help
  –  And it is hard to know how to set the parameters
•  RED is implemented in practice
  –  But often not used, due to the challenges of tuning it right
•  Many variations
  –  With cute names like "Blue" and "FRED"… ☺

Explicit Congestion Notification
•  Early dropping of packets
  –  Good: gives early feedback
  –  Bad: has to drop the packet to give the feedback
•  Explicit Congestion Notification
  –  Router marks the packet with an ECN bit
  –  … and the sending host interprets it as a sign of congestion
•  Surmounting the challenges
  –  Must be supported by the end hosts and the routers
  –  Requires two bits in the IP header (one for the ECN mark and one to indicate ECN capability)
  –  Solution: borrow two of the Type-Of-Service bits in the IPv4 packet header (sketch below)
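A sketch of the two borrowed bits under the original scheme the slide describes (assumed here to follow RFC 2481's layout in the low two bits of the TOS byte: one ECT bit set by ECN-capable senders, one CE bit set by congested routers; RFC 3168 later reinterpreted the pair as four codepoints):

    ECT = 0b10   # ECN-Capable Transport, set by the sender
    CE  = 0b01   # Congestion Experienced, set by a congested router

    def router_forward(tos, congested):
        """Mark instead of drop when the flow is ECN-capable."""
        if congested and (tos & ECT):
            return tos | CE, False      # mark the packet and keep it
        return tos, congested           # otherwise drop (True) on congestion

    tos, dropped = router_forward(ECT, congested=True)
    print(bin(tos & 0b11), dropped)     # 0b11 (ECT and CE set), False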

Conclusions
•  Congestion is inevitable
  –  Internet does not reserve resources in advance
  –  TCP actively tries to push the envelope
•  Congestion can be handled
  –  Additive increase, multiplicative decrease
  –  Slow start and slow-start restart
•  Active Queue Management can help
  –  Random Early Detection (RED)
  –  Explicit Congestion Notification (ECN)

