NS-2 LAB: TCP/IP Congestion control
2005-2006 - Semester A
Miriam Allalouf
Networks Design Course Datagram Networks
Network Layer Service Model
The transport layer relies on the services of the network layer: communication services between hosts. The router's role is to "switch" packets from an input port to an output port.
Which services can be expected from the network layer?
- Is it reliable? Can the transport layer count on the network layer to deliver packets to the destination?
- When multiple packets are sent, will they be delivered to the transport layer in the receiving host in the same order they were sent?
- Is the time between two sends equal to the time between two receives?
- Will there be any indication of congestion in the network?
- What is the abstract view of the channel connecting the transport layers at the sending and receiving end points?
Networks Design Course Datagram Networks
Best-effort service is the only service model provided by the Internet:
- no guarantee that timing between packets is preserved
- no guarantee that ordering is kept
- no guarantee that transmitted packets are eventually delivered
A network that delivered no packets to the destination would still satisfy the definition of best-effort delivery service.
Is best-effort service equivalent to no service at all?
Networks Design Course Datagram Networks
No service at all? Or what are the advantages?
- Minimal requirements on the network layer make it easier to interconnect networks with very different link-layer technologies: satellite, Ethernet, fiber, and radio use different transmission rates and loss characteristics. The Internet grew out of the need to connect computers together.
- Sophisticated end-system devices implement any additional functionality and new application-level network services at a higher layer: applications such as e-mail, the Web, and even a network-layer-centric service such as DNS are implemented in hosts (servers) at the edge of the network.
- New services are simple to add: attach a host to the network and define a new higher-layer protocol (such as HTTP). This enabled the fast adoption of the WWW into the Internet.
TCP: Overview RFCs: 793, 1122, 1323, 2018, 2581
- point-to-point: one sender, one receiver
- reliable, in-order byte stream: no "message boundaries"
- pipelined: TCP congestion and flow control set the window size
- full duplex data: bi-directional data flow in the same connection; MSS: maximum segment size
- connection-oriented: handshaking (exchange of control msgs) initializes sender and receiver state before data exchange
- flow controlled: sender will not overwhelm receiver
TCP segment structure
[Figure: TCP segment format, 32 bits wide: source port # and dest port #; sequence number; acknowledgement number; header length, unused bits, flag bits U A P R S F, rcvr window size; checksum and ptr to urgent data; options (variable length); application data (variable length).]
Annotations:
- sequence and acknowledgement numbers count by bytes of data (not segments!)
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection establishment (setup, teardown commands)
- rcvr window size: # bytes receiver is willing to accept
- checksum: Internet checksum
TCP seq. #'s and ACKs
Seq. #'s: byte-stream "number" of the first byte in the segment's data
ACKs: seq # of the next byte expected from the other side (cumulative ACK)
Q: how does the receiver handle out-of-order segments?
A: the TCP spec doesn't say - it is up to the implementor
[Figure: simple telnet scenario. The user types 'C': Host A sends Seq=42, ACK=79, data='C'. Host B ACKs receipt of 'C' and echoes back 'C': Seq=79, ACK=43, data='C'. Host A ACKs receipt of the echoed 'C': Seq=43, ACK=80.]
At TCP Receiver: TCP ACK generation [RFC 1122, RFC 2581]

Event: in-order segment arrival, no gaps, everything else already ACKed
Action: delayed ACK. Wait up to 500 ms for next segment; if no next segment, send ACK

Event: in-order segment arrival, no gaps, one delayed ACK pending
Action: immediately send single cumulative ACK

Event: out-of-order segment arrival, higher-than-expected seq. #, gap detected
Action: send duplicate ACK, indicating seq. # of next expected byte

Event: arrival of segment that partially or completely fills gap
Action: immediate ACK if segment starts at lower end of gap
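The receiver rules in the table above can be sketched as a small decision function. This is a modeling convenience under assumed event flags, not an actual TCP implementation:

```python
# Sketch of the receiver ACK-generation rules from RFC 1122 / RFC 2581 as
# tabulated above. The boolean event flags and returned strings are
# illustrative assumptions, not real TCP APIs.
def ack_action(in_order, delayed_ack_pending, fills_gap_low_end):
    """Map an arriving-segment event to the receiver's ACK action."""
    if in_order and not delayed_ack_pending:
        return "delayed ACK (wait up to 500 ms for next segment)"
    if in_order and delayed_ack_pending:
        return "send single cumulative ACK immediately"
    if fills_gap_low_end:
        return "immediate ACK (segment starts at lower end of gap)"
    # out-of-order arrival with a gap still open
    return "send duplicate ACK for next expected byte"
```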
TCP: retransmission scenarios
[Figure: lost ACK scenario. Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost; Host A times out and retransmits Seq=92, 8 bytes data; Host B replies ACK=100.]
[Figure: premature timeout, cumulative ACKs. Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; the timeout for Seq=92 expires before its ACK arrives, so Host A retransmits Seq=92, 8 bytes data; Host B's cumulative ACK=120 acknowledges everything.]
TCP Round Trip Time and Timeout
Q: how to set the TCP timeout value?
- longer than RTT (note: RTT will vary)
- too short: premature timeout, unnecessary retransmissions
- too long: slow reaction to segment loss
Q: how to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt; ignore retransmissions and cumulatively ACKed segments
- SampleRTT will vary; we want the estimated RTT to be "smoother": use several recent measurements, not just the current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT
Exponential weighted moving average: the influence of a given sample decreases exponentially fast; typical value of x: 0.125 (1/8)
Setting the timeout: EstimatedRTT plus a "safety margin"; a large variation in EstimatedRTT -> larger safety margin
Timeout = EstimatedRTT + 4*Deviation
Deviation = (1-x)*Deviation + x*|SampleRTT - EstimatedRTT|
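The EWMA estimator and timeout rule above can be sketched as follows. The initial deviation value is an assumption for illustration (the slides do not specify initialization):

```python
# Sketch of the slide's RTT estimator: EWMA with x = 0.125 and
# Timeout = EstimatedRTT + 4*Deviation. Initialization is an assumption.
class RttEstimator:
    def __init__(self, first_sample, x=0.125):
        self.x = x
        self.estimated_rtt = first_sample
        self.deviation = first_sample / 2  # assumed initial deviation

    def update(self, sample_rtt):
        # update the deviation EWMA, then the RTT EWMA
        self.deviation = (1 - self.x) * self.deviation \
            + self.x * abs(sample_rtt - self.estimated_rtt)
        self.estimated_rtt = (1 - self.x) * self.estimated_rtt \
            + self.x * sample_rtt

    def timeout(self):
        return self.estimated_rtt + 4 * self.deviation

est = RttEstimator(0.100)   # seconds
est.update(0.100)           # a steady sample shrinks the deviation
```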
Principles of Congestion Control
Congestion, informally: "too many sources sending too much data too fast for the network to handle"
- different from flow control!
- manifestations: lost packets (buffer overflow at routers), long delays (queueing in router buffers)
- a top-10 problem!
Causes/costs of congestion: scenario 1
- two senders, two receivers
- one router, infinite buffers
- no retransmission
- large delays when congested
- maximum achievable throughput
Causes/costs of congestion: scenario 2
- one router, finite buffers
- sender retransmission of lost packets
Causes/costs of congestion: scenario 2
- always: λin = λout (goodput)
- "perfect" retransmission only when loss: λ'in > λout
- retransmission of delayed (not lost) packets makes λ'in larger (than the perfect case) for the same λout
"costs" of congestion:
- more work (retransmissions) for a given "goodput"
- unneeded retransmissions: a link carries multiple copies of a packet
Causes/costs of congestion: scenario 3
- four senders
- multihop paths
- timeout/retransmit
Q: what happens as λin and λ'in increase?
Causes/costs of congestion: scenario 3
Another "cost" of congestion: when a packet is dropped, any upstream transmission capacity used for that packet was wasted!
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
- no explicit feedback from network
- congestion inferred from end-system observed loss, delay
- approach taken by TCP
Network-assisted congestion control:
- routers provide feedback to end systems
- single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
- explicit rate at which the sender should send
TCP Congestion Control
- end-end control (no network assistance)
- transmission rate limited by the congestion window size, Congwin, over segments
- with a window of w segments, each with MSS bytes, sent in one RTT:
throughput = (w * MSS) / RTT  bytes/sec
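The windowed-throughput formula above is easy to evaluate directly. The parameter values in the example are illustrative assumptions:

```python
# Toy calculation of the slide's formula: throughput = w * MSS / RTT.
# The w, MSS, and RTT values below are illustrative assumptions.
def tcp_throughput(w, mss_bytes, rtt_sec):
    """Upper bound on throughput for a fixed window of w MSS-sized segments per RTT."""
    return w * mss_bytes / rtt_sec

# e.g. w = 10 segments, MSS = 1460 bytes, RTT = 100 ms
rate = tcp_throughput(10, 1460, 0.1)
```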
TCP Self-clocking
- TCP adjusts its sending rate by adjusting the congestion window
- slow ACKs (low-bandwidth link or high delay) -> slow cwnd increase
- fast ACKs -> fast cwnd increase
TCP congestion control: two "phases"
- slow start
- congestion avoidance
important variables:
- Congwin: based on the perceived level of congestion; the host receives indications of internal congestion
- threshold: defines the boundary between the two phases, slow start and congestion avoidance
"probing" for usable bandwidth:
- ideally: transmit as fast as possible (Congwin as large as possible) without loss
- increase Congwin until loss (congestion)
- on loss: decrease Congwin, then begin probing (increasing) again
TCP Slowstart
- exponential increase (per RTT) in window size (not so slow!)
- loss event: timeout (Tahoe TCP) and/or three duplicate ACKs (Reno TCP)
Slowstart algorithm:
    initialize: Congwin = 1
    for (each segment ACKed)
        Congwin++
    until (loss event OR Congwin > threshold)
[Figure: Host A sends one segment; after one RTT the ACK arrives and Host A sends two segments; the next RTT, four segments; the window doubles every RTT.]
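The growth rule above can be simulated in a few lines. This is a sketch of the per-RTT doubling only, ignoring losses; the threshold value is an illustrative assumption:

```python
# Minimal sketch of slow start: Congwin starts at 1 segment and is
# incremented once per ACKed segment, so it doubles every RTT until it
# exceeds the threshold. Purely illustrative; no loss events modeled.
def slow_start_windows(threshold):
    """Return the window size at the start of each RTT until Congwin > threshold."""
    congwin = 1
    windows = []
    while congwin <= threshold:
        windows.append(congwin)
        congwin += congwin  # one increment per ACKed segment in the window
    return windows

# with threshold = 8, the window grows 1, 2, 4, 8 and then leaves slow start
growth = slow_start_windows(8)
```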
TCP Congestion Avoidance
Congestion avoidance:
    /* slowstart is over */
    /* Congwin > threshold */
    Until (loss event) {
        every w segments ACKed:
            Congwin++
    }
    threshold = Congwin/2
    Congwin = 1
    perform slowstart (1)
(1): TCP Reno skips slowstart (fast recovery) after three duplicate ACKs
[Figure: Congwin over time for Tahoe vs. Reno]
Additive Increase
- Additive increase is a reaction to perceived available capacity.
- Linear increase basic idea: for each "cwnd's worth" of packets sent, increase cwnd by 1 packet.
- In practice, cwnd is incremented fractionally for each arriving ACK:
    increment = MSS * (MSS / cwnd)
    cwnd = cwnd + increment
TCP congestion avoidance is AIMD: additive increase, multiplicative decrease
- increase the window by 1 MSS per RTT
- decrease the window by a factor of 2 on a loss event
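The per-ACK update quoted above can be sketched as follows; over one full window of ACKs it adds roughly one MSS per RTT. The MSS and starting-cwnd values are illustrative assumptions:

```python
# Sketch of AIMD as described on the slide: additive increase applied per
# ACK via increment = MSS * (MSS / cwnd), multiplicative decrease on loss.
# cwnd is tracked in bytes; the values below are illustrative.
def on_ack(cwnd, mss):
    """Congestion-avoidance update applied for each arriving ACK."""
    return cwnd + mss * (mss / cwnd)

def on_loss(cwnd):
    """Multiplicative decrease: halve the window on a loss event."""
    return cwnd / 2

# starting at cwnd = 10 MSS, a window's worth of ACKs adds about one MSS
cwnd = 10 * 1460.0
for _ in range(10):
    cwnd = on_ack(cwnd, 1460.0)
```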
Fast Retransmit
- Coarse timeouts remained a problem, and fast retransmit was added with TCP Tahoe.
- Since the receiver responds every time a packet arrives, the sender will see duplicate ACKs.
- Basic idea: use duplicate ACKs to signal a lost packet.
- Upon receipt of three duplicate ACKs, the TCP sender retransmits the lost packet.
Fast Retransmit
- Generally, fast retransmit eliminates about half the coarse-grained timeouts.
- This yields roughly a 20% improvement in throughput.
- Note: fast retransmit does not eliminate all the timeouts, due to small window sizes at the source.
Fast Recovery
- Fast recovery was added with TCP Reno.
- Basic idea: when fast retransmit detects three duplicate ACKs, restart from the congestion-avoidance region and use the ACKs still in the pipe to pace the sending of packets.
- After fast retransmit, halve cwnd and commence recovery from this point using linear additive increase, "primed" by the leftover ACKs in the pipe.
TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link, each should get 1/N of the link capacity.
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]
Why is TCP fair?
Two competing sessions:
- additive increase gives a slope of 1 as throughput increases
- multiplicative decrease reduces throughput proportionally
[Figure: throughput of connection 1 vs. connection 2, each axis from 0 to R. The operating point oscillates around the "equal bandwidth share" line: congestion avoidance moves it diagonally (additive increase), and each loss halves both windows, pulling the point toward the fair share.]
TCP strains
- Tahoe
- Reno
- Vegas
Vegas

From Reno to Vegas

                        Reno                     Vegas
RTT measurement         coarse                   fine
Fast Retransmit         3 dup ACKs               dup ACK + timer
CongWin decrease        for each Fast Retrans.   for the 1st Fast Retrans. in an RTT
Congestion detection    loss detection           also based on measured throughput
Some more examples

TCP latency modeling
Q: How long does it take to receive an object from a Web server after sending a request?
- TCP connection establishment
- data transfer delay
Notation, assumptions:
- assume one link between client and server, of rate R
- assume a fixed congestion window of W segments
- S: MSS (bits); O: object size (bits)
- no retransmissions (no loss, no corruption)
Two cases to consider:
- WS/R > RTT + S/R: the ACK for the first segment in the window returns before a window's worth of data has been sent
- WS/R < RTT + S/R: wait for the ACK after sending a window's worth of data
TCP latency modeling
Case 1: latency = 2RTT + O/R
Case 2: latency = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]
where K := O/(WS) is the number of windows that cover the object.
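The two fixed-window cases can be evaluated directly. The numbers in the test values are illustrative assumptions (they are not from the slides):

```python
# Sketch of the fixed-window latency model above: latency = 2*RTT + O/R,
# plus (K-1) stalls of (S/R + RTT - W*S/R) when the window is too small.
# Units: O and S in bits, R in bits/sec, RTT in seconds.
import math

def fixed_window_latency(O, W, S, R, RTT):
    """Latency to fetch an object of O bits with a fixed window of W segments."""
    K = math.ceil(O / (W * S))          # number of windows covering the object
    base = 2 * RTT + O / R
    if W * S / R >= RTT + S / R:        # case 1: ACKs return before the window drains
        return base
    # case 2: the sender stalls after each of the first K-1 windows
    return base + (K - 1) * (S / R + RTT - W * S / R)
```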
TCP Latency Modeling: Slow Start
Now suppose the window grows according to slow start. We will show that the latency of one object of size O is:

Latency = 2RTT + O/R + P[RTT + S/R] - (2^P - 1)S/R

where P is the number of times TCP stalls at the server:

P = min{Q, K - 1}

- where Q is the number of times the server would stall if the object were of infinite size,
- and K is the number of windows that cover the object.
TCP Latency Modeling: Slow Start (cont.)
[Figure: timeline at client and server. The client initiates the TCP connection and requests the object; the server sends a first window of S/R, a second window of 2S/R, a third window of 4S/R, and a fourth window of 8S/R, stalling between windows until ACKs return; transmission completes and the object is delivered.]
Example:
- O/S = 15 segments
- K = 4 windows
- Q = 2
- P = min{K-1, Q} = 2
- The server stalls P = 2 times.
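The slow-start latency formula and the example above can be checked numerically. The link rate, MSS, and RTT chosen below are illustrative assumptions, picked so that Q = 2 as in the example:

```python
# Sketch of the slow-start latency model from the slides:
#   Latency = 2*RTT + O/R + P*(RTT + S/R) - (2**P - 1)*S/R,
# with K windows covering the object and Q stalls for an infinite object.
# Units: O and S in bits, R in bits/sec, RTT in seconds; values illustrative.
import math

def num_windows(O, S):
    """K: smallest k with S*(2**k - 1) >= O (windows of 1, 2, 4, ... segments)."""
    return math.ceil(math.log2(O / S + 1))

def num_stalls_infinite(S, R, RTT):
    """Q: largest k such that the server stalls after the kth window,
    i.e. (2**(k-1))*S/R < S/R + RTT."""
    q = 0
    while (2 ** q) * S / R < S / R + RTT:
        q += 1
    return q

def slow_start_latency(O, S, R, RTT):
    K = num_windows(O, S)
    Q = num_stalls_infinite(S, R, RTT)
    P = min(Q, K - 1)
    return 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R
```

With O/S = 15 segments this reproduces K = 4 and P = 2, matching the example.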
TCP Latency Modeling: Slow Start (cont.)
- time to transmit the kth window: (2^(k-1)) S/R
- time from when the server starts to send a segment until it receives an acknowledgement: S/R + RTT
- stall time after the kth window: [S/R + RTT - (2^(k-1)) S/R]^+

latency = 2RTT + O/R + sum_{k=1..P} stallTime_k
        = 2RTT + O/R + sum_{k=1..P} [S/R + RTT - (2^(k-1)) S/R]
        = 2RTT + O/R + P[RTT + S/R] - (2^P - 1) S/R
TCP Flow Control
flow control: the sender won't overrun the receiver's buffers by transmitting too much, too fast
- receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space, via the RcvWindow field in TCP segments
- sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow
receiver buffering:
- RcvBuffer = size of the TCP receive buffer
- RcvWindow = amount of spare room in the buffer
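The sender-side rule above can be sketched as a small bookkeeping class. The class and method names are illustrative assumptions, not a real TCP API:

```python
# Sketch of the flow-control invariant on the slide: the sender keeps the
# amount of transmitted-but-unACKed data below the most recently advertised
# RcvWindow. Names and byte counts are illustrative assumptions.
class FlowControlledSender:
    def __init__(self, rcv_window):
        self.rcv_window = rcv_window   # last advertised RcvWindow, in bytes
        self.unacked = 0               # bytes sent but not yet ACKed

    def can_send(self, nbytes):
        return self.unacked + nbytes <= self.rcv_window

    def on_send(self, nbytes):
        assert self.can_send(nbytes)
        self.unacked += nbytes

    def on_ack(self, nbytes, new_window):
        self.unacked -= nbytes
        self.rcv_window = new_window   # receiver advertises its spare room
```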
TCP Connection Management
Recall: TCP sender and receiver establish a "connection" before exchanging data segments
- initialize TCP variables: seq. #s, buffers, flow control info (e.g. RcvWindow)
- client: connection initiator
  Socket clientSocket = new Socket("hostname", "port number");
- server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();
Three-way handshake:
Step 1: client sends TCP SYN control segment to server; specifies initial seq #
Step 2: server receives SYN, replies with SYNACK control segment; ACKs the received SYN, allocates buffers, specifies the server-to-client initial seq. #
Step 3: client sends ACK and data.
TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server.
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client sends FIN ("close"); server ACKs and sends its own FIN ("close"); client ACKs and enters "timed wait" before moving to "closed".]
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK. Enters "timed wait" - will respond with ACK to received FINs.
Step 4: server receives ACK. Connection closed.
Note: with a small modification, this can handle simultaneous FINs.
[Figure: client and server exchange FIN/ACK in both directions ("closing"); the client waits in "timed wait" before both sides reach "closed".]
TCP Connection Management (cont.)
[Figure: TCP client lifecycle state diagram]
[Figure: TCP server lifecycle state diagram]
Summary
principles behind transport layer services:
- multiplexing/demultiplexing
- reliable data transfer
- flow control
- congestion control
instantiation and implementation in the Internet: UDP, TCP
Queue management: Packet Dropping: Tail Drop
- Tail drop: packets are dropped when the queue is full
- causes the global synchronization problem with TCP
[Figure: queue utilization over time under tail drop, repeatedly saturating at 100%]
Packet Dropping: RED
- proposed by Sally Floyd and Van Jacobson in the early 1990s
- packets are dropped randomly prior to periods of high congestion, which signals the packet source to decrease its transmission rate
- distributes losses over time
RED - Implementation
- The drop probability is based on min_threshold, max_threshold, and the mark probability denominator.
- When the average queue depth is above the minimum threshold, RED starts dropping packets. The packet-drop rate increases linearly as the average queue size grows, until the average queue size reaches the maximum threshold.
- When the average queue size is above the maximum threshold, all packets are dropped.
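The drop-probability rule described above can be sketched as follows. The parameter names are illustrative assumptions (real implementations also phrase the slope via a mark probability denominator):

```python
# Sketch of the RED drop probability described above: zero below
# min_threshold, a linear ramp up to max_prob between the thresholds, and
# certain drop above max_threshold. Parameter names are assumptions.
def red_drop_prob(avg_queue, min_th, max_th, max_prob):
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0          # above the maximum threshold, all packets are dropped
    # linear ramp between the two thresholds
    return max_prob * (avg_queue - min_th) / (max_th - min_th)
```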
RED (cont.)
Buffer occupancy calculation: average queue size - weighted exponential moving average
[Figure: drop probability vs. average queue size: 0 below min, rising linearly to Max Prob at max, then jumping to 1.]
GRED - Gentle RED
Buffer occupancy calculation: average queue size - weighted exponential moving average
[Figure: drop probability vs. average queue size: 0 below min, rising linearly to Max Prob at max, then rising linearly to 1 at 2*max.]
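The gentle variant pictured above replaces RED's jump to probability 1 with a second linear ramp between max and 2*max. A sketch under the same assumed parameter names:

```python
# Sketch of "gentle RED": like RED below max_threshold, but between
# max_threshold and 2*max_threshold the drop probability ramps linearly
# from max_prob up to 1 instead of jumping. Names are assumptions.
def gred_drop_prob(avg_queue, min_th, max_th, max_prob):
    if avg_queue < min_th:
        return 0.0
    if avg_queue < max_th:
        # same linear ramp as plain RED
        return max_prob * (avg_queue - min_th) / (max_th - min_th)
    if avg_queue < 2 * max_th:
        # gentle region: ramp from max_prob to 1 between max and 2*max
        return max_prob + (1.0 - max_prob) * (avg_queue - max_th) / max_th
    return 1.0
```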