csci 183/183w/232 – computer networks transport layer & tcp1 transport layer transport layer...
TRANSCRIPT
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 1
Transport Layer
Transport Layer Services connection-oriented vs. connectionless multiplexing and demultplexing
UDP: Connectionless Unreliable Service TCP: Connection-Oriented Reliable
Service connection management: set-up and tear down reliable data transfer protocols flow and congestion control
Readings: Chapter 5 (5.1, 5.2)
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 2
Transport Protocols
• Lowest level end-to-end protocol.– Header generated by
sender is interpreted only by the destination
– Routers view transport header as part of the payload
77
66
55
77
66
55
TransportTransport
IPIP
DatalinkDatalink
PhysicalPhysical
TransportTransport
IPIP
DatalinkDatalink
PhysicalPhysical
IPIP
router
22 22
11 11
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 3
Transport Services and Protocols
• provide logical communication between app processes running on different hosts
• transport protocols run in end systems – send side: breaks app
messages into segments, passes to network layer
– rcv side: reassembles segments into messages, passes to app layer
• more than one transport protocol available to apps– Internet: TCP and UDP
application
transportnetworkdata linkphysical
application
transportnetworkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysicalnetwork
data linkphysical
logical end-end transport
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 4
Transport Layer Services• Underlying best-effort network
– drops messages– re-orders messages– delivers duplicate copies of a given message– delivers messages after an arbitrarily long delay
• Common end-to-end services– guarantee message delivery– deliver messages in the same order they are sent– deliver at most one copy of each message– allow the receiver to flow control the sender– support multiple application processes on each host
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 5
Transport vs. Application and Network Layer• application layer:
application processes and message exchange
• network layer: logical communication between hosts
• transport layer: logical communication support for app processes – relies on, enhances,
network layer services
Household analogy:12 kids sending letters
to 12 kids• processes = kids• app messages =
letters in envelopes• hosts = houses• transport protocol =
Ann and Bill• network-layer
protocol = postal service
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 6
End to End Issues• Transport services built on top of (potentially)
unreliable network service– packets can be corrupted or lost– Packets can be delayed or arrive “out of order”
• Do we detect and/or recover errors for apps? – Error Control & Reliable Data Transfer
• Do we provide “in-order” delivery of packets?– Connection Management & Reliable Data Transfer
• Potentially different capacity at destination, and potentially different network capacity– Flow and Congestion Control
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 7
Internet Transport ProtocolsTCP service:• connection-oriented: setup
required between client, server
• reliable transport between sender and receiver
• flow control: sender won’t overwhelm receiver
• congestion control: throttle sender when network overloaded
UDP service:• unreliable data
transfer between sender and receiver
• does not provide: connection setup, reliability, flow control, congestion control
Both provide logical communication between app processes running on different hosts!
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 8
Multiplexing/Demultiplexing
application
transport
network
link
physical
P1 application
transport
network
link
physical
application
transport
network
link
physical
P2P3 P4P1
host 1 host 2 host 3
= process= API (“socket”)
delivering received segmentsto correct application process
Demultiplexing at rcv host:gathering data from multipleapp processes, enveloping data with header (later used for demultiplexing)
Multiplexing at send host:
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 9
How Demultiplexing Works
• host receives IP datagrams– each datagram has source IP
address, destination IP address– each datagram carries 1
transport-layer segment– each segment has source,
destination port number (recall: well-known port numbers for specific applications)
• host uses IP addresses & port numbers to direct segment to appropriate app process (identified by “socket’)
source port # dest port #
32 bits
applicationdata
(message)
other header fields
TCP/UDP segment format
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 10
UDP: User Datagram Protocol [RFC 768]
• “no frills,” “bare bones” Internet transport protocol
• “best effort” service, UDP segments may be:– lost– delivered out of order to
app
• connectionless:– no handshaking between
UDP sender, receiver– each UDP segment
handled independently of others
Why is there a UDP?• no connection
establishment (which can add delay)
• simple: no connection state at sender, receiver
• small segment header• no congestion control:
UDP can blast away as fast as desired
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 11
UDP (cont’d)
• often used for streaming multimedia apps– loss tolerant– rate sensitive
• other UDP users– DNS– SNMP
• reliable transfer over UDP: add reliability at application layer– application-specific
error recovery!
source port # dest port #
32 bits
Applicationdata
(message)
UDP segment format
length checksumLength, in
bytes of UDPsegment,including
header
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 12
UDP Checksum
Sender:• treat segment contents
as sequence of 16-bit integers
• checksum: addition (1’s complement sum) of segment contents
• sender puts checksum value (1’s complement of 1’s complement sum of 16-bit words) into UDP checksum field
Receiver:• compute checksum of
received segment• check if computed checksum
equals checksum field value:– NO - error detected– YES - no error detected. But
maybe errors nonetheless? More later ….
Goal: detect “errors” (e.g., flipped bits) in transmitted segment
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 13
Checksum: Example
+
sum: 0100101011001011 checksum(1’s complement): 1011010100110100
verify by adding: 1111111111111111
0110011001100110
1101010101010101
0000111100001111
arrange data segmentin sequences of16-bit words
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 14
TCP Overview• Connection-oriented• Byte-stream
– app writes bytes– TCP sends
segments– app reads bytes
Application process
Writebytes
TCP
Send buffer
Segment Segment Segment
Transmit segments
Application process
Readbytes
TCP
Receive buffer
…
… …
• Full duplex• Flow control: keep sender from
overrunning receiver• Congestion control: keep
sender from overrunning network
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 15
Functionality Split
• Network provides best-effort delivery• End-systems implement many functions
– Reliability– In-order delivery– Demultiplexing– Message boundaries– Connection abstraction– Flow Control– Congestion control– …
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 16
High-Level TCP Characteristics
• Protocol implemented entirely at the ends
– Fate sharing
• Protocol has evolved over time and will continue to do so
– Nearly impossible to change the header– Use options to add information to the header– Change processing at endpoints– Backward compatibility is what makes it TCP
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 17
Evolution of TCP
1975 1980 1985 1990
1982TCP & IP
RFC 793 & 791
1974TCP described by
Vint Cerf and Bob KahnIn IEEE Trans Comm
1983BSD Unix 4.2
supports TCP/IP
1984Nagel’s algorithmto reduce overhead
of small packets;predicts congestion
collapse
1987Karn’s algorithmto better estimate
round-trip time
1986Congestion
collapseobserved
1988Van Jacobson’s
algorithmscongestion avoidance and congestion control(most implemented in
4.3BSD Tahoe)
19904.3BSD Renofast retransmitdelayed ACK’s
1975Three-way handshake
Raymond TomlinsonIn SIGCOMM 75
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 18
TCP Through the 1990s
1993 1994 1996
1994ECN
(Floyd)Explicit
CongestionNotification
1993TCP Vegas
(Brakmo et al)real congestion
avoidance
1994T/TCP
(Braden)Transaction
TCP
1996SACK TCP(Floyd et al)
Selective Acknowledgement
1996Hoe
Improving TCP startup
1996FACK TCP
(Mathis et al)extension to SACK
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 19
source port # dest port #
32 bits
applicationdata
(variable length)
sequence number
acknowledgement numberrcvr window size
ptr urgent datachecksum
FSRPAUheadlen
notused
Options (variable length)
TCP Segment Header Structure
URG: urgent data (generally not used)
ACK: ACK #valid
RST, SYN, FIN:connection estab(setup, teardown
commands)
# bytes rcvr willingto accept
countingby bytes of data(not segments!)
Internetchecksum
(as in UDP)
PSH: push data now(generally not used)
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 20
• Each connection identified with 4-tuple:– (SrcPort, SrcIPAddr, DstPort, DstIPAddr)
• Sliding window + flow control– acknowledgment, SequenceNum, AdvertisedWinow
• Flags– SYN, FIN, ACK, RESET, PUSH, URG
• Checksum– pseudo header (src & dst IP addresses) + TCP header +
data
TCP Segment Format (cont)
Sender
Data (SequenceNum)
Acknowledgment +AdvertisedWindow
Receiver
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 21
TCP Connection Set Up
TCP sender, receiver establish “connection” before exchanging data segments
• initialize TCP variables:– seq. #– buffers, flow control info
• client: end host that initiates connection
• server: end host contacted by client
Three way handshake:
Step 1: client sends TCP SYN control segment to server– specifies initial seq #
Step 2: server receives SYN, replies with SYN+ACK control segment
– ACKs received SYN– specifies server receiver
initial seq. #
Step 3:client receives SYN+ACK,
replies with ACK segment (which may contain 1st data segment)
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 22
Question:
a. What kind of “state” client and server need to maintain?
b. What initial sequence # should client (and server) use?
TCP 3-Way Hand-Shake
client
SYN, seq=x
server
SYN+ACK, seq=y, ack=x+1
ACK, seq=x+1, ack=y+1
initiate connection
connectionestablished
connection established
SYNreceived
(1st data segment)
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 23
TCP Connection Setup Example
No. Time Source > Destination Proto SrcPort>DstPort [Flags] 1 13.734375 70.13.155.114 128.101.35.150 TCP 1414 > 22 [SYN] Seq=758244755 Len=0 MSS=1260
2 13.968750 128.101.35.150 70.13.155.114 TCP 22 > 1414 [SYN, ACK] Seq=3778406755 Ack=758244756 Win=25200 Len=0 MSS=1460
3 13.968750 70.13.155.114 128.101.35.150 TCP 1414 > 22 [ACK] Seq=758244756 Ack=3778406756 Win=16384 Len=0
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 24
TCP Connection Setup Example
No. Time Source > Destination Proto SrcPort>DstPort [Flags] 1 13.6611233 70.13.155.114 128.101.35.204 TCP 1567 > 80 [SYN] Seq=3724852786 Len=0 MSS=1260
2 13.890625 128.101.35.204 70.13.155.114 TCP 80> 1567 [SYN, ACK] Seq=484733971 Ack=3724852787 Win=25200 Len=0 MSS=1460
3 13.890625 70.13.155.114 128.101.35.204 TCP 1567 > 80 [ACK] Seq=3724852787 Ack=484733972 Win=17640 Len=0
4 13.890625 70.13.155.114 128.101.35.204 TCP 1567 > 80 [PSH,ACK] Seq=73724852787 Ack=484733972 Win=17640 Len=564
5 14.630860 128.101.35.204 70.13.155.114 TCP 80> 1567 [ACK] Seq=484733972 Ack=3724853351 Win=25200 Len=0 MSS=1460
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 25
Connection Setup Error Scenarios
• Lost (control) packets– What happen if SYN lost? client vs. server actions– What happen if SYN+ACK lost? client vs. server
actions– What happen if ACK lost? client vs. server actions
• Duplicate (control) packets– What does server do if duplicate SYN received?– What does client do if duplicate SYN+ACK received?– What does server do if duplicate ACK received?
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 26
Connection Setup Error Scenarios (cont’d)
• Importance of (unique) initial seq. no.?– When receiving SYN, how does server know it’s a new
connection request?– When receiving SYN+ACK, how does client know it’s a
legitimate, i.e., a response to its SYN request?
• Dealing with old duplicate packets from old connections (or from malicious users)– If not careful: “TCP Hijacking”
• How to choose unique initial seq. no.?– randomly choose a number (and add to last syn# used)
• Other security concern:– “SYN Flood” -- denial-of-service attack
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 27
Detecting Half-Open Connections
1. (CRASH)2. CLOSED3. SYN-SENT <SEQ=400><CTL=SYN>4. (!!) <SEQ=300><ACK=100><CTL=ACK>5. SYN-SENT <SEQ=100><CTL=RST>6. SYN-SENT7. SYN-SENT <SEQ=400><CTL=SYN>
(send 300, receive 100) ESTABLISHED (??) ESTABLISHED (Abort!!) CLOSED
TCP BTCP A
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 28
TCP State Diagram: Connection Setup
CLOSED
SYNSENT
SYNRCVD
ESTAB
LISTEN
active OPENcreate TCPSnd SYN
create TCP
passive OPEN
delete TCP
CLOSE
delete TCP
CLOSE
snd SYN
SEND
snd SYN ACKrcv SYN
Send FINCLOSE
rcv ACK of SYNSnd ACK
Rcv SYN, ACK
rcv SYN
snd ACK
Client
Server
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 29
Client wants to close connection:
Step 1: client end system sends TCP FIN control segment to server
TCP: Closing ConnectionRemember TCP duplex connection!
client server
FIN
serverclosing
ACK
halfclosed
FINclientclosin
g
halfclosed
Step 2: server receives FIN, replies with ACK. half closed
Server finishes sending data, also ready to close:
Step 4: server sends FIN.
Step 3: client receives ACK.
half closed, wait for server to close
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 30
Step 5: client receives FIN, replies with ACK. connection fully closed
TCP: Closing Connection (cont’d)
client
FIN
server
ACK
FIN
clientclosin
ghalf
closed
server
closing
fullclose
d
halfclose
d
ACKfullclose
dProblem Solved?
Well Done!
Step 6: server, receives ACK. connection fully closed
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 31
Step 5: client receives FIN, replies with ACK.
– Enters “timed wait” - will respond with ACK to received FINs
TCP: Closing Connection (revised)client
FIN
server
ACK
FIN
clientclosin
ghalf
closed
server
closing
halfclose
d
Two Army Problem!
Step 6: server, receives ACK. connection fully closed
full closed
fullclose
d
ACKStep 7: client, timer expires, connection fully closed
tim
ed w
ait ACK FINX timeout
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 32
TCP Connection Tear-Down Example
No. Time Source > Destination Proto SrcPort>DstPort [Flags]80 35.156250 70.13.155.114 128.101.35.150 TCP 1414 > 22 [PSH,ACK] Seq=758246388 Ack=3778411633 Win=15920 Len=32 81 35.156250 70.13.155.114 128.101.35.150 TCP 1414 > 22 [FIN, ACK] Seq=758246420 Ack=3778411633 Win=15920 Len=0
82 35.437500 128.101.35.150 70.13.155.114 TCP 22 > 1414 [ACK] Seq=3778411633 Ack=758246420 Win=25200 Len=0 13.968750
83 35.453125 128.101.35.150 70.13.155.114 TCP 22 > 1414 [ACK] Seq=3778411633 Ack=758246421 Win=25200 Len=0 13.96875084 35.453125 128.101.35.150 70.13.155.114 TCP 22 > 1414 [FIN,ACK] Seq=3778411633 Ack=758246421 Win=25200 Len=0 13.968750
85 35.453125 70.13.155.114 128.101.35.150 TCP 1414 > 22 [ACK] Seq=758246421 Ack=3778411634 Win=15920 Len=0
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 33
State Diagram: Connection Tear-down
CLOSING
CLOSEWAIT
FINWAIT-1
ESTAB
TIME WAIT
snd FIN
CLOSE
send FIN
CLOSE
rcv ACK of FIN
LAST-ACK
CLOSED
FIN WAIT-2
snd ACK
rcv FIN
delete TCP
Timeout=2min
send FIN
CLOSE
send ACK
rcv FIN
snd ACK
rcv FIN
rcv ACK of FIN
snd ACK
rcv FIN+ACK
ACK
Active Close
Passive Close
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 34
TCP Connection Management FSM
TCP clientlifecycle
TCP client lifecycle
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 35
TCP Connection Management FSM
TCP serverlifecycle
TCP server lifecycle
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 36
• ARQ vs. FEC– automatic retransmission request– forward error correction
• General ARQ Algorithms – Stop & Wait
• Perform issue: low utilization when delay-bw product large
– Sliding Window Protocols• Go-Back-N• Selective Repeat• Key design issues: window size vs. size of seq. no. space
Reliability and Error Recovery
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 37
Error Recovery: Stop and Wait
Time
Packet
ACKTim
eou
t
• ARQ– Receiver sends
acknowledgement (ACK) when it receives packet
– Sender waits for ACK and timeouts if it does not arrive within some time period
• Simplest ARQ protocol• Send a packet, stop
and wait until ACK arrives
Sender Receiver
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 38
Recovering from Error
Packet
ACK
Tim
eou
t
Packet
ACK
Tim
eou
t
Packet
Tim
eou
t
Packet
ACKT
ime
out
Time
Packet
ACK
Tim
eou
t
Packet
ACK
Tim
eou
t
ACK lost Packet lost Early timeoutDUPLICATEPACKETS!!!
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 39
• How to recognize a duplicate• Performance
– Can only send one packet per round trip
Problems with Stop and Wait
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 40
How to Recognize Resends?
• Use sequence numbers– both packets and acks
• Sequence # in packet is finite How big should it be? – For stop and wait?
• One bit – won’t send seq #1 until received ACK for seq #0
Pkt 0
ACK 0
Pkt 0
ACK 1
Pkt 1ACK 0
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 41
• Can’t keep the pipe full– Utilization is low when bandwidth-delay product (R x RTT)is large!
Sender Receiver
data (L bytes)
ACK
first packet bit transmitted, t = 0
RTT
first packet bit arrives
ACK arrives, send next packet, t =
RTT + L / R
Problem with Stop & Wait Protocol
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 42
Stop & Wait: Performance AnalysisExample:
1 Gbps connection, 15 ms end-end prop. delay, data segment size: 1 KB = 8Kb
– U sender: utilization, i.e., fraction of time sender busy sending
– 1KB data segment every 30 msec (round trip time) --> 0.027% x 1 Gbps = 33kB/sec throughput over 1 Gbps link
00027.0008.30
008.
*/
/
LRRTT
L
RLRTT
RLsenderU
ms 008.0s108
b/s 10
kb 8
bps) rate,ion (transmiss
bits)in length (packet
6
9transmit
R
LT
Moral of story: network protocol limits use of physical resources!
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 43
How to Keep the Pipe Full?• Send multiple packets without
waiting for first to be acked– Number of pkts in flight = window
• Reliable, unordered delivery– Several parallel stop & waits– Send new packet after each ack– Sender keeps list of unack’ed packets;
resends after timeout– Receiver same as stop & wait
• How large a window is needed?– Suppose 10Mbps link, 4ms delay,
500byte pkts• 1? 10? 20?
– Round trip delay * bandwidth = capacity of pipe
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 44
Pipelined (Sliding Window) ProtocolsPipelining: sender allows multiple, “in-flight”,
yet-to-be-acknowledged data segments– range of sequence numbers must be increased– buffering at sender and/or receiver
• Two generic forms of pipelined protocols: Go-Back-N and Selective Repeat
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 45
Pipelining: Increased Utilization
first packet bit transmitted, t = 0
sender receiver
RTT
last bit transmitted, t = L / R
first packet bit arriveslast packet bit arrives, send ACK
ACK arrives, send next packet, t = RTT + L / R
last bit of 2nd packet arrives, send ACKlast bit of 3rd packet arrives, send ACK
U sender
= .024
30.008 = 0.0008
microseconds
3 * L / R
RTT + L / R =
Increase utilizationby a factor of 3!
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 46
Sliding Window
• Reliable, ordered delivery• Receiver has to hold onto a packet until
all prior packets have arrived– Why might this be difficult for just parallel stop &
wait?– Sender must prevent buffer overflow at receiver
• Circular buffer at sender and receiver– Packets in transit buffer size – Advance when sender and receiver agree packets at
beginning have been received
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 47
ReceiverReceiverSenderSender
Sender/Receiver State
… …
Sent & Acked Sent Not Acked
OK to Send Not Usable
… …
Max acceptable
Receiver window
Max ACK received Next seqnum
Received & Acked Acceptable Packet
Not Usable
Sender window
Next expected
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 48
Window Sliding – Common Case
• On reception of new ACK (i.e. ACK for something that was not acked earlier)– Increase sequence of max ACK received– Send next packet
• On reception of new in-order data packet (next expected)– Hand packet to application– Send cumulative ACK – acknowledges reception of all packets up
to sequence number– Increase sequence of max acceptable packet
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 49
Loss Recovery• Go-Back-N recovery
– Set timer upon transmission of each packet– Cumulative ACK– Retransmit all unacknowledged packets– No receiver buffering, out-of-order packets are discarded
• Selective Repeat– Sender keeps a timer for each packet– Selective ACK– Receiver must buffer all out-of-order packets– When timeout, retransmit only one packet
• Performance during loss recovery– No longer have an entire window in transit– Can have much more clever loss recovery
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 50
Go-Back-N in Action
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 51
Selective Repeat• Receiver individually acknowledges all
correctly received pkts– Buffers packets, as needed, for eventual in-order delivery
to upper layer
• Sender only resends packets for which ACK not received– Sender timer for each unACKed packet
• Sender window– N consecutive seq #’s– Again limits seq #s of sent, unACKed packets
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 52
Selective Repeat: Sender, Receiver Windows
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 53
Sequence Numbers
• How large does size of sequence number space need to be?– Must be able to detect wrap-around– Depends on sender/receiver window size
• E.g.– size of seq. no. space = 8, send win=recv win=7– If pkts 0..6 are sent succesfully and all acks lost
• Receiver expects 7,0..5, sender retransmits old 0..6!!!
• size of sequence no. space must be send window + recv window
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 54
Sequence Numbers in TCP• TCP regards data as a “byte-stream”
– each byte in byte stream is numbered.• 32 bit value, wraps around• initial values selected at start up time
• TCP breaks up byte stream in packets.– Packet size is limited to the Maximum Segment Size (MSS)
• Each packet has a sequence number– seq. no of 1st byte indicates where it fits in the byte stream
• TCP connection is duplex– data in each direction has its own sequence numbers
packet 8packet 9 packet 10
13450 14950 16050 17550
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 55
TCP Seq. #’s and ACKs
Seq. #’s:
byte stream “number”of first byte in segment’s data
ACKs:
seq # of next byte expected from other side
host ACKsreceipt
of echoed‘C’
Host A Host B
Seq=42, ACK=79, data = ‘C’
Seq=79, ACK=43, data = ‘C’
Seq=43, ACK=80
Usertypes
‘C’ host ACKsreceipt of
‘C’, echoesback ‘C’
simple telnet scenario
timered: A-to-B green: B-to-A
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 56
TCP Reliable Data Transfer
• TCP creates reliable data transfer service on top of IP’s unreliable service
• Pipelined segments• Cumulative ACKs• TCP uses single
retransmission timer
• Retransmissions are triggered by:– timeout events– duplicate acks
• Initially consider simplified TCP sender:– ignore duplicate acks– ignore flow control,
congestion control
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 57
TCP = Go-Back-N Variant• Sliding window with cumulative acks
– Receiver can only return a single “ack” sequence number to the sender.
– Acknowledges all bytes with a lower sequence number– Starting point for retransmission– Duplicate acks sent when out-of-order packet received
• But: sender only retransmits a single packet.– Reason???
• Only one that it knows is lost• Network is congested shouldn’t overload it
• Error control is based on byte sequences, not packets.– Retransmitted packet can be different from the original lost
packet – Why?
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 58
TCP Sender Events:data rcvd from app:• Create segment with
seq #• seq # is byte-stream
number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:• retransmit segment
that caused timeout• restart timer ACK received:• If acknowledges
previously unACKed segments– update what is known to
be ACKed– start timer if there are
outstanding segments
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 59
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq #. All data up toexpected seq # already ACKed
Arrival of in-order segment withexpected seq #. One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq. # .Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver Action
Delayed ACK. Wait up to 500msfor next segment. If no next segment,send ACK
Immediately send single cumulative ACK, ACKing both in-order segments
Immediately send duplicate ACK, indicating seq. # of next expected byte
Immediate send ACK, provided thatsegment starts at lower end of gap
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 60
TCP Flow Control
• receive side of TCP connection has a receive buffer:
• speed-matching service: matching the send rate to the receiving app’s drain rate• app process may be
slow at reading from buffer
sender won’t overflow
receiver’s buffer bytransmitting too
much, too fast
flow control
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 61
TCP Flow Control: How It Works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer= RcvWindow (dynamically
changes)
= RcvBuffer-[LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow– guarantees receive
buffer doesn’t overflow
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 62
source port # dest port #
32 bits
applicationdata
(variable length)
sequence number
acknowledgement numberrcvr window size
ptr urgent datachecksum
FSRPAUheadlen
notused
Options (variable length)
TCP Segment Structure
URG: urgent data (generally not used)
ACK: ACK #valid
RST, SYN, FIN:connection estab(setup, teardown
commands)
# bytes rcvr willingto accept
countingby bytes of data(not segments!)
Internetchecksum
(as in UDP)
PSH: push data now(generally not used)
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 63
Triggering Transmission• How does TCP decide to transmit a
segment?– MSS (Maximum segment size)
• Set to size of the largest segment TCP can send without local IP fragmentation (MTU of directly connected)
– Sending process explicitly asked to do (Push to flush) – Firing timer
• Silly Window Syndrome– Flow control needs to be maintained– Sender can transmit full segment (MSS) when Acked
by receiver
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 64
Silly Window Syndrome (cont’d)
– Window currently closed from receiver– ACK opens MSS/2 bytes– Should sender transmit MSS/2?
• Original TCP implementation silent• Early implementation of TCP decided to go ahead• Sender can not know when the window will open for full MSS
– If sender is aggressive, sending available window size• results Silly window syndrome• small segment size remains indefinitely
– Hence a problem when either sender transmits a small segment or receiver opens window a small amount
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 65
Triggering Transmission (cont’d)
– Receiver may delay ACKs, but how long?– Ultimate solution lies with sender:
• When does the TCP sender decide to transmit a segment?
• Nagle’s Algorithm:– Waiting too long hurt interactive applications (Telnet)– Without waiting, risk of sending a bunch of tiny
packets (silly window syndrome)– Wait till timer expires:
• Self clocking: As long as TCP has any data in flight, sender receives an ACK which can be used to trigger transmission
• If no data in flight, immediately send the segment (setting TCP_NoDELAY option)
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 66
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT– but RTT varies
• too short: premature timeout– unnecessary
retransmissions• too long: slow
reaction to segment loss
Q: how to estimate RTT?• SampleRTT: measured time
from segment transmission until ACK receipt– ignore retransmissions,
why?• SampleRTT will vary, want
estimated RTT “smoother”– average several recent
measurements, not just current SampleRTT
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 67
Round-trip Time Estimation
• Importance of accurate RTT estimators:– Low RTT estimate
• unneeded retransmissions– High RTT estimate
• poor throughput
• RTT estimator must adapt to change in RTT– But not too fast, or too slow!
• Spurious timeouts– “Conservation of packets” principle – never more
than a window worth of packets in flight
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 68
Adaptive Retransmission(Original Algorithm)
• Measure SampleRTT for each segment/ ACK pair
• Compute weighted running average of RTT– EstRTT = x EstimatedRTT + (1- x SampleRTT between 0.8 and 0.9 ( to smooth Estimated RTT)- Small indicates temp. fluctuation, a large value more
stable, may not be quick to adapt to real changes
• Set timeout based on EstRTT– TimeOut = 2 x EstRTT
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 69
Retransmission Ambiguity
• ACK is for Original transmission but was for retransmission => Sample RTT is too large
• ACK is for retransmission but was for original => Sample RTT too small
Sender Receiver
Original transmission
ACK
Sam
pleR
TT
Retransmission
Sender Receiver
Original transmission
ACK
Sam
pleR
TT
Retransmission
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 70
Karn/Partridge Algorithm
• Solution:• Do not sample RTT when retransmitting
– only measures sample RTT for segments sent once
• Double timeout for each retransmission– Next timeout to be twice the last timeout, rather than basing it on
the last Estimated RTT
• Karn and Patridge proposal is exponential backoff– Congestion is most likely cause of lost segments– TCP sources should not react too aggressively to a timeout– More timeouts mean more cautious the source should become
(congestion problem)
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 71
Jacobson/ Karels Algorithm• Original computation for RTT did not take the
variance of sample RTTs into account– If variation among samples is small, Estimated RTT can be
better used without increasing the estimate twice– A large variance in the samples mean Time out values should
not be too tightly coupled to the Estimated RTT
• New Calculations for average RTT– Diff = SampleRTT - EstRTT– EstRTT = EstRTT + ( x Diff)– Dev = Dev + ( |Diff| - Dev)
• where is a fraction between 0 and 1
• Consider variance when setting timeout value– TimeOut = x EstRTT + x Dev
• where = 1 and = 4
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 72
TCP Round Trip Time EstimationEstimatedRTT = (1- )*EstimatedRTT + *SampleRTT
• Exponential weighted moving average• influence of past sample decreases exponentially
fast• typical value: = 0.125Setting the timeout interval
• Estimted RTT plus “safety margin”– large variation in EstimatedRTT -> larger safety
margin• “safty margin”: accommodate variations in estimatedRTT
DevRTT = (1-)*DevRTT + *|SampleRTT-EstimatedRTT|(typically, = 0.25)
TimeoutInterval = EstimatedRTT + 4*DevRTT
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 73
Example RTT Estimation:RTT: gaia.cs.umass.edu to fantasia.eurecom.fr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 74
Timestamp Extension
• Used to improve timeout mechanism by more accurate measurement of RTT
• When sending a packet, insert current time into option– 4 bytes for time, 4 bytes for echo a received
timestamp
• Receiver echoes timestamp in ACK– Actually will echo whatever is in timestamp
• Removes retransmission ambiguity– Can get RTT sample on any packet
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 75
Timer Granularity
• Many TCP implementations set RTO (Retransmission Timeout) in multiples of 200,500,1000ms
• Why?– Avoid spurious timeouts – RTTs can vary quickly due
to cross traffic– Make timers interrupts efficient
• What happens for the first couple of packets?– Pick a very conservative value (seconds)
Csci 183/183W/232 – Computer Networks
Transport Layer & TCP 76
Important Lessons
• TCP state diagram setup/teardown
• TCP timeout calculation how is RTT estimated
• Modern TCP loss recovery– Why are timeouts bad?– How to avoid them? e.g. fast retransmit