1
CSE 524: Lecture 8
Transport layer functions
Specific transport layer protocols
2
Administrative
• Office hours
– Haven't been needing them?
– Need a new time for them?
3
Where we’re at…
• Internet architecture and history
• Internet protocols in practice
• Application layer
• Transport layer
– Transport layer functions
– Specific transport layer protocols
• Network layer
• Data-link layer
• Physical layer
4
TL: Selective Repeat
• receiver individually acknowledges all correctly received pkts
– buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
– sender timer for each unACKed pkt
• sender window
– N consecutive seq #’s
– again limits seq #s of sent, unACKed pkts
5
TL: Selective repeat: sender, receiver windows
6
TL: Selective repeat
sender
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N-1]:
• mark pkt n as received
• if n is the smallest unACKed pkt, advance window base to next unACKed seq #

receiver
pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
• ACK(n)
otherwise:
• ignore
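The receiver rules above can be sketched in a few lines. This is only an illustration: the class name `SRReceiver` and its fields are made up for this note, and sequence numbers are treated as unbounded integers (no wrap-around) to keep the window arithmetic simple.

```python
class SRReceiver:
    """Selective-repeat receiver over an unbounded seq # space (a simplification)."""

    def __init__(self, window_size):
        self.N = window_size
        self.rcvbase = 0
        self.buffer = {}      # out-of-order pkts, keyed by seq #
        self.delivered = []   # data handed to the upper layer, in order

    def on_packet(self, n, data):
        """Return the seq # to ACK, or None if the pkt is ignored."""
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.buffer[n] = data
            # Deliver the in-order run starting at rcvbase and slide the window.
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            return n          # already delivered: re-ACK so the sender can advance
        return None           # otherwise: ignore


rx = SRReceiver(window_size=4)
acks = [rx.on_packet(n, f"pkt{n}") for n in (0, 2, 3, 1, 0, 9)]
```

Here pkts 2 and 3 are buffered until pkt 1 arrives, the duplicate pkt 0 is re-ACKed, and pkt 9 (outside both ranges) is ignored.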
7
TL: Selective repeat in action
8
TL: Selective repeat: dilemma
Example: • seq #’s: 0, 1, 2, 3
• window size=3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
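The usual answer is that the window size must be at most half the size of the sequence number space. A small check, assuming the worst case where the receiver has accepted a full window and slid forward while every ACK was lost (the function name `windows_overlap` is made up for this note):

```python
def windows_overlap(seq_space, window):
    """True if a retransmitted old pkt can be mistaken for a new one.

    Worst case: the receiver got seq #s 0..window-1 and advanced its window,
    but all ACKs were lost, so the sender retransmits the old window.
    """
    old = {n % seq_space for n in range(0, window)}            # sender may resend these
    new = {n % seq_space for n in range(window, 2 * window)}   # receiver now accepts these
    return bool(old & new)


ambiguous = windows_overlap(seq_space=4, window=3)   # the example on this slide
safe = windows_overlap(seq_space=4, window=2)
```

With seq #’s 0..3 and window size 3 the two sets share seq #s 0 and 1, which is the ambiguity on the slide; with window size 2 they are disjoint.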
9
Done with Transport Layer Functions
• Demux to upper layer
• Quality of service
• Security
• Delivery semantics
• Flow control
• Congestion control
• Reliable data transfer
10
Specific transport layers
• UDP
– unreliable (“best-effort”)
– unordered
– unicast or multicast delivery
• TCP
– reliable
– in-order
– unicast
• SCTP (will not cover in class)
– See http://www.ietf.org/rfc/rfc2960.txt
– reliable
– optional ordering
– unicast
11
TL: UDP and Transport Layer Functions
• Demux to upper layer
– UDP port field
• Quality of service
– none
• Security
– none
• Delivery semantics
– Unordered
– Unicast or multicast
• Flow control
– none
• Reliable data transfer
– none, but data integrity provided by checksum
• Congestion control
– none
12
TL: UDP: User Datagram Protocol
• http://www.rfc-editor.org/rfc/rfc768.txt
• “no frills,” “bare bones” Internet transport protocol
• “best effort” service, UDP segments may be:
– lost
– delivered out of order to app
• connectionless:
– no handshaking between UDP sender, receiver
– each UDP segment handled independently of others
Why is there a UDP?
• no connection establishment (which can add delay)
• simple: no connection state at sender, receiver
• small segment header
• no congestion control: UDP can blast away as fast as desired
13
TL: UDP: more
• often used for streaming multimedia apps
– loss tolerant
– rate sensitive
• other UDP uses (why?):
– DNS
– SNMP
• reliable transfer over UDP: add reliability at application layer
– application-specific error recovery!
– many applications re-implement reliability over UDP to bypass TCP
– new transport protocols?
UDP segment format (32 bits wide)
source port # | dest port #
length | checksum
application data (message)
length: length, in bytes, of UDP segment, including header
14
TL: UDP checksum
Goal: detect “errors” (e.g., flipped bits) in transmitted segment
Sender:
• treat segment contents as sequence of 16-bit integers
• checksum: 1’s complement of the 1’s complement sum of segment contents
• sender puts checksum value into UDP checksum field
• similar to IP’s header checksum
Receiver:
• compute checksum of received segment
• check if computed checksum equals checksum field value:
– NO - error detected
– YES - no error detected. But maybe errors nonetheless? More later….
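A minimal sketch of the 16-bit 1’s complement checksum described on this slide. It covers only the bytes passed in; the real UDP checksum also covers a pseudo-header, which is left out here, and the function name and sample bytes are made up for illustration.

```python
def internet_checksum(data: bytes) -> int:
    """1's complement of the 1's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                              # pad to a whole number of words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)     # wrap the carry back in
    return ~total & 0xFFFF


segment = b"\x12\x34\x56\x78\x9a"
csum = internet_checksum(segment)

# Receiver check: summing the (padded) data together with the checksum gives 0.
verified = internet_checksum(segment + b"\x00" + csum.to_bytes(2, "big")) == 0

# "Errors nonetheless": swapping two 16-bit words goes undetected.
swapped = b"\x56\x78\x12\x34\x9a"
undetected = internet_checksum(swapped) == csum
```

The last two lines show why a matching checksum is not proof of an intact segment: addition is order-independent, so reordered words produce the same value.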
15
TL: TCP and Transport Layer Functions
• Demux to upper layer
• Quality of service
• Security
• Delivery semantics
• Flow control
• Reliable data transfer
• Congestion control
16
TL: TCP Overview RFCs: 793, 1122, 1323, 2018, 2581
• full duplex data:
– bi-directional data flow in same connection
– MSS: maximum segment size
• connection-oriented:
– handshaking (exchange of control msgs) init’s sender, receiver state before data exchange
– protocol implemented at ends (“fate-sharing”)
• flow and congestion controlled:
– sender will not overwhelm receiver or network
• point-to-point:
– one sender, one receiver
• reliable, in-order byte stream:
– no “message boundaries”
• pipelined:
– TCP congestion and flow control set window size
• send & receive buffers
[Figure: application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the application reads data through its socket door]
17
TL: TCP header
TCP segment format (32 bits wide)
source port # | dest port #
sequence number
acknowledgement number
head len | not used | U A P R S F | rcvr window size
checksum | ptr urgent data
Options (variable length)
application data (variable length)

• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• rcvr window size: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)
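To make the field layout concrete, here is a sketch that unpacks the fixed 20-byte part of a TCP header with Python's `struct` module. The function name, dictionary keys, and the sample SYN segment are made up for this note; options and data are ignored.

```python
import struct

# Flag bits in the low 6 bits of the 16-bit "head len / flags" word.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(raw: bytes) -> dict:
    """Unpack the fixed 20-byte TCP header (network byte order)."""
    src, dst, seq, ack, off_flags, window, checksum, urg_ptr = struct.unpack(
        "!HHIIHHHH", raw[:20]
    )
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "head_len": (off_flags >> 12) * 4,   # header length is stored in 32-bit words
        "flags": {name for name, bit in FLAG_BITS.items() if off_flags & bit},
        "rcvr_window": window,
        "checksum": checksum,
        "urgent_ptr": urg_ptr,
    }


# A hand-built SYN segment: ports 1234 -> 80, seq 1000, 5-word header, SYN flag set.
raw = struct.pack("!HHIIHHHH", 1234, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
hdr = parse_tcp_header(raw)
```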
18
TL: TCP connections
• TCP sender, receiver establish “connection” before exchanging data segments
– initialize TCP variables:
• Initial sequence #s
• Buffers, flow control info (e.g. RcvWindow)
• Window scaling
• client: connection initiator
• server: contacted by client
• Java API
Socket clientSocket = new Socket("hostname", port);   // port is an int
Socket connectionSocket = welcomeSocket.accept();
19
TL: TCP connections
• Three way handshake:
– Step 1: client end system sends TCP SYN control segment to server
  • specifies initial seq #
  • should be random to prevent spoofing ( http://www.rfc-editor.org/rfc/rfc1948.txt )
– Step 2: server end system receives SYN, replies with SYNACK control segment
  • ACKs received SYN
  • allocates buffers
  • specifies server-to-client initial seq. #
– Step 3: client receives SYNACK control segment, replies with ACK and potentially data
  • ACKs received SYNACK
  • goes to established state
20
TL: TCP Connection Establishment
• A and B must agree on initial sequence number selection
• 3-way handshake
A -> B: SYN + Seq A
B -> A: SYN+ACK-A + Seq B
A -> B: ACK-B
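A sketch of the numbers carried by the three segments. It relies on the fact that a SYN consumes one sequence number, so each side ACKs the other's initial seq # plus one. The function name and tuple layout are made up for this note, and fixed ISNs are used only to keep the example readable; a real stack would pick them unpredictably.

```python
MOD = 2 ** 32   # TCP sequence numbers are 32 bits and wrap around


def three_way_handshake(isn_a, isn_b):
    """Return the three control segments as (sender, flags, seq, ack) tuples."""
    syn = ("A", "SYN", isn_a, None)
    syn_ack = ("B", "SYN+ACK", isn_b, (isn_a + 1) % MOD)           # ACK-A
    ack = ("A", "ACK", (isn_a + 1) % MOD, (isn_b + 1) % MOD)       # ACK-B
    return [syn, syn_ack, ack]


segments = three_way_handshake(isn_a=100, isn_b=300)
```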
21
TL: TCP Sequence Number Selection
• Why not simply choose 0?
• Must avoid overlap with earlier incarnation
• Client machine seq #0, initiates connection to server with seq #0.
– Client sends one byte and machine crashes
– Client reboots and initiates connection again
– Server thinks new incarnation is the same as old connection
22
TL: TCP Sequence Number Selection
• Why is selecting a random ISN important?
• Suppose machine X selects ISN based on predictable sequence
• Fred has .rhosts to allow login to X from Y
• Evil Ed attacks
– Disables host Y – denial of service attack
– Make a bunch of connections to host X
– Determine ISN pattern and guess next ISN
– Fake pkt1: [<src Y><dst X>, guessed ISN]
– Fake pkt2: desired command
– Attack popularized by K. Mitnick
23
TL: TCP ISN selection and spoofing attacks
[Figure: hosts Ed, Y, and X; X has .rhosts listing Y]
1. Ed floods Y continuously
2. Ed spoofs TCP SYN from Y to X, with spoofed Y ISN
3. X sends TCP SYNACK to Y (ACKs spoofed Y ISN, sends X ISN): PACKET DROPPED!
4. Ed sends ACK with guess of X’s ISN, as if he had received the TCP SYNACK
5. Ed sends pre-canned rlogin/rsh messages (rsh echo “Ed” >> .rhosts), spoofing acknowledgements
6. Real acks dropped, so Y does not reset connection
7. Door now open, rlogin to X from Ed directly
24
TL: TCP connection setup
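[State diagram: TCP connection setup. Transitions are written as event / action.]
• CLOSED -> LISTEN: passive OPEN / create TCB
• CLOSED -> SYN SENT: active OPEN / create TCB, snd SYN
• LISTEN -> SYN RCVD: rcv SYN / snd SYN, ACK
• LISTEN -> SYN SENT: APP SEND / snd SYN
• LISTEN -> CLOSED: CLOSE / delete TCB
• SYN SENT -> CLOSED: CLOSE / delete TCB
• SYN SENT -> SYN RCVD: rcv SYN / snd ACK
• SYN SENT -> ESTAB: rcv SYN, ACK / snd ACK
• SYN RCVD -> ESTAB: rcv ACK of SYN
• SYN RCVD -> (teardown): CLOSE / send FIN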
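The setup diagram can be written as a lookup table keyed by (state, event). This is a sketch following the RFC 793 state machine that the diagram's labels come from; the table layout and the `run` helper are made up for this note, and only the connection-setup transitions are included.

```python
# (state, event) -> (next state, action)
TRANSITIONS = {
    ("CLOSED", "passive OPEN"): ("LISTEN", "create TCB"),
    ("CLOSED", "active OPEN"): ("SYN SENT", "create TCB, snd SYN"),
    ("LISTEN", "rcv SYN"): ("SYN RCVD", "snd SYN, ACK"),
    ("LISTEN", "APP SEND"): ("SYN SENT", "snd SYN"),
    ("LISTEN", "CLOSE"): ("CLOSED", "delete TCB"),
    ("SYN SENT", "CLOSE"): ("CLOSED", "delete TCB"),
    ("SYN SENT", "rcv SYN"): ("SYN RCVD", "snd ACK"),
    ("SYN SENT", "rcv SYN, ACK"): ("ESTAB", "snd ACK"),
    ("SYN RCVD", "rcv ACK of SYN"): ("ESTAB", None),
}


def run(state, events):
    """Feed events through the table; return the final state and actions taken."""
    actions = []
    for event in events:
        state, action = TRANSITIONS[(state, event)]
        actions.append(action)
    return state, actions


client_state, client_actions = run("CLOSED", ["active OPEN", "rcv SYN, ACK"])
server_state, server_actions = run("CLOSED", ["passive OPEN", "rcv SYN", "rcv ACK of SYN"])
```

Both sides end in ESTAB: the client's two actions and the server's middle action are the three segments of the handshake.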
25
TL: TCP connections
Data transfer for established connections using sequence numbers and sliding windows with cumulative ACKs
Seq. #’s:
– byte stream “number” of first byte in segment’s data
ACKs:
– seq # of next byte expected from other side
– cumulative ACK
– duplicate acks sent when out-of-order packet received
See web trace
Java API
connectionSocket.receive();
clientSocket.send();

simple telnet scenario (time runs downward)
• User at Host A types ‘C’
  Host A -> Host B: Seq=42, ACK=79, data = ‘C’
• Host B ACKs receipt of ‘C’, echoes back ‘C’
  Host B -> Host A: Seq=79, ACK=43, data = ‘C’
• Host A ACKs receipt of echoed ‘C’
  Host A -> Host B: Seq=43, ACK=80
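The numbers in the telnet scenario follow directly from the two rules on this slide: Seq is the byte number of the first data byte, and ACK is the next byte expected from the other side. A small sketch (the function name and dictionary layout are made up for this note):

```python
def echo_exchange(seq_a, seq_b, payload):
    """Host A sends payload, Host B echoes it back, Host A ACKs the echo."""
    n = len(payload)
    a_to_b = {"seq": seq_a, "ack": seq_b, "data": payload}
    b_to_a = {"seq": seq_b, "ack": seq_a + n, "data": payload}     # ACKs A's byte, echoes it
    a_ack = {"seq": seq_a + n, "ack": seq_b + n, "data": b""}      # ACKs the echoed byte
    return [a_to_b, b_to_a, a_ack]


trace = echo_exchange(seq_a=42, seq_b=79, payload=b"C")
```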
26
TL: TCP connections
Closing a connection:
Client-initiated close (reverse process for server-initiated close)
Java API: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
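[Figure: client closes and sends FIN; server replies with ACK, closes, and sends its own FIN; client replies with ACK and enters timed wait before the connection is closed]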
27
TL: TCP connections
Step 3: client receives FIN, replies with ACK.
– Enters “timed wait” - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
28
TL: TCP Half-Close
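Sender -> Receiver: FIN
Receiver -> Sender: FIN-ACK
Receiver -> Sender: Data write
Sender -> Receiver: Data ack
Receiver -> Sender: FIN
Sender -> Receiver: FIN-ACK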
29
TL: TCP Connection Tear-down
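[State diagram: TCP connection tear-down. Transitions are written as event / action.]
• ESTAB -> FIN WAIT-1: CLOSE / send FIN
• ESTAB -> CLOSE WAIT: rcv FIN / snd ACK
• FIN WAIT-1 -> FIN WAIT-2: rcv ACK of FIN
• FIN WAIT-1 -> CLOSING: rcv FIN / snd ACK
• FIN WAIT-1 -> TIME WAIT: rcv FIN+ACK / snd ACK
• FIN WAIT-2 -> TIME WAIT: rcv FIN / snd ACK
• CLOSING -> TIME WAIT: rcv ACK of FIN
• CLOSE WAIT -> LAST-ACK: CLOSE / send FIN
• LAST-ACK -> CLOSED: rcv ACK
• TIME WAIT -> CLOSED: Timeout=2msl / delete TCB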
30
TL: Time Wait Issues
• Cannot close connection immediately after receiving FIN
– What if a new connection restarts and uses same sequence number?
• Web servers, not clients, close connection first
– Established -> Fin-Wait -> Time-Wait -> Closed
– Why would this be a problem?
• Time-Wait state lasts for 2 * MSL
– MSL should be 120 seconds (is often 60s)
– Servers often have order of magnitude more connections in Time-Wait
31
TL: TCP connections
[Figure: TCP client lifecycle]
[Figure: TCP server lifecycle]
32
TL: TCP Demux to upper layer
multiplexing/demultiplexing:
• based on sender, receiver port numbers, IP addresses
– source, dest port #s in each segment
– recall: well-known port numbers for specific applications
– Servers wait on well known ports (/etc/services)
Multiplexing: gathering data from multiple app processes, enveloping data with header (later used for demultiplexing)

TCP/UDP segment format (32 bits wide)
source port # | dest port #
other header fields
application data (message)
33
TL: TCP Demux to upper layer
port use: simple telnet app
• host A -> server B: source port: x, dest. port: 23
• server B -> host A: source port: 23, dest. port: x

port use: Web server
• Web client host A -> Web server B: Source IP: A, Dest IP: B, source port: x, dest. port: 80
• Web client host C -> Web server B: Source IP: C, Dest IP: B, source port: x, dest. port: 80
• Web client host C -> Web server B: Source IP: C, Dest IP: B, source port: y, dest. port: 80
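The Web server example works because each TCP connection is identified by the full (source IP, source port, dest IP, dest port) combination, not by the dest port alone. A sketch of that lookup; the class, the socket labels, and the concrete port values 5000 and 5001 (standing in for x and y) are made up for this note.

```python
class TransportDemux:
    def __init__(self):
        self.listening = {}   # dest port -> listening ("welcome") socket
        self.connected = {}   # (src IP, src port, dest IP, dest port) -> socket

    def listen(self, port, sock):
        self.listening[port] = sock

    def accept(self, src_ip, src_port, dst_ip, dst_port, sock):
        self.connected[(src_ip, src_port, dst_ip, dst_port)] = sock

    def deliver(self, src_ip, src_port, dst_ip, dst_port):
        """Pick the socket for an arriving segment, or None if nobody is waiting."""
        key = (src_ip, src_port, dst_ip, dst_port)
        if key in self.connected:
            return self.connected[key]
        return self.listening.get(dst_port)


demux = TransportDemux()
demux.listen(80, "welcome")
demux.accept("A", 5000, "B", 80, "conn-A-x")
demux.accept("C", 5000, "B", 80, "conn-C-x")
demux.accept("C", 5001, "B", 80, "conn-C-y")
```

Hosts A and C can both use source port x, and host C can open two connections, because all three tuples differ in at least one field.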
34
TL: TCP and Quality of Service
• Ad hoc…
– Connection-based service differentiation
  • Web switches
• Operating system policies
– Buffer allocation
– Scheduling of protocol handlers
35
TL: TCP and Security
• Transport layer security
– Layer underneath application layer and above transport layer
– SSL, TLS
– Provides TCP/IP connection the following….
  • Data encryption
  • Server authentication
  • Message integrity
  • Optional client authentication
– Original implementation: Secure Sockets Layer (SSL)
  • Netscape (circa 1994)
  • http://www.openssl.org/ for more information
  • Submitted to W3C and IETF
– New version: Transport Layer Security (TLS)
  • http://www.ietf.org/html.charters/tls-charter.html
36
TL: TCP Flow control
• TCP is a sliding window protocol
– For window size n, can send up to n bytes without receiving an acknowledgement
– When the data is acknowledged then the window slides forward
• Each packet advertises a window size
– Indicates number of bytes the receiver has space for
• Original TCP always sent entire window
– Congestion control now limits this
37
TL: TCP Flow control
flow control: sender won’t overrun receiver’s buffers by transmitting too much, too fast
receiver: explicitly informs sender of (dynamically changing) amount of free buffer space
– RcvWindow field in TCP segment
sender: keeps the amount of transmitted, unACKed data less than most recently received RcvWindow

receiver buffering
RcvBuffer = size of TCP Receive Buffer
RcvWindow = amount of spare room in Buffer
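A sketch of the bookkeeping on both sides. The counters `last_byte_rcvd`, `last_byte_read`, `last_byte_sent`, and `last_byte_acked` do not appear on the slide; they are assumed here as the usual textbook way to express "spare room" and "transmitted, unACKed data".

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver: spare room = buffer size minus bytes received but not yet read."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(nbytes, last_byte_sent, last_byte_acked, advertised_window):
    """Sender: keep transmitted, unACKed data within the advertised window."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= advertised_window


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
ok_small = can_send(1000, last_byte_sent=5000, last_byte_acked=4000, advertised_window=window)
ok_large = can_send(1200, last_byte_sent=5000, last_byte_acked=4000, advertised_window=window)
```

With 2000 unread bytes in a 4096-byte buffer the receiver advertises 2096 bytes, so a sender with 1000 bytes already in flight may send 1000 more but not 1200.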
38
TL: TCP Flow control
• What happens if window is 0?
– Receiver updates window when application reads data
– What if this update is lost?
  • Deadlock
• TCP Persist timer
– Sender periodically sends window probe packets
– Receiver responds with ACK and up-to-date window advertisement
39
TL: TCP flow control enhancements
• Problem: (Clark, 1982)
– If receiver advertises small increases in the receive window then the sender may waste time sending lots of small packets
• What happens if window is small?
– Small packet problem known as “Silly window syndrome”
  • Receiver advertises one byte window
  • Sender sends one byte packet (1 byte data, 40 byte header = 4000% overhead)
40
TL: TCP flow control enhancements
• Solutions to silly window syndrome
• Clark (1982)
– receiver avoidance
– prevent receiver from advertising small windows
– increase advertised receiver window by min(MSS, RecvBuffer/2)
• Nagle’s algorithm (1984)
– sender avoidance
– prevent sender from unnecessarily sending small packets
– http://www.rfc-editor.org/rfc/rfc896.txt
  • “Inhibit the sending of new TCP segments when new outgoing data arrives from the user if any previously transmitted data on the connection remains unacknowledged”
  • Allow only one outstanding small (not full sized) segment that has not yet been acknowledged
– Works for idle connections (no deadlock)
– Works for telnet (send one-byte packets immediately)
– Works for bulk data transfer (delay sending)
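A sketch of the sender-side rule: full-sized segments go out right away, while a small segment is sent only when nothing is outstanding. The class and its fields are made up for this note, and receiver and congestion window limits are ignored to keep the rule itself visible.

```python
class NagleSender:
    def __init__(self, mss):
        self.mss = mss
        self.pending = b""    # data written by the app but not yet sent
        self.unacked = 0      # bytes sent but not yet acknowledged
        self.sent = []        # segments handed to IP, in order

    def _flush(self):
        # Full-sized segments are sent freely; a small one only on an idle connection.
        while self.pending and (len(self.pending) >= self.mss or self.unacked == 0):
            seg, self.pending = self.pending[: self.mss], self.pending[self.mss:]
            self.sent.append(seg)
            self.unacked += len(seg)

    def write(self, data):
        self.pending += data
        self._flush()

    def on_ack(self, nbytes):
        self.unacked -= nbytes
        self._flush()


tx = NagleSender(mss=4)
tx.write(b"a")        # idle connection: sent immediately
tx.write(b"b")        # held: a small segment is still unacknowledged
tx.write(b"c")        # held, coalesced with "b"
tx.on_ack(1)          # ACK for "a" releases "bc" as one segment
tx.write(b"defgh")    # a full-sized segment goes out; the leftover "h" waits
```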
41
TL: TCP reliable data transfer
• Segment integrity
• Acknowledgement generation
• Retransmission
42
TL: TCP RDT segment integrity
• Checksum included in header
• Is it sufficient to just checksum the packet contents?
• No, need to ensure correct source/destination
– Pseudoheader – portions of IP hdr that are critical
– Checksum covers Pseudoheader, transport hdr, and packet body
– Layer violation, redundant with parts of IP checksum
43
TL: TCP RDT acks and timeouts
• TCP’s reliable data transfer approach
– Cumulative acknowledgements
  • Receiver sends back the byte number it expects to receive next
  • Out of order packets generate duplicate acknowledgements
    – Receive 1, Ack 2
    – Receive 4, Ack 2
    – Receive 3, Ack 2
    – Receive 2, Ack 5
– Retransmissions
  • Sender sends segment and sets a timer
  • Waits for an acknowledgement indicating segment was received
    – Send 1
    – Wait for Ack 2
    – No Ack 2 and timer expires
    – Send 1 again
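The receive/ack sequence above can be reproduced with a few lines. Like the slide, this sketch numbers whole packets rather than bytes; the function name is made up for this note.

```python
def cumulative_acks(arrivals, first_expected=1):
    """For each arriving packet number, return the ACK the receiver sends."""
    received = set()
    expected = first_expected
    acks = []
    for seq in arrivals:
        received.add(seq)
        while expected in received:   # advance past everything now in order
            expected += 1
        acks.append(expected)         # always "the next one I expect"
    return acks


acks = cumulative_acks([1, 4, 3, 2])
```

Packets 4 and 3 arrive out of order, so each triggers a duplicate Ack 2; once 2 arrives the gap is filled and the ACK jumps to 5.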
44
TL: TCP RDT acks and timeouts
simplified sender, assuming
• one way data transfer
• no flow, congestion control

wait for event:
• event: data received from application above -> create, send segment
• event: timer timeout for segment with seq # y -> retransmit segment
• event: ACK received, with ACK # y -> ACK processing
45
TL: TCP RDT acks and timeouts
Simplified TCP sender

sendbase = initial_sequence number
nextseqnum = initial_sequence number

loop (forever) {
  switch(event)
  event: data received from application above
    create TCP segment with sequence number nextseqnum
    start timer for segment nextseqnum
    pass segment to IP
    nextseqnum = nextseqnum + length(data)
  event: timer timeout for segment with sequence number y
    retransmit segment with sequence number y
    compute new timeout interval for segment y
    restart timer for sequence number y
  event: ACK received, with ACK field value of y
    if (y > sendbase) { /* cumulative ACK of all data up to y */
      cancel all timers for segments with sequence numbers < y
      sendbase = y
    }
    else { /* a duplicate ACK for already ACKed segment */
      increment number of duplicate ACKs received for y
      if (number of duplicate ACKs received for y == 3) {
        /* TCP fast retransmit */
        resend segment with sequence number y
        restart timer for segment y
      }
    }
} /* end of loop forever */
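A runnable sketch of the same event handlers, with each event as a method instead of a switch inside a loop. Timers are abstracted away (an entry in `unacked` stands for a running timer), and the class, field names, and sample byte strings are made up for this note.

```python
class SimplifiedTCPSender:
    def __init__(self, initial_seq):
        self.sendbase = initial_seq
        self.nextseqnum = initial_seq
        self.unacked = {}    # seq # -> data; presence stands for a running timer
        self.dup_acks = 0
        self.wire = []       # (seq #, data) pairs passed to IP

    def on_app_data(self, data):
        seq = self.nextseqnum
        self.unacked[seq] = data                 # "start timer" for this segment
        self.wire.append((seq, data))            # pass segment to IP
        self.nextseqnum += len(data)

    def on_timeout(self, y):
        self.wire.append((y, self.unacked[y]))   # retransmit; timer restarts

    def on_ack(self, y):
        if y > self.sendbase:                    # cumulative ACK of all data up to y
            for seq in [s for s in self.unacked if s < y]:
                del self.unacked[seq]            # cancel timers
            self.sendbase = y
            self.dup_acks = 0
        else:                                    # duplicate ACK
            self.dup_acks += 1
            if self.dup_acks == 3 and y in self.unacked:
                self.wire.append((y, self.unacked[y]))   # TCP fast retransmit


s = SimplifiedTCPSender(initial_seq=92)
s.on_app_data(b"12345678")       # seq 92, 8 bytes
s.on_app_data(b"x" * 20)         # seq 100, 20 bytes
s.on_ack(100)                    # new data ACKed: sendbase moves to 100
for _ in range(3):
    s.on_ack(100)                # three duplicate ACKs trigger fast retransmit
```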
46
TL: TCP delayed acknowledgements
• Problem:
– In request/response programs, you send separate ACK and Data packets for each transaction
  • Delay ACK in order to send ACK back along with data
• Solution:
– Don’t ACK data immediately
  • Wait 200ms (must be less than 500ms – why?)
  • Must ACK every other packet
  • Must not delay duplicate ACKs
– Without delayed ACK: 40 byte ack + data packet
– With delayed ACK: data packet includes ACK
– See web trace example
– Extensions for asymmetric links
  • See later part of lecture
47
TL: TCP ACK generation [RFC 1122, RFC 2581]
Event -> TCP Receiver action
• in-order segment arrival, no gaps, everything else already ACKed -> delayed ACK. Wait up to 200ms for next segment. If no next segment, send ACK
• in-order segment arrival, no gaps, one delayed ACK pending -> immediately send single cumulative ACK
• out-of-order segment arrival, higher-than-expected seq. #, gap detected -> send duplicate ACK, indicating seq. # of next expected byte
• arrival of segment that partially or completely fills gap -> immediate ACK if segment starts at lower end of gap
48
TL: TCP retransmission
• Wait at least one RTT before retransmitting packet
• Importance of accurate RTT estimators:
– Estimator too low -> unneeded retransmissions
– Estimator too high -> poor throughput, slow reaction to segment loss
• RTT estimator must adapt to change in RTT
– But not too fast, or too slow!
• Backing off the retransmission timeout
– Exponential backoff
– Double retransmission timer interval after every loss until successful retransmission
49
TL: TCP retransmission scenarios
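lost ACK scenario (time runs downward)
• Host A -> Host B: Seq=92, 8 bytes data
• Host B -> Host A: ACK=100 (lost, X)
• timeout at Host A
• Host A -> Host B: Seq=92, 8 bytes data
• Host B -> Host A: ACK=100

premature timeout, cumulative ACKs (time runs downward)
• Host A -> Host B: Seq=92, 8 bytes data
• Host A -> Host B: Seq=100, 20 bytes data
• Host B -> Host A: ACK=100, then ACK=120
• Seq=92 timeout fires at Host A before ACK=100 arrives; Seq=100 timeout is still running
• Host A -> Host B: Seq=92, 8 bytes data
• Host B -> Host A: ACK=120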
50
TL: Initial Round-trip Estimator
• Round trip times exponentially averaged:
  EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT
– Recommended value for x: 0.1-0.2
  • 0.125 for most TCP’s
– Influence of given sample decreases exponentially fast
• Retransmit timer set to β*RTT, where β = 2
– Every time timer expires, RTO exponentially backed-off
– Like Ethernet
• Not good at preventing spurious timeouts
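A short worked example of the averaging formula, using x = 0.125 and a timer multiplier of 2 as given on this slide. The starting estimate and the three samples (in milliseconds) are made up for illustration.

```python
def update_estimate(estimated_rtt, sample_rtt, x=0.125):
    """EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT"""
    return (1 - x) * estimated_rtt + x * sample_rtt


estimate = 100.0
history = []
for sample in (100.0, 180.0, 100.0):     # one slow sample in the middle
    estimate = update_estimate(estimate, sample)
    history.append(estimate)

rto = 2 * estimate                               # retransmit timer with multiplier 2
backed_off = [rto * 2 ** k for k in range(3)]    # timer doubled on each expiry
```

The single 180 ms sample moves the estimate only from 100 to 110, and its influence is already fading one sample later (108.75), which is the exponential decay the slide mentions.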
51
TL: Jacobson’s Retransmission Timeout
• Key observation:
– At high loads round trip variance is high
– Need larger safety margin with larger variations in RTT
• Solution:
– Base RTO value on RTT and standard deviation (σ_RTT)
52
TL: Jacobson’s Retransmission Timeout
EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT
Deviation = (1-x)*Deviation + x*|SampleRTT-EstimatedRTT|

Setting the timeout
• EstimatedRTT plus “safety margin”
• large variation in EstimatedRTT -> larger safety margin
Timeout = EstimatedRTT + 4*Deviation
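A sketch of one update step using the three formulas on this slide. Two choices here are assumptions rather than something the slide states: the deviation is computed against the estimate from before the new sample is folded in, and the same x is used for both averages. The sample values (in milliseconds) are made up.

```python
def jacobson_update(estimated_rtt, deviation, sample_rtt, x=0.125):
    """Return the updated (EstimatedRTT, Deviation, Timeout)."""
    # Assumption: Deviation uses the estimate from before this sample.
    deviation = (1 - x) * deviation + x * abs(sample_rtt - estimated_rtt)
    estimated_rtt = (1 - x) * estimated_rtt + x * sample_rtt
    timeout = estimated_rtt + 4 * deviation
    return estimated_rtt, deviation, timeout


est, dev, timeout = jacobson_update(estimated_rtt=100.0, deviation=10.0, sample_rtt=180.0)
steady = jacobson_update(estimated_rtt=100.0, deviation=10.0, sample_rtt=100.0)
```

One late sample raises the timeout from 140 (100 + 4*10) to 185, mostly through the deviation term, while a sample equal to the estimate shrinks the safety margin.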
53
TL: Retransmission Ambiguity
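[Figure: two timelines between A and B. In each, A sends the original transmission, the RTO expires, A sends a retransmission, and an ACK arrives. In one timeline SampleRTT is measured from the original transmission; in the other it is measured from the retransmission. One transmission is marked lost (X).]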
54
TL: Karn’s algorithm
• Accounts for retransmission ambiguity
• If a segment has been retransmitted:
– Don’t count RTT sample on ACKs for this segment
– Keep backed off time-out for next packet
– Reuse RTT estimate only after one successful transmission
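The three rules can be sketched as a small estimator that ignores samples from retransmitted segments and keeps the backed-off timer until a clean sample arrives. The class name, the timer multiplier of 2, and the sample values (in milliseconds) are assumptions made for this note.

```python
class KarnEstimator:
    def __init__(self, estimated_rtt, x=0.125):
        self.estimated_rtt = estimated_rtt
        self.x = x
        self.backoff = 1   # RTO multiplier, doubled on every timeout

    def rto(self):
        return 2 * self.estimated_rtt * self.backoff

    def on_timeout(self):
        self.backoff *= 2   # exponential backoff

    def on_ack(self, sample_rtt, was_retransmitted):
        if was_retransmitted:
            return          # ambiguous sample: discard it, keep the backed-off RTO
        self.estimated_rtt = (1 - self.x) * self.estimated_rtt + self.x * sample_rtt
        self.backoff = 1    # clean sample: go back to the RTT-based timer


k = KarnEstimator(estimated_rtt=100.0)
rto_initial = k.rto()
k.on_timeout()
k.on_ack(sample_rtt=350.0, was_retransmitted=True)    # ignored
rto_after_retransmit = k.rto()
k.on_ack(sample_rtt=100.0, was_retransmitted=False)   # first clean sample
rto_after_clean = k.rto()
```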
55
TL: Timer Granularity
• Many TCP implementations set RTO in multiples of 200, 500, or 1000 ms
• Why?
– Avoid spurious timeouts – RTTs can vary quickly due to cross traffic
– Make timer interrupts efficient