Thursday, October 18 CS 475 Networks - Lecture 18 1
Lecture 18
Homework 5, Wireshark Project 3, Programming Project 3 due today.
Questions?
Outline
Chapter 5 - End-to-End Protocols
5.1 Simple Demultiplexer (UDP)
5.2 Reliable Byte Stream (TCP)
5.3 Remote Procedure Call (RPC)
5.4 Transport for Real-Time Applications (RTP)
5.5 Summary
Introduction
We've covered connecting computers. The transport layer deals with connecting processes running on computers.
A transport protocol may be expected to provide: guaranteed delivery, in-order delivery, no duplicates, support for large messages, synchronization, flow control, support for multiple processes.
Lower layers may: drop messages, reorder messages, deliver duplicates, limit message size, deliver after a long delay.
Simple Demultiplexer (UDP)
UDP extends the host-to-host service of the network to a process-to-process service but adds no other functionality.
UDP uses 16-bit port numbers to demultiplex between processes. (A process is identified by a port number/IP address pair.)
UDP header format
Simple Demultiplexer (UDP)
Implementation of the port abstraction may vary from OS to OS. Typically, each port is associated with a message queue. When a process receives a message, one is removed from the queue.
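The port/queue idea can be sketched as a toy model in Python. This is not a real UDP stack: the class name PortDemultiplexer, the max_queue_len limit, and the choice to drop a message when no process is bound or the queue is full are assumptions made for illustration.

```python
from collections import deque


class PortDemultiplexer:
    """Toy model of per-port message queues (hypothetical, not a real UDP stack)."""

    def __init__(self, max_queue_len=4):
        self.queues = {}                  # port number -> deque of payloads
        self.max_queue_len = max_queue_len

    def bind(self, port):
        # Ports are 16-bit values, so valid numbers run from 0 to 65535.
        if not 0 <= port <= 0xFFFF:
            raise ValueError("port must fit in 16 bits")
        self.queues.setdefault(port, deque())

    def deliver(self, dst_port, payload):
        """Called when a datagram arrives; returns False if it is discarded."""
        queue = self.queues.get(dst_port)
        if queue is None or len(queue) >= self.max_queue_len:
            return False                  # assumption: drop silently
        queue.append(payload)
        return True

    def receive(self, port):
        """Called by the process; removes the oldest message, or returns None."""
        queue = self.queues[port]
        return queue.popleft() if queue else None
```

Usage: call bind(port) once for the receiving process, deliver(port, payload) for each arriving datagram, and receive(port) whenever the process asks for a message.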
Simple Demultiplexer (UDP)
How does a client know which port to send a message to on the server? The server may use a well-known port (see /etc/services on a UNIX machine).
Alternatively, the server could run a port mapper process on a well-known port. The client communicates with the port mapper to find the port number of the desired process.
UDP does employ a checksum to verify a message. Packets with errors are dropped.
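A minimal sketch of the 16-bit ones'-complement Internet checksum, the style of checksum UDP uses. The real UDP checksum is computed over a pseudoheader, the UDP header, and the data; this sketch only sums whatever bytes it is given. The function names are hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum of the given bytes."""
    if len(data) % 2:
        data += b"\x00"                   # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                    # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def is_intact(data: bytes, checksum: int) -> bool:
    """Receiver-side check: recompute the checksum and compare."""
    return internet_checksum(data) == checksum
```

A receiver that finds is_intact(...) false would drop the packet, as described above.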
Reliable Byte Stream (TCP)
In addition to demultiplexing, TCP provides guaranteed, reliable, in-order delivery with flow and congestion control. TCP connections are full-duplex.
Flow control prevents the sender from overwhelming the receiver. Congestion control prevents the sender from overwhelming the network (switches, links).
End-to-End Issues
The sliding window algorithm used by TCP is like that used on a point-to-point link (Section 2.5.2), but there are important differences:
1) Setup (exchange of state so the sliding window algorithm can start) and teardown are needed.
2) RTTs are variable, so timeouts must be adaptive.
3) Packets can be reordered (the maximum segment lifetime, or MSL, is typically 120 s).
4) Resources are not tied to a single link and cannot be determined in advance (flow control needed).
5) Congestion is possible (congestion control needed).
Segment Format
TCP is a byte-oriented protocol. Bytes are normally collected into segments before being sent to the destination.
Segment Format
The TCP header is shown at right. A TCP connection is identified by the 4-tuple (SrcPort, SrcIPAddr, DstPort, DstIPAddr).
The HdrLen is the size of the header in 32-bit words.
Segment Format
The Acknowledgment, SequenceNum and AdvertisedWindow fields are used by the sliding window algorithm. Each transmitted byte has a sequence number; the SequenceNum field carries the sequence number of the first data byte in the segment. Acknowledgment and AdvertisedWindow are associated with received data.
Segment Format
The Flags field contains 6 bits: SYN, FIN, RESET, PUSH, URG, and ACK. SYN and FIN are used to establish and terminate a connection, respectively.
RESET indicates that the receiver is confused and wants to abort the connection.
PUSH indicates that data should be sent immediately.
URG signifies that the segment contains urgent data. The UrgPtr field contains the number of urgent data bytes.
ACK is set when the Acknowledgment field is valid.
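The header fields discussed on the last few slides can be pulled out of a raw segment with Python's struct module. This is a sketch that assumes the standard 20-byte fixed header layout and ignores options; the function name parse_tcp_header and the dictionary keys are hypothetical.

```python
import struct

# Bit positions of the six flags within the low bits of the 16-bit
# HdrLen/flags word (standard TCP header layout).
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RESET": 0x04,
             "PUSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (options are ignored)."""
    (src_port, dst_port, seq_num, ack_num,
     offset_flags, adv_window, checksum, urg_ptr) = struct.unpack(
        "!HHIIHHHH", segment[:20])
    hdr_len_words = offset_flags >> 12    # top 4 bits: HdrLen in 32-bit words
    return {
        "SrcPort": src_port,
        "DstPort": dst_port,
        "SequenceNum": seq_num,
        "Acknowledgment": ack_num,
        "HdrLenBytes": hdr_len_words * 4,
        "Flags": {name for name, bit in FLAG_BITS.items() if offset_flags & bit},
        "AdvertisedWindow": adv_window,
        "Checksum": checksum,
        "UrgPtr": urg_ptr,
    }
```

A header with HdrLen = 5 and no options decodes to HdrLenBytes = 20.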
Connection Establishment
A three-way handshake is used to set up the connection. Packets contain the initial sequence numbers to be used by the client and the server (x and y) in subsequent packets.
The TCP specification requires that the initial sequence numbers be random numbers.
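The sequence and acknowledgment numbers exchanged in the handshake can be traced with a short sketch. The tuple layout and function name are hypothetical; x and y are the initial sequence numbers chosen by the client and the server.

```python
MOD = 2 ** 32   # sequence numbers are 32 bits wide and wrap around


def three_way_handshake(x: int, y: int) -> list:
    """Return the three segments as (sender, flags, seq, ack) tuples."""
    syn = ("client", "SYN", x, None)                    # no valid ack yet
    syn_ack = ("server", "SYN+ACK", y, (x + 1) % MOD)   # acknowledges the SYN
    ack = ("client", "ACK", (x + 1) % MOD, (y + 1) % MOD)
    return [syn, syn_ack, ack]
```

For a trial run, x and y could be picked with random.randrange(2 ** 32); this stands in for whatever initial sequence number generator a real stack uses.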
Connection Establishment
A state-transition diagram for TCP setup and teardown is shown at right. Rectangles show states. Arcs have tags of the form event/action.
Retransmissions due to timeouts are not shown.
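The event/action idea can be written as a lookup table. The sketch below covers only a subset of the standard TCP state machine (no simultaneous open or close, no resets, no retransmissions); the event strings and the run helper are hypothetical names.

```python
# (current state, event) -> (action or None, next state)
TRANSITIONS = {
    ("CLOSED", "active open"): ("send SYN", "SYN_SENT"),
    ("CLOSED", "passive open"): (None, "LISTEN"),
    ("LISTEN", "recv SYN"): ("send SYN+ACK", "SYN_RCVD"),
    ("SYN_SENT", "recv SYN+ACK"): ("send ACK", "ESTABLISHED"),
    ("SYN_RCVD", "recv ACK"): (None, "ESTABLISHED"),
    ("ESTABLISHED", "close"): ("send FIN", "FIN_WAIT_1"),
    ("ESTABLISHED", "recv FIN"): ("send ACK", "CLOSE_WAIT"),
    ("FIN_WAIT_1", "recv ACK"): (None, "FIN_WAIT_2"),
    ("FIN_WAIT_2", "recv FIN"): ("send ACK", "TIME_WAIT"),
    ("TIME_WAIT", "timeout"): (None, "CLOSED"),
    ("CLOSE_WAIT", "close"): ("send FIN", "LAST_ACK"),
    ("LAST_ACK", "recv ACK"): (None, "CLOSED"),
}


def run(events, state="CLOSED"):
    """Feed events through the table; return the final state and actions taken."""
    actions = []
    for event in events:
        action, state = TRANSITIONS[(state, event)]
        if action is not None:
            actions.append(action)
    return state, actions
```

A client that opens and then closes a connection follows CLOSED, SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT, CLOSED.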
Sliding Window Revisited
The sliding window algorithm discussed previously provided reliable, in-order delivery. TCP's sliding window algorithm extends the prior one by adding flow control.
Flow control is achieved by having the receiver advertise a window size to the sender instead of using a fixed-size window. The sender is limited to sending no more than AdvertisedWindow bytes of unacknowledged data at any time.
Sliding Window Revisited
The sender maintains three pointers, where:
LastByteAcked ≤ LastByteSent ≤ LastByteWritten
while on the receiver:
LastByteRead < NextByteExpected ≤ LastByteRcvd + 1
Sliding Window Revisited
Assume the send and receive buffers are of size MaxSendBuffer and MaxRcvBuffer. On the receive side TCP must keep:
LastByteRcvd – LastByteRead ≤ MaxRcvBuffer
The advertised window size is:
AdvertisedWindow = MaxRcvBuffer – ((NextByteExpected – 1) – LastByteRead)
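The receive-side bookkeeping can be tried out with a few lines of Python. The function names are hypothetical; the formulas are the ones given on the slides.

```python
def receiver_state_ok(last_byte_read, next_byte_expected,
                      last_byte_rcvd, max_rcv_buffer):
    """Check the receive-side pointer ordering and the buffer limit."""
    ordered = last_byte_read < next_byte_expected <= last_byte_rcvd + 1
    fits = last_byte_rcvd - last_byte_read <= max_rcv_buffer
    return ordered and fits


def advertised_window(max_rcv_buffer, next_byte_expected, last_byte_read):
    """Free space the receiver can advertise to the sender, in bytes."""
    return max_rcv_buffer - ((next_byte_expected - 1) - last_byte_read)
```

With a 4096-byte buffer, 1000 bytes received in order and none read yet, the window shrinks by 1000 bytes; it grows again as the application reads data.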
Sliding Window Revisited
On the sending side TCP ensures:
LastByteSent – LastByteAcked ≤ AdvertisedWindow
while maintaining:
LastByteWritten – LastByteAcked ≤ MaxSendBuffer
If the sending process tries to write n bytes in such a way that this inequality would not be maintained then the process is blocked.
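A sketch of the two send-side checks. The amount that may still be sent is derived here from the first inequality (it is not stated on the slide), and both function names are hypothetical.

```python
def bytes_sendable(last_byte_sent, last_byte_acked, advertised_window):
    """How many more bytes may be sent without exceeding AdvertisedWindow."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, advertised_window - in_flight)


def write_would_block(n, last_byte_written, last_byte_acked, max_send_buffer):
    """True if writing n more bytes would overflow the send buffer."""
    return (last_byte_written + n) - last_byte_acked > max_send_buffer
```

When write_would_block(...) is true, the sending process has to wait until more data is acknowledged and buffer space is released.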
Sliding Window Revisited
A 32-bit sequence number will wrap around in about 57 minutes at a 10 Mbps transmit rate, but in only about 34 seconds at 1 Gbps (2^32 bytes × 8 bits ÷ rate). An extension to TCP extends the sequence number space.
A 16-bit AdvertisedWindow field allows for a 64 KB window. The window should be large enough to hold a full delay × bandwidth product. A cross-country delay of 100 ms at 10 Mbps corresponds to 122 KB. The TCP extension also increases the advertised window size.
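The wraparound times and the delay × bandwidth product are simple arithmetic and can be recomputed as a rough check. The function names are hypothetical; rates are taken as powers of ten (10 Mbps = 10^7 bits/s) and 1 KB as 1024 bytes.

```python
def wraparound_seconds(bandwidth_bps, seq_bits=32):
    """Time to send 2**seq_bits bytes at the given rate, in seconds."""
    return (2 ** seq_bits) * 8 / bandwidth_bps


def delay_bandwidth_bytes(delay_s, bandwidth_bps):
    """Bytes in flight needed to keep a link of this delay and rate full."""
    return delay_s * bandwidth_bps / 8


if __name__ == "__main__":
    print(wraparound_seconds(10e6) / 60, "minutes at 10 Mbps")
    print(wraparound_seconds(1e9), "seconds at 1 Gbps")
    print(delay_bandwidth_bytes(0.100, 10e6) / 1024, "KB for 100 ms at 10 Mbps")
```

The last value exceeds the 64 KB that a 16-bit window field can express, which is why a larger window is needed.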
Triggering Transmission
TCP will transmit a segment when (1) it has collected a maximum segment size (MSS) number of bytes, (2) the sending process tells it to (a push), or (3) a “timer” expires.
Nagle's Algorithm:
when there is data to send
    if both the data and the window ≥ MSS
        send a full segment
    else if there is unACKed data in flight
        buffer data until ACK arrives
    else
        send all data now
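The decision rule in Nagle's algorithm maps directly onto a small Python function. This is only the decision step, not a full sender; the function name and the returned strings are hypothetical.

```python
def nagle_decision(data_len, window, mss, unacked_in_flight):
    """Decide what to do when the application has data_len bytes to send."""
    if data_len >= mss and window >= mss:
        return "send full segment"
    if unacked_in_flight:
        return "buffer until ACK"     # the arriving ACK acts as the timer
    return "send all now"
```

An interactive application that writes a few bytes at a time gets its first small write sent at once; later small writes are held until the ACK for the first one arrives.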
Adaptive Retransmission
Originally, a TimeOut value for retransmission was computed using:
EstimatedRTT = α × EstimatedRTT + (1 – α) × SampleRTT
TimeOut = 2 × EstimatedRTT
where SampleRTT is the time between when a segment is sent and its ACK arrives.
The original TCP spec recommended a value of α between 0.8 and 0.9.
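The original estimator is a weighted running average and fits in a few lines. The default α = 0.85 below is just one value from the recommended range, and the function names are hypothetical.

```python
def update_estimated_rtt(estimated_rtt, sample_rtt, alpha=0.85):
    """Blend the old estimate with a new sample; larger alpha means smoother."""
    return alpha * estimated_rtt + (1 - alpha) * sample_rtt


def timeout_from_estimate(estimated_rtt):
    """The original rule: time out after twice the estimated RTT."""
    return 2 * estimated_rtt
```

With α near 0.9 a single unusually long sample moves the estimate only slightly; with α near 0.8 the estimate follows changes more quickly.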
Adaptive Retransmission
Unfortunately an ACK for a retransmission is identical to an ACK for the original. This can lead to incorrect values for SampleRTT.
Adaptive Retransmission
The Karn/Partridge algorithm fixed the problem quite simply. SampleRTT was measured only for segments that have been sent once.
The new algorithm included a second change. After each retransmit the next timeout value would be set to twice the previous timeout value (exponential backoff). This helped to alleviate problems due to network congestion.
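Both Karn/Partridge changes can be sketched on top of the original estimator. The class and method names are hypothetical, α = 0.875 is an assumed value within the 0.8 to 0.9 range, and resetting the timeout to 2 × EstimatedRTT after a clean sample is an assumption of this sketch.

```python
class KarnPartridgeTimer:
    """Sketch of RTT sampling with the two Karn/Partridge changes."""

    def __init__(self, estimated_rtt, alpha=0.875):
        self.estimated_rtt = estimated_rtt
        self.alpha = alpha
        self.timeout = 2 * estimated_rtt

    def on_ack(self, sample_rtt, was_retransmitted):
        # Change 1: ignore samples for segments sent more than once,
        # since the ACK cannot be matched to a particular transmission.
        if was_retransmitted:
            return
        self.estimated_rtt = (self.alpha * self.estimated_rtt
                              + (1 - self.alpha) * sample_rtt)
        self.timeout = 2 * self.estimated_rtt   # assumption: reset on clean sample

    def on_timeout(self):
        # Change 2: exponential backoff, doubling the timeout per retransmit.
        self.timeout *= 2
```

Each call to on_timeout() doubles the wait before the next retransmission, so a congested network sees fewer and fewer retries.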
Adaptive Retransmission
The original algorithm did not handle situations in which the SampleRTT might vary a lot. The Jacobson/Karels algorithm was an improvement:
Difference = SampleRTT – EstimatedRTT
EstimatedRTT = EstimatedRTT + (δ × Difference)
Deviation = Deviation + δ × (|Difference| – Deviation)
TimeOut = μ × EstimatedRTT + φ × Deviation
where δ is a fraction between 0 and 1, μ was typically 1, and φ was 4.
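The Jacobson/Karels update can be written out step by step. The class name is hypothetical and δ = 0.125 is an assumed value (the slide does not give one); μ = 1 and φ = 4 follow the slide.

```python
class JacobsonKarelsTimer:
    """Sketch of a timeout that accounts for variation in the RTT samples."""

    def __init__(self, estimated_rtt, deviation=0.0,
                 delta=0.125, mu=1.0, phi=4.0):
        self.estimated_rtt = estimated_rtt
        self.deviation = deviation
        self.delta = delta      # assumed gain, a fraction between 0 and 1
        self.mu = mu
        self.phi = phi

    def update(self, sample_rtt):
        """Fold in one sample and return the new TimeOut."""
        difference = sample_rtt - self.estimated_rtt
        self.estimated_rtt += self.delta * difference
        self.deviation += self.delta * (abs(difference) - self.deviation)
        return self.mu * self.estimated_rtt + self.phi * self.deviation
```

When samples are steady, Deviation shrinks and TimeOut stays close to EstimatedRTT; when they vary widely, the Deviation term dominates.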
Record Boundaries
TCP has two features that allow record boundaries to be put into the byte stream.
TCP allows data to be flagged as urgent or out-of-band. Urgent data can be used to indicate the end of a record.
A TCP push operation can be used to indicate a complete record. (The sockets API does not provide access to the PUSH flag.)
It is usually simpler for record boundary markers to be inserted by the application.
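One common way for an application to mark record boundaries is to prefix each record with its length. The sketch below assumes a 4-byte big-endian length prefix; that format and the function names are illustrative choices, not something TCP defines.

```python
import struct


def frame(record: bytes) -> bytes:
    """Prefix a record with its length as a 4-byte big-endian integer."""
    return struct.pack("!I", len(record)) + record


def extract_records(buffer: bytearray) -> list:
    """Remove and return every complete record at the front of the buffer."""
    records = []
    while len(buffer) >= 4:
        (length,) = struct.unpack("!I", buffer[:4])
        if len(buffer) < 4 + length:
            break                         # record not fully received yet
        records.append(bytes(buffer[4:4 + length]))
        del buffer[:4 + length]
    return records
```

The receiver appends whatever bytes recv() returns to the buffer and calls extract_records(buffer) after each read; leftover bytes stay in the buffer until the rest of the record arrives.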
TCP Extensions
There have been four optional extensions to TCP that are implemented using Options in the TCP header:
1) The sender places a 32-bit time stamp in the header. The receiver echoes the time stamp in the ACK. This allows for accurate measurement of the RTT.
TCP Extensions
2) The sequence number and the time stamp are examined to determine if the sequence number has wrapped around.
3) A scaling factor can be included to advertise a window larger than 64 KB.
4) The receiver can respond with a selective acknowledgment (SACK). This allows the sender to (re)transmit just missing segments.
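The window scaling extension (item 3) can be illustrated with a little arithmetic. The sketch assumes the scaling factor is a left-shift count of at most 14 applied to the 16-bit window field, as in the TCP window scale option; the function names are hypothetical.

```python
MAX_SHIFT = 14   # assumed upper limit on the shift count


def scaled_window(window_field, shift_count):
    """Window in bytes implied by a 16-bit field and a shift count."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift_count <= MAX_SHIFT:
        raise ValueError("shift count out of range")
    return window_field << shift_count


def min_shift_for(window_bytes):
    """Smallest shift count that lets the field express window_bytes."""
    shift = 0
    while (0xFFFF << shift) < window_bytes:
        shift += 1
        if shift > MAX_SHIFT:
            raise ValueError("window too large to advertise")
    return shift
```

A 125,000-byte window (100 ms at 10 Mbps) needs a shift of only 1, since 65,535 × 2 already exceeds it.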
Performance
Now that we have a complete protocol graph, we can discuss how to measure its performance as seen by applications.
In particular, as network speeds increase, can a protocol like TCP provide enough data to keep the network full?
Test setup: a simple host-to-host connection within one room, using two 2.4 GHz dual-core machines and 2 Gbps of bandwidth.
Performance
TTCP benchmark using various sizes of messages.
Note: this is a "perfect" network, so the benchmark measures only the TCP implementation and the workstation hardware/software. Other issues, such as congestion, will come up later.
Alternative Design Choices
TCP is a stream-oriented protocol as opposed to a request/reply protocol. We will examine a request/reply protocol (RPC) next time. (TCP can be used for request/reply applications, but there are complications.)
TCP is a byte-stream rather than a message-stream service. (Record boundaries can however be inserted into the byte stream.)
Alternative Design Choices
TCP uses connection setup and teardown. It is possible to send all connection parameters with the first data message. TCP setup allows a receiver to reject a connection before any data is sent. TCP teardown means that “keep alive” messages don't need to be sent.
TCP uses window-based rather than rate-based flow control. There are similarities but also some interesting differences.
In-class Exercises
Log on locally under Linux or log on remotely to csserver to answer the following questions:
How do we send out-of-band data via TCP? (man send)
How do we receive out-of-band data? (man recv)
Which of the four TCP extensions described in class are supported under Linux? (man tcp)
What acronym is used for the TCP extension that helps to determine if the sequence number has wrapped around? What does this acronym stand for?
Is there a way to disable Nagle's algorithm so that segments are sent immediately? If so, how?
In-class Exercises
The Linux /proc pseudo-filesystem interface can be used to tune many of the TCP algorithms. Changing the parameters requires system administration privileges. Use cat to examine appropriate /proc file contents (man tcp) and determine the answers to the following:
Is the optional SACK extension enabled?
What is the default receive buffer size?
Is the optional window scaling extension enabled?
What is the default congestion control algorithm?
Which algorithms are available for use?