Transport Layer 3-1

89350 Introduction to Communication, Lectures 3-5: The Transport Layer

(chapter 3 – Transport)

Prof. Amir Herzberg

Computer Networking: A Top Down Approach Featuring the Internet, 3rd edition. Jim Kurose, Keith Ross. Addison-Wesley, July 2005.

Based on foils by Kurose & Ross ©, see: http://www.aw.com/kurose-ross/

My site: http://amirherzberg.com

Transport Layer 3-2

Today (& Chapter 3): outline

3.1 Transport-layer services

3.2 Multiplexing and demultiplexing

3.3 Connectionless transport: UDP

3.4 Principles of reliable data transfer

3.5 Connection-oriented transport: TCP
  segment structure
  reliable data transfer
  flow control
  connection management

3.6 Principles of congestion control

3.7 TCP congestion control

Transport Layer 3-3

Processes & communication

Process: a program running within a host.

Within the same host, two processes communicate using inter-process communication (IPC). Not in this course.

Processes in different hosts communicate by exchanging messages. In this course.

Client process: the process that initiates communication.

Server process: the process that waits to be contacted; always on.

Transport Layer 3-4

Transport services and protocols

Provide communication between application processes (not just hosts)

Run in end systems:
  send segments via the network layer
  rcv side: reassemble segments into messages
  pass messages to the application via a socket (buffer)

Internet transport protocols: TCP and UDP

[Figure: two end hosts with full protocol stacks (application, transport, network, data link, physical) connected through routers that implement only the network, data link and physical layers; the transport layer provides logical end-end transport between the hosts.]

Transport Layer 3-5

Internet transport-layer protocols

Connection-oriented: TCP
  reliable, in-order delivery
  congestion control
  flow control
  connection setup

Connectionless: UDP
  unreliable, unordered delivery
  “best-effort”, like IP

Neither ensures:
  delay guarantees
  bandwidth guarantees

[Figure: protocol stacks of two end hosts and the routers between them, showing logical end-end transport.]

Transport Layer 3-7

Multiplexing of Packets within Host

Network layer delivers packet to the correct host (by IP address)

Transport layer delivers packet to the correct process

How to identify the process (in the host)? Use a 16-bit port number (in the transport-layer header)
  Multiplexing: adding port identifiers to a packet handed to IP
  Demultiplexing: delivering packets to the correct apps (using the ports in the packet)

Server: fixed port # < 1024, so that requests reach the server
  HTTP server: 80
  Mail server: 25

Client: OS-assigned (`random`), dynamic port ≥ 1024
  Multiple client processes can work with the same server

A process queues the data of a connection/port via a socket

Transport Layer 3-8

Sockets and Sockets API

Processes send/receive messages via the socket API
  API = Application Program Interface

Socket: in/out buffer
  Metaphor: mailbox

[Figure: on each host (client / server), a process attached to a socket on top of TCP/UDP; the two hosts are connected through the Internet.]

Sockets API functions:
  Send, receive
  Connection (TCP, not UDP): open, close
  Fix a few parameters, e.g. buffer size (more on this later)
  Choose transport service…

Transport Layer 3-10

How demultiplexing works

Host receives datagram
  Datagram carries (one) transport-layer segment
  Segment has source and destination port numbers

Use IP addresses & ports to identify the socket. How? Depends…
  UDP: one socket per port
  TCP server: one socket per <SrcIP, SrcPrt, DstIP, DstPrt>

[Figure: Internet-layer packet format. The IP header carries the source IP address, the dest IP address and other Internet-layer header fields. Its payload is the transport-layer segment, drawn 32 bits wide: source port # and dest port #, other transport header fields, and the application data (message).]

Transport Layer 3-12

NAPT/NAT: Network Address (Port) Translation

1. SrcIP=10.0.0.1, SrcPort=3373, DstPort=80
2. SrcIP=133.44.5.8, SrcPort=6678, DstPort=80
3. DstIP=133.44.5.8, SrcPort=80, DstPort=6678
4. DstIP=10.0.0.1, SrcPort=80, DstPort=3373

Goal: share IP addresses among multiple hosts

How: identify host by port

Transport Layer 3-13

NAPT: Network Address/Port Translation

[Figure: LAN hosts 10.0.0.1, 10.0.0.2 and 10.0.0.3 behind a NAT router with LAN-side address 10.0.0.4 and WAN-side address 138.76.29.7.]

NAT translation table:
  WAN side addr        LAN side addr
  138.76.29.7, 5001    10.0.0.1, 3345
  ……                   ……

1: host 10.0.0.1 sends datagram to 128.119.40.186, 80
   S: 10.0.0.1, 3345  D: 128.119.40.186, 80

2: NAT router changes datagram source addr from 10.0.0.1, 3345 to 138.76.29.7, 5001, updates table
   S: 138.76.29.7, 5001  D: 128.119.40.186, 80

3: Reply arrives, dest. address: 138.76.29.7, 5001
   S: 128.119.40.186, 80  D: 138.76.29.7, 5001

4: NAT router changes datagram dest addr from 138.76.29.7, 5001 to 10.0.0.1, 3345
   S: 128.119.40.186, 80  D: 10.0.0.1, 3345
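The four steps of the NAPT example can be sketched as a pair of lookup tables. This is a minimal illustration only: the class name `NaptTable`, its method names and the policy of handing out WAN ports sequentially are assumptions made for the sketch, not something the slides specify.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy NAPT table (hypothetical): maps "LAN addr:port" to a WAN-side port and back. */
class NaptTable {
    private final String wanAddr;
    private int nextPort; // assumption: WAN ports are handed out sequentially
    private final Map<String, Integer> lanToWan = new HashMap<>();
    private final Map<Integer, String> wanToLan = new HashMap<>();

    NaptTable(String wanAddr, int firstPort) {
        this.wanAddr = wanAddr;
        this.nextPort = firstPort;
    }

    /** Step 2: rewrite the source of an outgoing datagram, adding a table entry if needed. */
    String translateOutgoing(String lanAddr, int lanPort) {
        String key = lanAddr + ":" + lanPort;
        Integer wanPort = lanToWan.get(key);
        if (wanPort == null) {
            wanPort = nextPort++;
            lanToWan.put(key, wanPort);
            wanToLan.put(wanPort, key);
        }
        return wanAddr + ":" + wanPort;
    }

    /** Step 4: rewrite the destination of a reply; null means no entry (drop the datagram). */
    String translateIncoming(int wanPort) {
        return wanToLan.get(wanPort);
    }
}
```

With the slide's numbers, `new NaptTable("138.76.29.7", 5001)` maps 10.0.0.1, 3345 to 138.76.29.7, 5001, and a reply addressed to port 5001 is mapped back to 10.0.0.1, 3345.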

Transport Layer 3-14

UDP: Connectionless demultiplexing

Create sockets with port numbers:
  DatagramSocket mySocket1 = new DatagramSocket(9111);
  DatagramSocket mySocket2 = new DatagramSocket();

UDP socket identified by destination port (and IP)

When host receives UDP segment:
  checks destination port number in segment
  directs UDP segment to socket with that port number
  regardless of source IP and port

Transport Layer 3-15

UDP: Connectionless demux (cont)

DatagramSocket serverSocket = new DatagramSocket(6428);

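[Figure: client processes on host A (client IP: A) and host B (client IP: B) exchange UDP segments with the server process on host C (server IP: C), bound to port 6428. One client sends SP: 9157, DP: 6428 and receives SP: 6428, DP: 9157; the other sends SP: 5775, DP: 6428 and receives SP: 6428, DP: 5775.]

DP = Destination Port, SP = Source Port

Client's SP provides the “return address” (server's DP)

Client's DP (destination port on server) is known to the client (e.g. standard)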

Transport Layer 3-16

TCP: Connection-oriented demux

TCP socket identified by 4-tuple:
  source IP address
  source port number
  dest IP address
  dest port number

Receiving host uses all four values to direct segment to the appropriate socket

Server host may support many simultaneous TCP sockets:
  each socket identified by its own 4-tuple
  e.g., web servers have a different socket for each connecting client
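The difference between the two demultiplexing rules can be sketched with two maps: UDP looks up the socket by destination port only, while TCP looks it up by the full 4-tuple. The class `Demux`, the string socket names and the single-host simplification (the destination IP is ignored for UDP) are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy demultiplexer for one host (hypothetical names throughout). */
class Demux {
    private final Map<Integer, String> udpSockets = new HashMap<>(); // key: destination port
    private final Map<String, String> tcpSockets = new HashMap<>();  // key: 4-tuple

    void bindUdp(int dstPort, String socketName) {
        udpSockets.put(dstPort, socketName);
    }

    void addTcpConnection(String srcIp, int srcPort, String dstIp, int dstPort, String socketName) {
        tcpSockets.put(key(srcIp, srcPort, dstIp, dstPort), socketName);
    }

    /** UDP: source IP and port are ignored; only the destination port selects the socket. */
    String demuxUdp(String srcIp, int srcPort, int dstPort) {
        return udpSockets.get(dstPort);
    }

    /** TCP: all four values select the socket. */
    String demuxTcp(String srcIp, int srcPort, String dstIp, int dstPort) {
        return tcpSockets.get(key(srcIp, srcPort, dstIp, dstPort));
    }

    private static String key(String srcIp, int srcPort, String dstIp, int dstPort) {
        return srcIp + ":" + srcPort + ">" + dstIp + ":" + dstPort;
    }
}
```

Two UDP clients sending to port 6428 reach the same socket, whereas two TCP clients connecting to port 80 each get their own socket.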

Transport Layer 3-17

TCP: Connection-oriented demux

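[Figure: clients on host A (client IP: A) and host B (client IP: B) hold TCP connections to port 80 on the server (server IP: C); process labels in the figure: P1, P3, P4. Segments shown: SP: 9157, DP: 80 answered by SP: 80, DP: 9157; SP: 5775, DP: 80 answered by SP: 80, DP: 5775.]

What is different here? Would this work well for a web server?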

Transport Layer 3-18

TCP: Connection-oriented demux

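[Figure: clients on host A (client IP: A) and host B (client IP: B) hold TCP connections to port 80 on the server (server IP: C); process labels in the figure: P1, P2, P3. Segments shown: SP: 9157, DP: 80 answered by SP: 80, DP: 9157; SP: 5775, DP: 80 answered by SP: 80, DP: 5775.]

Multithreading server may listen to multiple ports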

Transport Layer 3-19

Internet transport layer services

TCP service:
  connection-oriented: setup overhead required
    TCP setup: one round trip
  reliable transport: no loss (but some overhead)
  flow control: feed (send) at receiver's speed
  congestion control: throttle sender when network overloaded
  does not provide: delay, bandwidth guarantees

UDP service:
  datagram: no setup
  unreliable transport: loss
  does not provide:
    reliability
    flow/congestion control
    delay or bandwidth guarantees

Others (e.g. RDP)

Q: Would we ever choose UDP?

Transport Layer 3-20

Choosing transport service…

Reliability, congestion, flow controls
  TCP: yes, UDP: no
  Overhead, slowdown
  Often required
    E.g. file transfer, e-mail
  Not always, e.g., audiocast

Size and complexity
  Code size, complexity are sometimes critical (…UDP)
  E.g., dust computer, sensor

Bandwidth & Delay Guarantee
  Not in TCP, UDP
  OK for “elastic” appl.
    Use whatever bandwidth & delay they get
    E.g. e-mail, Web
  Phone & other apps require minimal Quality of Service

Setup overhead?
  TCP: yes, UDP: no
  Ignore in long-lived app
    E.g. file transfer, telnet
  Sometimes significant
    E.g. DNS queries

Transport Layer 3-21

Chapter 3 outline

3.1 Transport-layer services

3.2 Multiplexing and demultiplexing

3.3 Connectionless transport: UDP

3.4 Principles of reliable data transfer

3.5 Connection-oriented transport: TCP
  segment structure
  reliable data transfer
  flow control
  connection management

3.6 Principles of congestion control

3.7 TCP congestion control

Transport Layer 3-22

UDP: User Datagram Protocol [RFC 768]

“No frills,” “bare bones” transport protocol

“Best effort” service; UDP segments may be:
  lost
  delivered out of order to app

Connectionless:
  no handshaking
  each UDP segment handled independently of others

Why is there a UDP?
  no connection setup (delay)
  simple: no connection state at sender, receiver
  no retransmission
  no congestion control: UDP can blast away as fast as desired (or more…)
  smaller segment header

Transport Layer 3-23

UDP: header and more

UDP often used for streaming multimedia apps
  loss tolerant
  rate sensitive

Other UDP uses
  DNS (why?)
  SNMP – Simple Network Management Protocol

UDP congestion/flow control?
  By application; flexible
  Limited buffer per port [notes]

Reliability over UDP?
  Checksum detects (most) corruptions [hidden foils]
  Requires application-specific error recovery!

UDP segment format (32 bits wide):
  source port # | dest port #
  length        | checksum
  application data (message)

Length: in bytes, of the UDP segment, including header
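How a checksum of this kind detects corruption can be sketched as follows. The code computes a 16-bit one's-complement checksum in the style of the Internet checksum; it is a simplified illustration, since the real UDP checksum also covers a pseudo-header and the UDP header, and the class name `InternetChecksum` is hypothetical.

```java
/** Simplified 16-bit one's-complement checksum over a byte array (hypothetical helper). */
class InternetChecksum {
    /** Sums the data as big-endian 16-bit words, folds the carries, and complements. */
    static int compute(byte[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i += 2) {
            int hi = data[i] & 0xFF;
            // An odd trailing byte is padded with a zero byte.
            int lo = (i + 1 < data.length) ? (data[i + 1] & 0xFF) : 0;
            sum += (hi << 8) | lo;
        }
        // Fold any carry out of the low 16 bits back into the sum.
        while ((sum >> 16) != 0) {
            sum = (sum & 0xFFFF) + (sum >> 16);
        }
        return (int) (~sum & 0xFFFF);
    }

    /** The receiver recomputes the checksum and compares it with the one carried in the segment. */
    static boolean verify(byte[] data, int checksum) {
        return compute(data) == checksum;
    }
}
```

A single flipped bit changes the sum and is detected; some multi-bit errors cancel out, which is why the slide says the checksum detects only (most) corruptions.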

Transport Layer 3-26

Chapter 3 outline

3.1 Transport-layer services

3.2 Multiplexing and demultiplexing

3.3 Connectionless transport: UDP

3.4 Principles of reliable data transfer

3.5 Connection-oriented transport: TCP
  segment structure
  reliable data transfer
  flow control
  connection management

3.6 Principles of congestion control

3.7 TCP congestion control

Transport Layer 3-27

Principles of reliable data transfer

Important in application, transport and link layers

On the top-10 list of important networking topics!

Characteristics of the unreliable channel determine the complexity of the reliable data transfer protocol (rdt)

Transport Layer 3-29

RDT Protocol: Events

Finite state machine

Inputs:
  rdt_send(m) – message m from application
  rdt_rcv(p) – packet p from (unreliable) channel

Outputs:
  udt_send(p) – packet p to (unreliable) channel
  deliver_data(m) – message m to application
  ready() – to receive another message from application

Execution: sequence of input and output events
  Outputs defined by protocol

Transport Layer 3-30

Reliable data transfer: plan

We'll:

Present a few (improving) versions of the RDT protocol
  From strong to weak assumptions on input events (channel)
  Later, also add more inputs/outputs…

Consider only unidirectional message transfer
  but packets (& control info) flow in both directions

Define protocol by pseudo-code or state diagram:
  States S={S1,S2,…}
  Events E={e1,e2,…}
  Actions A={a1,a2,…}
  Transition rules, S×E→A×S, e.g.:

[Figure: state diagram with states S1, S2 and S3. Each arrow is labelled with the event causing the transition and the action taken on it: event e1 / action a1 on the transition S1→S2; further arrows are labelled event e2 / action a2 and event e3 / action a3.]

Transport Layer 3-31

Rdt1.0: reliable transfer over a reliable channel

Underlying channel (udt) perfectly reliable
  no bit errors
  no loss of packets
  no need to set up and tear down connection (always up)

Separate state machines for sender, receiver:
  sender sends data into underlying channel
  receiver reads data from underlying channel

Sender FSM, single state "Wait for call from above":
  on rdt_send(data): packet = make_pkt(data); udt_send(packet)

Receiver FSM, single state "Wait for call from below":
  on rdt_rcv(packet): extract(packet,data); deliver_data(data)

Transport Layer 3-32

Rdt2.0: udt channel with bit errors

Underlying channel may flip bits in packet
  recall: UDP checksum to detect bit errors
  assume: errors detected, no packet loss / reordering

The question: how to recover from errors:
  acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  sender retransmits pkt on receipt of NAK

New mechanisms in rdt2.0 (beyond rdt1.0):
  error detection
  receiver feedback: control msgs (ACK, NAK) rcvr->sender
  `ready` output [why?]

Transport Layer 3-33

rdt2.0: FSM specification

Sender:
  State "Wait for call from above":
    on rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt); go to "Wait for ACK or NAK"
  State "Wait for ACK or NAK":
    on rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt)
    on rdt_rcv(rcvpkt) && isACK(rcvpkt): output ready; go to "Wait for call from above"

Receiver:
  State "Wait for call from below":
    on rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK)
    on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt): extract(rcvpkt,data); deliver_data(data); udt_send(ACK)

Transport Layer 3-34

rdt2.0: operation with no errors

[Figure: the rdt2.0 FSM of the previous slide with the error-free path highlighted. Sender: rdt_send(data) triggers sndpkt = make_pkt(data, checksum) and udt_send(sndpkt). Receiver: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) triggers extract(rcvpkt,data), deliver_data(data) and udt_send(ACK). Sender: rdt_rcv(rcvpkt) && isACK(rcvpkt) outputs ready and returns to "Wait for call from above".]

Transport Layer 3-35

rdt2.0: error scenario

[Figure: the rdt2.0 FSM of the previous slides with the error path highlighted. Sender: rdt_send(data) triggers sndpkt = make_pkt(data, checksum) and udt_send(sndpkt). Receiver: rdt_rcv(rcvpkt) && corrupt(rcvpkt) triggers udt_send(NAK). Sender: rdt_rcv(rcvpkt) && isNAK(rcvpkt) triggers udt_send(sndpkt), and the sender stays in "Wait for ACK or NAK" until an ACK arrives.]

Transport Layer 3-36

Analysis of RDT 2.0

Goals of RDT 2.0:
  Delivered messages are a prefix of sent messages
  Eventually, all messages get delivered, and the sender is `ready` for more

Assumptions of RDT 2.0:
  Every packet sent is eventually delivered
  But: possibly corrupted (always detected)
  1-to-1 mapping from packets delivered to packets sent

Goals are not met. Why?

Transport Layer 3-37

rdt2.0 has a fatal flaw!

What happens if ACK/NAK is corrupted?
  1st: add checksum…
  but sender still doesn't know what happened at receiver!
  can't just retransmit: possible duplicate

What to do?
  Sender ACKs/NAKs receiver's ACK/NAK? What if this ACK/NAK is garbled?
  Retransmit on garbled ACK/NAK. Danger: duplicates

Handling duplicates:
  sender adds sequence number to each pkt
  sender retransmits current pkt if ACK/NAK garbled
  receiver discards (doesn't deliver) duplicate pkt

Sender sends one packet, then waits for receiver's response; use (1-bit) sequence number

RDT 2.1, aka `stop & wait` protocol

Transport Layer 3-40

RDT2.1 in pseudo-code…

Sender:
  state=ready; i=0;
  On rdt_send(m):
    if state=ready then { p=<m,i,EDC(m,i)>; udt_send(p); state=wait; }
  On rdt_rcv(p’):
    if p’=<“ack”,EDC(“ack”)> then { i=(i+1) mod 2; state=ready; }
    else udt_send(p);

Receiver:
  i=0;
  On rdt_rcv(p):
    if p=<m,j,EDC(m,j)> then {
      c=EDC(“ack”); udt_send(<“ack”,c>);
      if i=j then { deliver(m); i=(i+1) mod 2; }
    }
    else udt_send(“nack”);

EDC: Error Detection Code (e.g. checksum)

Transport Layer 3-41

Analysis of RDT 2.1 [sketch]

Goals, Assumptions: same as RDT 2.0
  Every packet sent is eventually delivered
  But: possibly corrupted (always detected)
  1-to-1 mapping from packets delivered to packets sent

Define `legitimate` states: [E=Even, O=Odd]
  1. RE/RO: Ready (to receive E/O message); channel empty
  2. ME/MO: Wait for E/O Ack; channel has new E/O message
  3. AE/AO: Wait for E/O Ack; channel has E/O Ack
  4. MER/MOR: Wait for E/O Ack; channel has E/O message which was already delivered
  5. WEAO: Wait for E Ack but channel has O Ack
  6. WOAE: Wait for O Ack but channel has E Ack

Prove by induction: system stays in legitimate state!

Transport Layer 3-42

Analysis of RDT 2.1 [sketch, cont’]

Claim: system is always in a legitimate state

Base: initial state is (legitimate) RE

Consider moves from legitimate states:
  1. RE/RO: only to ME/MO
  2. ME/MO: if received OK, move to AE/AO; otherwise, to WEAO/WOAE
  3. AE/AO: if received OK, move to RO/RE; otherwise, to MER/MOR
  4. MER/MOR: same as ME/MO!
  5. WEAO/WOAE: move to MER/MOR
     if received correctly… and also if corrupted!

System always stays in legit state!! □

Transport Layer 3-43

RDT2.1: Discussion

Nack/Ack, 1-bit counter

Handles (detectable) packet corruptions

Assume:
  No processor faults
  No connection setup / teardown (always up)
  All errors in packets detected (by checksum)
  No packet loss / reordering / duplication

Duplication of message is no problem

But duplicate ACK can cause loss

RDT2.2: handle duplication: add seq # to ack…

Transport Layer 3-44

RDT2.2: duplicate-tolerant protocol

Same functionality as rdt2.1, using ACKs only

Instead of NAK, receiver sends ACK for last pkt received OK
  Explicitly include seq # of pkt being ACKed
  Resilient to packet (incl. Ack) duplication
    • Exercise: why does RDT2.1 fail if an Ack is duplicated?
  Important: allows sender to retransmit… see later!

Duplicate ACK at sender results in same action as NAK: retransmit current pkt

Transport Layer 3-46

RDT2.2: pseudo-code…

Sender:
  state=ready; i=0;
  On rdt_send(m):
    if state=ready then { p=<m,i,EDC(m,i)>; udt_send(p); state=wait; }
  On rdt_rcv(p’):
    let c=EDC(“ack”,i);
    if p’=<“ack”,i,c> then { i=(i+1) mod 2; state=ready; }
    else udt_send(p);

Receiver:
  i=0;
  On rdt_rcv(p’):
    if p’=<m,i,EDC(m,i)> then { p=<“ack”,i,EDC(“ack”,i)>; udt_send(p); deliver(m); i=(i+1) mod 2; }
    else udt_send(p);
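The receiver side of the pseudo-code can be sketched as a small class. This is an illustration under assumptions: corruption is passed in as a flag instead of being detected with an EDC, the class and method names are hypothetical, and before any packet has been accepted the receiver answers with an ACK for the other bit (a choice the pseudo-code leaves open).

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of an RDT2.2-style receiver: ACKs carry the seq bit of the last packet received OK. */
class Rdt22Receiver {
    private int expected = 0; // "i" in the pseudo-code
    final List<String> delivered = new ArrayList<>();

    /**
     * Handles one arriving packet and returns the seq bit carried by the ACK sent in response.
     * A corrupted or duplicate packet is not delivered; the previous ACK is repeated instead.
     */
    int onPacket(String msg, int seq, boolean corrupted) {
        if (!corrupted && seq == expected) {
            delivered.add(msg);
            int ack = expected;
            expected = (expected + 1) % 2;
            return ack;
        }
        // Repeat the ACK for the last packet received OK (the other bit).
        return (expected + 1) % 2;
    }
}
```

A retransmitted copy of an already delivered packet is acknowledged again but not delivered twice, which is the duplicate tolerance the slide describes.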

Transport Layer 3-47

RDT3.0: Alternating Bit Protocol

Allow packet loss: channel can also lose packets (data or ACKs)
  checksum, seq. #, ACKs, retransmissions are not enough

Q: how to deal with loss?

1st idea: sender waits maximal round-trip time (RTT), then retransmits
  Pro: simple, no duplicates
  Good for fixed, known RTT
  What about variable RTT??

Better idea: sender waits a “reasonable” amount of time for ACK (< max RTT)
  retransmits if no ACK received during this time
  if pkt (or ACK) just delayed (not lost): retransmission will be a duplicate, but use of seq. #’s already handles this
  recall: receiver includes seq # of pkt being ACKed
  requires countdown timer

Handles errors, duplication and loss, but no reordering

Transport Layer 3-49

Alternating Bit Protocol (RDT3.0): pseudo-code for sender…

state=ready; i=0;

On rdt_send(m):
  if state=ready then { p=<m,i,EDC(m,i)>; udt_send(p); state=wait; sleep(“timeout”,T); }

On rdt_rcv(p’):
  if p’=<“ack”,i,EDC(“ack”,i)> then { i=(i+1) mod 2; state=ready; abort(“timeout”); }

On WakeUp(“timeout”):
  { udt_send(p); sleep(“timeout”,T); }

Recipient: as in RDT 2.2!
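The sender's three event handlers can be sketched as an event-driven class. The names are hypothetical, the timer is modelled as a boolean that the caller "fires" by calling `onTimeout()`, and the channel is a list that records every transmission, all of which are simplifications for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of an alternating-bit (RDT3.0-style) sender driven by explicit events. */
class AlternatingBitSender {
    private boolean ready = true;
    private int seq = 0;                 // "i" in the pseudo-code
    private String pending;              // message of the packet awaiting its ACK
    private boolean timerRunning = false;
    final List<String> channel = new ArrayList<>(); // records each udt_send as "msg#seq"

    /** rdt_send(m): accepted only in the ready state. */
    boolean rdtSend(String m) {
        if (!ready) return false;
        pending = m;
        ready = false;
        transmit();
        return true;
    }

    /** rdt_rcv(ack): only an uncorrupted ACK with the current bit is accepted. */
    void onAck(int ackSeq, boolean corrupted) {
        if (ready || corrupted || ackSeq != seq) return; // ignore; the timer covers this case
        seq = (seq + 1) % 2;
        ready = true;
        timerRunning = false; // abort("timeout")
    }

    /** WakeUp("timeout"): retransmit the pending packet and restart the timer. */
    void onTimeout() {
        if (timerRunning) transmit();
    }

    boolean isReady() { return ready; }

    private void transmit() {
        channel.add(pending + "#" + seq);
        timerRunning = true; // sleep("timeout", T)
    }
}
```

Unlike RDT2.2, a wrong or garbled ACK does not trigger an immediate retransmission here; the sender simply waits for the timeout.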

Transport Layer 3-51

Alternating Bit (rdt3.0) in action

Transport Layer 3-52

Alternating bit (rdt3.0) in action

Transport Layer 3-53

Alternating bit (rdt 3.0): problems

No pipelining (at most one packet on link)
  Performance problem

Assumes FIFO (no reordering)
  But datagram networks may not ensure FIFO!

No setup and teardown of connection
  Teardown required to allow the application to take corrective/alternate action if not connected
  And to avoid keeping state indefinitely
  Also: to handle processor faults

Transport Layer 3-54

Rest of Transport Layer…

TCP Overview
TCP Reliability
TCP Timeouts
TCP Flow Control
3.6 Principles of congestion control
3.7 TCP congestion control
TCP Fairness

Transport Layer 3-55

TCP: Overview
RFCs: 793, 1122, 1323, 2018, 2581

Point-to-point: one sender, one receiver

Reliable, in-order byte stream: no “message boundaries”

Pipelined: TCP congestion and flow control set window size

Send & receive buffers

Full duplex data: bi-directional data flow in same connection

Connection-oriented:
  handshake before data exchange
  teardown (to free resources, detect failure)

Flow control: sender will not overwhelm receiver

[Figure: the sending application writes data through its socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the receiving application reads data through its socket door.]

Transport Layer 3-56

TCP segment structure (32 bits wide):
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | Receive window
  checksum | Urg data pnter
  Options (variable length)
  application data (variable length)

Notes on the fields:
  Sequence and acknowledgement numbers: counting by bytes of data (not segments!)
  U (URG): urgent data (generally not used)
  A (ACK): this is an ack of ack-number
  P (PSH): push data (don't aggregate)
  RST, SYN, FIN: connection establishment (setup, teardown commands)
  Receive window: # segments rcvr willing to accept
  Checksum: Internet checksum (as in UDP)

Transport Layer 3-57

TCP Services

TCP connection-based transport service:
  Setup connection
  Use connection
  Close connection

TCP connection services:
  Flow control: don't send faster than receiver can absorb [later]
  Congestion control: slow down if network gets congested [later]
  Fairness: fair sharing of Net resources among connections [later]
  Reliable, pipelined stream service – next!

Transport Layer 3-58

TCP reliable data transfer

TCP: reliability on top of IP's unreliable service
  Packet loss / error
  Duplicates
  Out of order
  Setup/teardown
  Loss of state in proc.

Pipelined segments
  hybrid of Go-Back-N (GBN) & Selective Repeat (SR)
  but a single retransmission timer
  substantial flexibility

Cumulative acks
  resend (duplicate) ACK instead of NACK

Retransmissions are triggered by:
  timeout events
  duplicate ACKs (later)

Initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
  window of N bytes

Transport Layer 3-59

Reliable (TCP) Connection

What is a reliable (TCP) connection?

Reliable connection management
  Setup
  Teardown

Reliable communication in connection
  Delivery
  FIFO…

Connection fails if and only if packets cannot pass

Transport Layer 3-60

Reliable connection: More Interface…

Connection status at both ends:
  Link Up/Down indicators [sender: also Ready/Wait]
  Open/Close connection commands
  rdt_send, deliver only when link is Up [sender: and Ready]

[Figure: send side and receive side, each with Open/Close commands and a Link up / down (Ok/Fault) indicator; the send side also has a Ready / Wait indicator.]

Transport Layer 3-61

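TCP Reliability

Reliable Connection Management
  Setup (SYN, SYN-ACK), tear-down (later)

Reliable Communication in Connection
  Counters, timeout, retransmit [like RDT3.0]

[Figure: timeline between A (Client) and B (Server): Syn and SynAck, after which both sides are Up; then numbered packets (1, 2, 2) and Ack1, with one transmission lost (X) and resent after a Timeout.]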

Transport Layer 3-62

Reliable Connection Management

Three basic states:
  Up
  Down
  Syn (in progress)
  Teardown states - later

Send, receive only when connection is Up

Connection setup (SYN):
  Completes in finite time
  But may fail

Connection closure (FIN):
  Completes in finite time

Transport Layer 3-63

TCP Connection States

Three basic states:
  Up
  Down
  Syn (in progress)

Send, receive only when connection is Up

[Figure: timelines of A and B alternating between Syn and Up periods; several messages are lost (X).]

Transport Layer 3-64

TCP Connection Teardown

Ordered and Forced Teardown mechanisms

Forced Teardown:
  When `fatal` error detected: send RST signal and abort (break) connection
  Same when receiving RST signal
    • If sequence number in RST is within `window` [why?]

Ordered Teardown:
  Ensure all messages sent are received
  Ensure peer will also close connection

Transport Layer 3-65

TCP Connection Tear-Down

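Closing a connection:

Client closes socket: clientSocket.close();

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client/server teardown timeline. The client closes and sends FIN; the server replies ACK, enters closing and sends FIN; the client replies ACK and enters timed wait before it is closed; the server is closed. Steps 1 and 2 are marked.]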

Transport Layer 3-66

TCP Connection Tear-Down (cont.)

Step 3: client receives FIN, replies with ACK.
  Enters “timed wait”: will respond with ACK to received FINs (WHY??)

Step 4: server receives ACK. Connection closed.

Exercise: modify to allow (also) the server to initiate teardown (`simultaneous FINs`)

[Figure: the same client/server teardown timeline (FIN, ACK, FIN, ACK; closing, timed wait, closed), with Steps 1 to 4 marked.]

Transport Layer 3-67

TCP Reliable Connection Closure

Fin, Fin-Wait: new states
  Fin: closure begun; receive OK but don't send
  Fin-Wait: wait (30 sec), to ensure peer is done
    • Only by closure initiator (usually client)

Initiator: send FIN, wait for ACK and FIN

Responder: ACK (to FIN), last messages, FIN
  Done when receiving ACK to FIN

[Figure: timeline of A and B, both initially Up. Messages shown: FIN and its Ack, data and its Ack, the second FIN and its Ack. States shown: Fin, Fin-Wait (30 sec), Down.]

Transport Layer 3-68

TCP Connection: client lifecycle

RST or timeout;send RST

RST

Transport Layer 3-69

TCP Connection: server lifecycle

Receive Ack

Transport Layer 3-70

Reliable Connection Management

Send, receive only when connection is Up

Setup (SYN): completes or fails in finite time

Closure (FIN): completes or fails in finite time

Reset (RST): only due to disconnection

Each Up at A (B) maps to a unique Syn at B (A)

[Figure: timelines of A and B alternating between Syn and Up periods; several messages are lost (X).]

Transport Layer 3-72

Mapping Between Connections

Each Up at A (B) maps to a unique Syn at B (A)

Use mapping functions MAPA, MAPB
  b=MAPA(a): map from A's Up to B's Syn

In example:
  MAPA is identity, i.e. MAPA(a)=a
  But: MAPB(1)=1, MAPB(2)=4 !

In general: MAP is monotonic, one-to-one

[Figure: timelines of A and B with numbered periods, Syn(1) to Syn(4) and Up(1) to Up(3); several messages are lost (X).]

Transport Layer 3-73

Stateful Connection Management

Reliable connection management is easy… if we count Syn periods:
  Each party maintains a Syn-periods counter
  Send your Syn counter with SYN
  Echo other party's Syn counter in SYN-ACK
  Ignore SYN-ACK with old echoed counter

[Figure: the numbered Syn/Up timelines of A and B, with messages carrying counters: Syn,1; Syn,1;1; Ack,1; Fin,1; Syn,2; Syn,3; Syn,2;3; Ack,2; Ack,3; Syn,4. Several messages are lost (X).]

Transport Layer 3-74

Stateless Connection Management

Stateful connection management is easy

But servers, clients may lose state

And… too much state: should cnn.com keep state for each client (forever)?

And… how to identify a repeating client?
  IP address: which one (DHCP, NAPT)?
  IP + port: fails for NAPT; ports are temporary

Need Stateless Connection Management

Use random numbers instead of counters!

Transport Layer 3-75

Stateless Connection Management

Use random seq# to identify Syn period
  Select, send your seq# with SYN, SYN-ACK
  Echo other party's seq# with SYN-ACK, ACK
  Ignore SYN responses with wrong (old) echoed identifier

[Figure: the numbered Syn/Up timelines of A and B, with messages carrying random identifiers: 37; 66;A37; A66; F37; 88; 42; 91;A42; A91; A63; 34; 13:A37. Several messages are lost (X).]

Transport Layer 3-76

Stateless Three Way Handshake

Setup connection
  client: initiator
  server: responding

Each chooses a random seq#
  Match SYN, SYN-ACK
  Seq# to ensure no loss of data

Use flags in TCP header: SYN, ACK, RST, FIN

Three way handshake:

Step 1: client sends TCP SYN segment to server
  Specify client seq#
  no data

Step 2: server receives SYN, replies with SYN-ACK segment
  Allocate buffers
  Ack client's seq#
  Send server's seq#
  no data

Step 3: client receives SYN-ACK, validates #, Acks server # and (optionally) sends data

Transport Layer 3-77

Reliable Communication

Whenever a connection terminates (even with failure):
  Sequence received is a prefix of the sequence sent
  No gaps
  No reordering
  No insertions
  No duplications

Well-terminated connections:
  At one or (usually) both ends
    • Can't guarantee to always detect at both ends
  All bytes delivered
    • Sequence received = sequence sent
    • No truncations

Transport Layer 3-78

TCP Reliability in Connection

Sequence numbers kept by both parties

32-bit counters, initialized randomly
  On overflow, `wrap` to begin from zero

Send with each packet sent to peer:
  Sequence number of first byte sent in this packet
  Seq. number of next byte expected from peer
    All bytes till this one were received OK (ACK)

[Figure: exchange between A and B. A: SYN,37. B: 66;A37. A: “Hi”,37,A66. B: “Ready”,66,A39. A: “Wait”,39,A71. B: 71;A43.]
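The numbers in the exchange follow from one rule: the next sequence number is the current one plus the number of bytes sent, wrapping at 2^32. A minimal sketch, following the slide's simplified numbering; the class and method names are hypothetical.

```java
/** Byte-counting sequence numbers with 32-bit wrap-around (hypothetical helper). */
class SeqNum {
    static final long MODULUS = 1L << 32;

    /** Sequence number of the byte that follows a segment starting at seq. */
    static long next(long seq, int payloadLength) {
        return (seq + payloadLength) % MODULUS;
    }
}
```

For example, “Hi” sent with sequence number 37 is acknowledged with A39, and “Ready” sent with 66 is acknowledged with A71.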

Transport Layer 3-79

Loss and Retransmissions

Packet or Ack is lost
  Often due to congested router

TCP retransmits lost packet

Loss detected by timeout
  Another method later

[Figure: the exchange between A and B (SYN,37; 66;A37; “Hi”,37,A66; “Ready”,66,A39; “Wait”,39,A71; 71;A43) in which one transmission is lost (X); after a Timeout, “Wait”,39,A71 is sent again.]

Transport Layer 3-80

Premature Timeout → Duplicates

What if packet is delayed, but not lost?
  Often due to congested router…

Delayed too long → TCP retransmits it → duplication

No harm: TCP discards duplicate packets
  Send another ACK, in case first one was lost

[Figure: the exchange between A and B (SYN,37; 66;A37; “Hi”,37,A66; “Ready”,66,A39; “Wait”,39,A71; 71;A43), with a Timeout that fires although nothing was lost.]

Transport Layer 3-81

Stop-and-wait [RDT3.0]: inefficiency

[Figure: sender/receiver timeline. First packet bit transmitted at t = 0; last packet bit transmitted at t = L / R; first packet bit arrives; last packet bit arrives, receiver sends ACK; ACK arrives, sender sends next packet at t = RTT + L / R.]

$$U_{\text{sender}} = \frac{L/R}{RTT + L/R}$$

Transport Layer 3-82

Pipeline / Window for Efficiency

Protocol as described is inefficient:
  Transmission rate = 10 Mb/sec
  Max segment size (MSS) = 1250 B = 10000 bits
  Round Trip Time (RTT) = 100 msec
    Includes transmit, propagation, queuing
  File size = 125 KB = 1 million bits = 100 segments
  Time to send = 100 segments * 100 msec = 10 sec

Idea: send file without waiting for ACK?
  Transmission time = (1 Mb)/(10 Mb/sec) = 100 msec
  Transmission time + RTT = 200 msec
  Send 100 segments in pipeline / window
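The arithmetic on this slide and the stop-and-wait utilization formula U = (L/R) / (RTT + L/R) can be checked with a few lines. The class and method names are hypothetical, and the pipelined estimate assumes the whole file fits in one window, so the sender waits for a single round trip only.

```java
/** Back-of-the-envelope timing for stop-and-wait versus pipelining (hypothetical helper). */
class PipelineTiming {
    /** Sender utilization of stop-and-wait: (L/R) / (RTT + L/R). */
    static double stopAndWaitUtilization(double packetBits, double rateBps, double rttSec) {
        double transmit = packetBits / rateBps;
        return transmit / (rttSec + transmit);
    }

    /** Stop-and-wait, as on the slide: one round trip per segment (RTT includes transmit time). */
    static double stopAndWaitSeconds(int segments, double rttSec) {
        return segments * rttSec;
    }

    /** Pipelined: transmit the whole file back-to-back, then wait one RTT. */
    static double pipelinedSeconds(double fileBits, double rateBps, double rttSec) {
        return fileBits / rateBps + rttSec;
    }
}
```

With the slide's numbers (10 Mb/sec, 10000-bit segments, RTT of 100 msec, 1 million bits) stop-and-wait takes about 10 sec and keeps the sender busy roughly 1% of the time, while pipelining takes about 200 msec.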

Transport Layer 3-83

Pipelined protocols

Pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged pkts
  Need `real` sequence numbers (not 1 bit)
    also deals with out-of-order (old) packets
    TCP: count bytes, begin with random numbers from setup
  buffering at sender and (optionally) receiver

Pipelining improves link utilization

Transport Layer 3-84

Pipelining: increased utilization

[Figure: sender/receiver timeline with three pipelined packets. First packet bit transmitted at t = 0; last bit transmitted at t = L / R; first packet bit arrives; last packet bit arrives, receiver sends ACK; last bit of 2nd packet arrives, send ACK; last bit of 3rd packet arrives, send ACK; ACK arrives, sender sends next packet at t = RTT + L / R.]

Transport Layer 3-85

TCP seq. #’s and ACKs with Pipeline

Seq. #’s: byte-stream “number” of first byte in segment's data

ACKs:
  seq # of next byte expected from other side
  cumulative ACK

Q: how does the receiver handle out-of-order segments?
  A: ignore old pkts; buffer or ignore future packets

[Figure: simple telnet scenario between Host A and Host B over time. User sends ‘Req’: Seq=42, ACK=79, data = ‘Req’. Host ACKs receipt of ‘Req’ and sends back ‘Ok’: Seq=79, ACK=45, data = ‘Ok’. Client ACKs receipt of ‘Ok’: Seq=45, ACK=81.]

Transport Layer 3-86

TCP: retransmission scenarios

[Figure, lost ACK scenario: Host A sends Seq=92, 8 bytes data; Host B replies ACK=100, which is lost (X); the timer for Seq=92 times out and A resends Seq=92, 8 bytes data; B replies ACK=100 again; SendBase = 100.]

[Figure, premature timeout: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B replies ACK=100 and ACK=120, but the timer for Seq=92 times out first, so A resends Seq=92, 8 bytes data; B replies ACK=120. SendBase goes from 100 to 120 and stays at 120.]

Transport Layer 3-87

TCP retransmission scenarios (more)

[Figure, cumulative ACK scenario: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B's ACK=100 is lost (X), but ACK=120 arrives before the timeout; SendBase = 120.]

Transport Layer 3-88

TCP sender events (simplified):

TCP_send(m):
  segment=<seq#, m>; seq # is byte-stream number of first data byte in segment
  start timer if not already running
    timer is for oldest unacked segment

Timeout:
  retransmit segment that caused timeout
  restart timer

Ack rcvd:
  if it acknowledges previously unacked segments:
    update what is known to be acked
    start timer if there are outstanding segments

Transport Layer 3-90

Duplicate Ack & Fast Retransmit

Time-out period often relatively long:
  long delay before resending lost packet

Detect lost segments via duplicate ACKs.
  Sender often sends many segments back-to-back
  If segment is lost, there will likely be many duplicate ACKs.

If sender receives 3 ACKs for the same data, it supposes that the segment after the ACKed data was lost:
  fast retransmit: resend segment before timer expires

Why not two? Exercise…

Transport Layer 3-91

Fast retransmit algorithm:

On TCP_RCV(ACK,y) // and valid checksum
  if (y > SendBase) then {
    remove y-SendBase bytes from Unacked;
    SendBase = y;
    if (NextSeqNum > SendBase) then start timer;
  } else { // a duplicate ACK for already ACKed segment
    increment count of dup ACKs received for y;
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y; // fast retransmit
    }
  }
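The ACK handler can be sketched in Java as follows. This is an illustration under assumptions: the class and field names are hypothetical, sequence-number wrap-around is ignored, only duplicates of the current SendBase are counted, and "resending" just records the sequence number in a list.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a sender's ACK handling with fast retransmit on the third duplicate ACK. */
class FastRetransmitSender {
    long sendBase;      // oldest unacknowledged byte
    long nextSeqNum;    // next byte to be sent
    boolean timerRunning;
    private int dupAcks = 0;
    final List<Long> retransmitted = new ArrayList<>(); // sequence numbers that were resent

    FastRetransmitSender(long sendBase, long nextSeqNum) {
        this.sendBase = sendBase;
        this.nextSeqNum = nextSeqNum;
        this.timerRunning = nextSeqNum > sendBase;
    }

    void onAck(long y) {
        if (y > sendBase) {
            // New data acknowledged: advance the window and reset the duplicate counter.
            sendBase = y;
            dupAcks = 0;
            timerRunning = nextSeqNum > sendBase; // restart only if data is still outstanding
        } else if (y == sendBase) {
            // Duplicate ACK: the receiver is still waiting for byte y.
            dupAcks++;
            if (dupAcks == 3) {
                retransmitted.add(y); // fast retransmit, before the timer expires
            }
        }
        // ACKs below sendBase are stale and ignored in this sketch.
    }
}
```

After one new ACK for 100, three further ACKs for 100 trigger a single retransmission of the segment starting at byte 100.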

Transport Layer 3-92

TCP ACK generation [RFC 1122, RFC 2581]

Event at Receiver: Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
TCP Receiver action: Delayed ACK. Wait 200ms (up to 500ms) for next segment. If no next segment, send ACK

Event at Receiver: Arrival of in-order segment with expected seq #. One other segment has ACK pending
TCP Receiver action: Immediately send single cumulative ACK, ACKing both in-order segments

Event at Receiver: Arrival of higher-than-expected sequence number (gap)
TCP Receiver action: Immediately send duplicate ACK, indicating seq. # of next expected byte; usually, buffer data

Transport Layer 3-95

Pipelining in TCP: Summary

Cumulative & Delayed Acknowledgements
  Ack with seq # n: all bytes up to n received
  Send Ack only after receiving 2 MSS or 200msec

Recipient usually buffers out-of-order bytes
  As long as they are inside the window
  Optional selective-ack mechanism [RFC2018; we ignore]

Sender retransmits
  On timeout, or `fast retransmit` (3 dup ack)
  Usually: resend only oldest (first) unACKed packet

Transport Layer 3-96

Rest of Transport Layer…

TCP Overview
TCP Reliability
TCP Flow Control
3.6 Principles of congestion control
TCP Timeouts
3.7 TCP congestion control
TCP Fairness

Transport Layer 3-97

TCP Header: RcvWindow

RcvWindow: TCP's flow control
  Maximal number of un-acked segments peer can send

[Figure: TCP header layout, one row per line]
  Source Port | Destination Port
  Sequence Number
  Acknowledge Number
  Hdr Len | RSV | Flags | RcvWindow
  Checksum | Urgent pointer
  Options (variable length)
  Payload (Application Data)

Transport Layer 3-98

TCP Flow Control

Problem: application on receiver may be slower than network, so there is no place for incoming data

Flow Control: prevent receiver buffer overflow
  By limiting amount of (unacked) segments in transit
  Send: RcvWindow = (|buffer|-|used|)/MSS
  MSS (Max Segment Size): indicated by initiator at SYN

[Figure: Receiver Buffer. Data received from the net fills the used part; the application reads from it; the remaining space is RcvWindow (bytes).]

Ignore segments received out of order

Transport Layer 3-99

Window Size (1)

Window size is critical for performance
  If RcvWindow is too small, performance suffers
  Even if network is available and application clears buffers

Example: consider simple network below
  Client and server connected by two routers
  W = Window size (in bits)
  R = Transmission rate of all links
  RTT = Round Trip Time (RTT >> MSS/R)

Transport Layer 3-100

Window Size (2)

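[Figure: timeline. The sender transmits a full window in time W/R, then waits for the Ack; one window is sent per RTT.]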

Transport Layer 3-101

Average Rate

[Figure: timeline. The sender transmits for W/R out of every RTT, then is idle until the Ack arrives.]

Average Rate ≤ [R*(W/R)+0*(RTT-W/R)]/RTT = W/RTT

Average Rate ≤ R

Average Rate ≤ min(R,W/RTT)

Transport Layer 3-102

Optimal Receiver Window

Average Rate = min{R, W/RTT}

Flow control is not bottleneck, as long as W ≥ R * RTT
  Receiver Buffer ≥ Bandwidth x Delay
  If window too small: slows down connection
  Fat links: huge Bandwidth x Delay (e.g. satellite)

Aggressive flow control: send RcvWindow larger than available buffers
  By estimating drain rate; risk of buffer overflow
  Works well when delay does not change much
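The window bound can be checked numerically. A small sketch using the slide's symbols (R, W, RTT); the function names and the sample link (100 Mbit/s, 100 ms RTT, 64 KiB window) are assumptions chosen only for illustration.

```python
def average_rate(R, W, RTT):
    """Upper bound on average rate: min(R, W/RTT). R in bits/sec, W in bits, RTT in sec."""
    return min(R, W / RTT)


def min_window_for_full_rate(R, RTT):
    """Smallest window (bits) for which flow control is not the bottleneck: W = R * RTT."""
    return R * RTT


R = 100e6              # assumed link rate: 100 Mbit/s
RTT = 0.1              # assumed round trip time: 100 ms
W = 64 * 1024 * 8      # assumed window: 64 KiB, expressed in bits

print(f"rate with 64 KiB window: {average_rate(R, W, RTT) / 1e6:.2f} Mbit/s")
print(f"window needed for full rate: {min_window_for_full_rate(R, RTT) / 8 / 1e6:.2f} MB")
```

With these numbers the window, not the link, limits the rate: the bandwidth x delay product is far larger than 64 KiB.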

Transport Layer 3-103

TCP Buffers - Critical Resource

TCP Receiver Buffers: critical for performance

But large buffers limit number of connections
  10,000 connections, 10KB each = 100 MB
  Costs and performance implications
  Abused by SYN-flooding DoS attack (Net. Security course)

Servers should minimize open connections
  Client should close connection (and wait 30 sec)
  Many servers initiate close, but don’t wait 30 sec
  • If Ack is lost, client may timeout (abnormal close)

Transport Layer 3-105

Lecture 7 outline

TCP Overview
TCP Flow Control
3.6 Principles of congestion control
TCP Timeouts
3.7 TCP congestion control
TCP Fairness

Transport Layer 3-106

Principles of Congestion Control

Congestion:
  informally: “too many sources sending too much data too fast for network to handle”
  Cf. to flow control (limited by receiver’s buffers)

manifestations:
  lost packets (buffer overflow at routers)
  long delays (queueing in router buffers)

a top-10 problem!
  Main cause for packet loss
  Main cause for delay and jitter (changing delay)

Transport Layer 3-107

Congestion scenario 1: infinite buffers

two senders, two receivers
one router

Thruput limit: in-rate, capacity
  λout = min(λin, c/2)
  Assume no loss, retransmissions

Congestion: when λin reaches c/2
  Infinite delays
  maximal throughput

What if buffers are finite?

[Figure: Host A sends original data at rate λin into a router with unlimited shared output link buffers; Host B receives at rate λout.]

Transport Layer 3-108

Congestion scenario 2: finite buffers

one router, finite buffers
λout = λin

λ'in ≥ λin since sender retransmits packet if…
  Packet or Ack lost due to buffer overflow (or noise)
  Excessive delay: if timeout < |queue| ∙ transmit time + AckTime

c/2 ≥ λ'in ≥ λ'out ≥ λout since delivery is only in order, no dups

[Figure: Host A and Host B connected through a router with finite shared buffers. λin: original data; λ'in: data + retransmit; λ'out and λout on the output side.]

Transport Layer 3-109

Congestion scenario 2: finite buffers

one router, finite buffers
λout = λin

sender retransmits packet if…
  Lost due to buffer overflow (or noise)
  Excessive delay (> timeout)

No congestion: queue usually not full
  No re-transmission, loss: λout = λ'out = λ'in = λin
  Happens if λin << C/2 (capacity), for both senders…

Congestion: queues often/usually full
  Packet loss: whenever queue full (i.e., usually…), λ'in ≥ λ'out
  Cannot deliver retransmission, out-of-order: λ'out > λout = λin
  • Especially with Go-Back-N !
  Excessive delay (at least |Q|*TransmitTime)
  Overhead retransmissions due to loss, time-out

Transport Layer 3-110

Causes/costs of congestion: scenario 3

four senders
multihop paths
timeout/retransmit

Q: what happens as λin and λ'in increase?

[Figure: Host A and Host B connected through routers with finite shared output link buffers. λin: original data; λ'in: original data, plus retransmitted data; λout on the output side.]

Transport Layer 3-111

Causes/costs of congestion: scenario 3

Another “cost” of congestion:
  when packet dropped, any “upstream” transmission capacity used for that packet was wasted!
  Causes sender to retransmit… vicious cycle!

[Figure: λout for the path from Host A to Host B.]

Transport Layer 3-112

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP

Network-assisted congestion control:
  routers provide feedback to end systems:
    To source, e.g. ICMP Source Quench packet
    Or via destination, often piggyback on message; may use flow control
  From single `congestion bit` (SNA) to explicit rate (ATM ABR), in book
  Main issue: router efficiency; avoid state

Transport Layer 3-113

Lecture 7 outline

TCP Overview
TCP Timeouts
TCP Flow Control
3.6 Principles of congestion control
3.7 TCP congestion control
TCP Fairness

Transport Layer 3-114

TCP Congestion Control

end-end control (no network assistance)
  Assume all senders comply
  Protocols must be `TCP friendly`

sender limits transmission:
  LastByteSent - LastByteAcked ≤ min(CongWin, RcvWin)

Roughly,
  rate = CongWin/RTT Bytes/sec   (RTT = Round Trip Time)

CongWin is dynamic, function of perceived congestion

How does sender perceive congestion?
  loss event = timeout or 3 duplicate acks
  Sender reduces rate (CongWin) after loss

Two modes:
  `Slow start` - with exponential build-up
  Congestion avoidance (AIMD)

Transport Layer 3-115

TCP Slow Start

Slow start: When connection begins, CongWin = 1 MSS (MSS = Maximal Segment Size)
  Example: MSS = 500 bytes & RTT = 200 msec
  initial rate = (8*500)/0.2 = 20 kbps

Available bandwidth may be >> MSS/RTT
  Need to increase congestion window (and rate)
  By how much? Rapid, exponential build-up
  Until we detect possible congestion (by loss)…

Transport Layer 3-116

TCP Slow Start

Exponential rate build-up until first loss event:
  double CongWin every RTT
  by incrementing CongWin for every ACK received

Summary: initial rate is slow (CongWin = 1 MSS) but ramps up rapidly

Loss: Change to Congestion Avoidance (AIMD)…

[Figure: timeline between Host A and Host B. A sends one segment in the first RTT, two segments in the second, four segments in the third.]
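Why does one increment per ACK double CongWin every RTT? A short sketch, assuming one ACK per segment, no loss, and a window counted in whole MSS units; the function name `slow_start_rounds` is made up for the example.

```python
def slow_start_rounds(rounds, mss=1):
    """Return CongWin (in MSS units) at the start and after each loss-free RTT."""
    cong_win = mss
    history = [cong_win]
    for _ in range(rounds):
        acks_this_rtt = cong_win // mss   # one ACK comes back per segment sent
        for _ in range(acks_this_rtt):
            cong_win += mss               # increment CongWin for every ACK received
        history.append(cong_win)
    return history


print(slow_start_rounds(3))  # [1, 2, 4, 8]
```

Each RTT returns as many ACKs as there were segments in the window, so adding 1 MSS per ACK doubles the window.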

Transport Layer 3-117

TCP Congestion Avoidance (AIMD: Additive Increase, Multiplicative Decrease)

Multiplicative Decrease: cut CongWin in half after 3 duplicate Acks (of possibly lost packet) [TimeOut: see later…]

Additive Increase: increase CongWin by 1 MSS every RTT in the absence of loss events

[Figure: congestion window (axis marks at 8, 16 and 24 Kbytes) versus time for a long-lived TCP connection.]

Transport Layer 3-118

Response to TimeOut vs. 3dupACKs

After 3 dup ACKs:
  CongWin is cut in half (multiplicative decrease)
  window then grows linearly (additive increase)

After Timeout: Restart connection
  CongWin set to 1 MSS; Exponential build up
  Up to a threshold, then grows linearly (AIMD)

Philosophy:
• Duplicate ACK sent if segments received out of order (maybe some lost), so network delivers some segments
• Timeout happens if no segments get thru: no connection, need drastic response (restart)
• Restart is like start except threshold…
• What’s the threshold?

Transport Layer 3-119

Restart Threshold

Restart is like slow start, except exponential ramp-up ends at threshold (not failure), then switch to congestion avoidance (AIMD)

Idea: slow-down before reaching rate (CongWin) which (may have) caused congestion

Implementation: Variable Threshold
  At loss event, Threshold is set to 1/2 of CongWin just before loss event

Transport Layer 3-122

TCP Round Trip Time and Timeouts

Q: how to set TCP timeout value?
  longer than RTT (RTT = Round Trip Time)
    but RTT varies
  too short: premature timeout
    unnecessary retransmissions
  too long: slow reaction to segment loss

Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT “smoother”
    average several recent measurements, not just current SampleRTT

TimeoutInterval = EstimatedRTT + SafetyMargin

Transport Layer 3-123

Difficulties to `measure delay`

Delayed (accumulated) Acks

For retransmitted segments, can’t tell whether acknowledgement is response to original transmission or retransmission
  Don’t estimate delay from retransmissions

Network conditions may change suddenly
  Yet don’t overreact to impact of burst traffic

Solution applicable to many situations of estimating dynamic quantities!

Transport Layer 3-124

TCP Round Trip Time and Timeout

EstimatedRTT = (1-α)*EstimatedRTT + α*SampleRTT

Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value: α = 0.125

[Figure: SampleRTT and Estimated RTT plotted as RTT (milliseconds, 100 to 350) versus time (seconds).]

Usually: symmetric, no queuing delay

Transport Layer 3-125

TCP Timeout: Safety Margin

Setting the timeout
  EstimatedRTT plus “safety margin”
  large variation in EstimatedRTT -> larger safety margin (usually changes in queuing delay)

First: estimate how much SampleRTT deviates from EstimatedRTT:

DevRTT = (1-β)*DevRTT + β*|SampleRTT-EstimatedRTT|

(typically, β = 0.25)

Then set timeout interval:

TimeoutInterval = EstimatedRTT + SafetyMargin
                = EstimatedRTT + 4*DevRTT
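The two averages and the timeout can be sketched together. Two points here are assumptions not stated on the slides: DevRTT is updated with the EstimatedRTT from before the new sample is folded in, and the initial DevRTT is half the first sample (both follow the convention of RFC 6298). The class name `RttEstimator` is hypothetical.

```python
class RttEstimator:
    """EWMA estimate of RTT and its deviation; all times in milliseconds."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2   # assumed initial deviation

    def update(self, sample_rtt):
        """Fold in one SampleRTT (never taken from a retransmitted segment)."""
        # Assumption: deviation is measured against the previous EstimatedRTT.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    def timeout_interval(self):
        """TimeoutInterval = EstimatedRTT + 4*DevRTT."""
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)
est.update(200.0)
print(est.estimated_rtt, est.dev_rtt, est.timeout_interval())  # 112.5 62.5 362.5
```

A single 200 ms sample moves the estimate only from 100 to 112.5 ms, but it widens the safety margin noticeably.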

Transport Layer 3-126

Summary: TCP Congestion Control

Initially: no (or: infinite) Threshold

When CongWin is below Threshold, sender in slow-start, rapid accelerate phase, window grows exponentially (every Ack increments CongWin)

When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.

When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold (multiplicative decrease).

When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS (restart)
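The four rules can be sketched as a small state holder, updated once per RTT rather than once per ACK. Assumptions for illustration: the class and method names are hypothetical, the window is counted in MSS units, and slow-start doubling is capped at Threshold so that the exponential ramp-up ends exactly there.

```python
import math


class CongestionControl:
    """Per-RTT toy model of CongWin and Threshold, in units of MSS."""

    def __init__(self, mss=1.0):
        self.mss = mss
        self.cong_win = mss            # connection starts with 1 MSS
        self.threshold = math.inf      # initially: no (infinite) Threshold

    def on_rtt_without_loss(self):
        if self.cong_win < self.threshold:
            # slow start: exponential growth, capped at Threshold (assumption)
            self.cong_win = min(2 * self.cong_win, self.threshold)
        else:
            # congestion avoidance: linear growth
            self.cong_win += self.mss

    def on_triple_dup_ack(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.threshold   # multiplicative decrease

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss         # restart


cc = CongestionControl()
trace = []
for event in ["rtt"] * 4 + ["dup"] + ["rtt"] * 2 + ["timeout"] + ["rtt"] * 4:
    if event == "rtt":
        cc.on_rtt_without_loss()
    elif event == "dup":
        cc.on_triple_dup_ack()
    else:
        cc.on_timeout()
    trace.append(cc.cong_win)
print(trace)
# [2.0, 4.0, 8.0, 16.0, 8.0, 9.0, 10.0, 1.0, 2.0, 4.0, 5.0, 6.0]
```

The trace shows exponential growth to 16, halving to 8 on the triple duplicate ACK, linear growth to 10, then a restart from 1 that switches to linear growth at the new Threshold of 5.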

Transport Layer 3-127

Lecture 7 outline

TCP Overview
TCP Timeouts
TCP Flow Control
3.6 Principles of congestion control
3.7 TCP congestion control
TCP Fairness

Transport Layer 3-128

TCP Fairness

Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]

New (non-TCP) protocols should be `friendly, well behaved` towards TCP…

Transport Layer 3-129

Why is TCP fair?

Two competing sessions:
  Additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput versus Connection 1 throughput, both axes up to R, with the equal bandwidth share line and the total bandwidth limit line. The trajectory through points A, B, C, D alternates "congestion avoidance: additive increase" with "loss: decrease window by factor of 2".]

Transport Layer 3-130

Steady state throughput area

Whenever the total bandwidth exceeds R, one or both senders identify congestion (by receiving 3 dup acks) and reduce their rate by half.

[Figure: Connection 2 throughput versus Connection 1 throughput, both axes up to R, with the equal bandwidth share line, the total bandwidth limit line and the threshold line.]
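The two-connection argument can be simulated. A minimal sketch under simplifying assumptions: both connections see every loss at the same time and both halve together (the slides say "one or both"), rates change in unit steps, and the function name `aimd_two_flows` is made up for the example.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Run synchronized AIMD for two flows; return their final throughputs."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # loss: both decrease by a factor of 2, which halves their difference
            x1, x2 = x1 / 2, x2 / 2
        else:
            # congestion avoidance: both add 1, moving along a slope-1 line
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


x1, x2 = aimd_two_flows(10.0, 80.0, capacity=100.0, rounds=1000)
print(f"difference after 1000 rounds: {abs(x1 - x2):.2e}")
```

Additive increase keeps the gap between the flows constant and every multiplicative decrease halves it, so the pair drifts toward the equal bandwidth share line.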

Transport Layer 3-131

Fairness (more)

Fairness and non-TCP
  Multimedia apps often do not use TCP
    Admission control instead of fairness
  May use UDP: pump audio/video e.g. at constant rate, tolerate packet loss
  Or: new transport protocol (e.g. non-standard TCP)
  Many research areas: (e.g. TCP friendly)

Fairness and parallel TCP connections
  App may open many parallel connections
    Web browsers do this
    Esp. pipelined, non-persistent
  Example: link of rate R with 9 connections;
    new app asks for 1 TCP connection, gets rate R/10
    new app asks for 9 TCP connections, gets R/2 !
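The arithmetic of the example, assuming the link is split equally per connection; the function name `app_share` is hypothetical.

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app that opens `new_connections`."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


print(app_share(9, 1))  # 1/10
print(app_share(9, 9))  # 1/2
```

Fairness holds per connection, not per application, so opening more connections buys a larger share.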

Transport Layer 3-132

Summary: TCP Services

Reliable Connection management
  Setup, teardown
Reliable connection-based communication
Flow control
Congestion control
Fairness
Efficiency improvements:
  Pipelining / Window
  Fast retransmit
  Others…

Transport Layer 3-133

Chapter 3: Summary

Transport layer services:
  Multiplexing, demultiplexing
  Reliable data transfer
  Flow control
  Congestion control
  Fairness

In the Internet
  UDP
  TCP

Next: Application layer:
  HTTP, SMTP, DNS