comparative performance analysis of versions of tcp...

14
IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 6, NO. 4, AUGUST 1998 485 Comparative Performance Analysis of Versions of TCP in a Local Network with a Lossy Link Anurag Kumar, Senior Member, IEEE Abstract—We use a stochastic model to study the throughput performance of various versions of transport control protocol (TCP) (Tahoe (including its older version that we call OldTahoe), Reno, and NewReno) in the presence of random losses on a wireless link in a local network. We model the cyclic evolution of TCP, each cycle starting at the epoch at which recovery starts from the losses in the previous cycle. TCP throughput is computed as the reward rate in a certain Markov renewal–reward process. Our model allows us to study the performance implications of various protocol features, such as fast retransmit and fast recovery. We show the impact of coarse timeouts. In the local network environment the key issue is to avoid a coarse timeout after a loss occurs. We show the effect of reducing the number of duplicate acknowledgements (ACK’s) for triggering a fast retransmit. A large coarse timeout granularity seriously affects the performance of TCP, and the various protocol versions differ in their ability to avoid a coarse timeout when random loss occurs; we quantify these differences. As observed in simulations by other researchers, we show that, for large packet-loss probabilities, TCP-Reno performs no better, or worse, than TCP-Tahoe. TCP-NewReno is a considerable improvement over TCP-Tahoe, and reducing the fast-retransmit threshold from three to one yields a large gain in throughput; this is similar to one of the modifications in the recent TCP-Vegas proposal. We explain some of these observations in terms of the variation of fast-recovery probabilities with packet- loss probability. Finally, we show that the results of our analysis compare well with a simulation that uses actual TCP code. Index Terms—Congestion control, mobile Internet, models for TCP, TCP performance. I. INTRODUCTION I T IS WELL KNOWN that transport control protocol (TCP) (the transport-layer protocol in the Internet) reacts to all packet losses as if they were caused by congestion, i.e., by stalling for a long timeout period and then dropping its transmission window. Thus, over a cellular wireless channel, in which there can be losses due to link errors and channel handovers, TCP can exhibit very poor performance. Several modifications have been proposed to the TCP loss-recovery and congestion-control mechanism to improve data throughput in the event of random loss; a simulation study of these has been reported in [6]. To study the performance of TCP over wireless channels, several researchers have recently used experimental testbeds to understand TCP behavior and to Manuscript received December 17, 1996; revised April 24, 1998; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor A. U. Shankar. The author was with the Wireless Information Networks Laboratory (WIN- LAB), Rutgers University, Piscataway, NJ 08854 USA, on leave from the Department of Electrical and Communication Engineering, Indian Institute of Science, Bangalore 560 094, India (e-mail: [email protected]). Publisher Item Identifier S 1063-6692(98)05642-8. propose and evaluate possible solutions; see, for example, [1], [2], [4], and [8]. Our work, reported here, belongs to a line of research that attempts to develop detailed analytical models of the TCP protocol in an effort to predict the performance of the various versions that are being proposed. Such modeling and analysis can also be used as a testbed for evaluating other variations, so that experimentation can be done for the few that are promising. The analysis effort also reveals the reasons for which the various effects are observed in the experimental work. The models we develop are analytical and parametric, so that we can quickly answer “what-if” questions or evaluate the effect of modifying a particular protocol or network parameter. In this paper we develop models and analyses for study- ing the bulk throughput of four versions of TCP, namely, OldTahoe (the original protocol from [7]), Tahoe, Reno, and NewReno [14], [6]. We assume a local network scenario, in which a host on a local area network (LAN) is transmitting bulk data to a mobile host connected to the LAN by a wireless link. Our models incorporate important aspects, such as slow start and congestion avoidance, coarse timers, fast retransmit, and fast recovery. Assuming an uncorrelated random packet- loss model on the wireless link, we obtain the data throughput as a function of the packet-error probability. Our results show how the throughput degrades, for each version, with increasing packet-loss probability. For a given loss probability, our results quantify the performance improvement provided by each ver- sion. We show that our analysis provides accurate quantitative evaluation by comparing its results with a simulation based on actual TCP code. Prior research closest to our work is that of Mishra et al. [12] and Lakshman and Madhow [11]; additional related references are also given in these papers. Mishra et al. only analyze the original protocol ([7]) and provide some supporting simulation and experimental results. They observe a cyclical structure in the TCP transmission process and they identify and analyze the Markov chain of congestion window sizes at loss instants. Both of these elements are key to our analysis as well, but we analyze the newer protocols with the fast-retransmit feature. Lakshman and Madhow consider OldTahoe and Reno, and model and analyze one or more TCP connections whose flow is constrained by a common bottleneck link. The link has a finite buffer, and the bandwidth (i.e., the bottleneck bandwidth) round-trip-delay product for each connection is large. There is, thus, the issue of buffer overflow at the bottleneck link. These authors study buffer sizing to obtain high TCP throughputs and obtain an approximation to the random loss probability so that random loss does not constrain throughput. They do not, 1063–6692/98$10.00 1998 IEEE

Upload: buikiet

Post on 27-Mar-2018

217 views

Category:

Documents


4 download

TRANSCRIPT

IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 6, NO. 4, AUGUST 1998 485

Comparative Performance Analysis of Versions ofTCP in a Local Network with a Lossy Link

Anurag Kumar,Senior Member, IEEE

Abstract—We use a stochastic model to study the throughputperformance of various versions of transport control protocol(TCP) (Tahoe (including its older version that we call OldTahoe),Reno, and NewReno) in the presence of random losses on awireless link in a local network. We model the cyclic evolutionof TCP, each cycle starting at the epoch at which recovery startsfrom the losses in the previous cycle. TCP throughput is computedas the reward rate in a certain Markov renewal–reward process.Our model allows us to study the performance implications ofvarious protocol features, such asfast retransmitand fast recovery.We show the impact of coarse timeouts. In the local networkenvironment the key issue is to avoid a coarse timeout after a lossoccurs. We show the effect of reducing thenumber of duplicateacknowledgements(ACK’s) for triggering a fast retransmit. Alarge coarse timeout granularity seriously affects the performanceof TCP, and the various protocol versions differ in their abilityto avoid a coarse timeout when random loss occurs; we quantifythese differences. As observed in simulations by other researchers,we show that, for large packet-loss probabilities, TCP-Renoperforms no better, or worse, than TCP-Tahoe. TCP-NewReno isa considerable improvement over TCP-Tahoe, and reducing thefast-retransmit threshold from three to one yields a large gain inthroughput; this is similar to one of the modifications in the recentTCP-Vegas proposal. We explain some of these observations interms of the variation of fast-recovery probabilities with packet-loss probability. Finally, we show that the results of our analysiscompare well with a simulation that uses actual TCP code.

Index Terms—Congestion control, mobile Internet, models forTCP, TCP performance.

I. INTRODUCTION

I T IS WELL KNOWN that transport control protocol (TCP)(the transport-layer protocol in the Internet) reacts to all

packet losses as if they were caused by congestion, i.e.,by stalling for a long timeout period and then dropping itstransmission window. Thus, over a cellular wireless channel,in which there can be losses due to link errors and channelhandovers, TCP can exhibit very poor performance. Severalmodifications have been proposed to the TCP loss-recoveryand congestion-control mechanism to improve data throughputin the event of random loss; a simulation study of thesehas been reported in [6]. To study the performance of TCPover wireless channels, several researchers have recently usedexperimental testbeds to understand TCP behavior and to

Manuscript received December 17, 1996; revised April 24, 1998; approvedby IEEE/ACM TRANSACTIONS ON NETWORKING Editor A. U. Shankar.

The author was with the Wireless Information Networks Laboratory (WIN-LAB), Rutgers University, Piscataway, NJ 08854 USA, on leave from theDepartment of Electrical and Communication Engineering, Indian Institute ofScience, Bangalore 560 094, India (e-mail: [email protected]).

Publisher Item Identifier S 1063-6692(98)05642-8.

propose and evaluate possible solutions; see, for example, [1],[2], [4], and [8].

Our work, reported here, belongs to a line of research thatattempts to develop detailed analytical models of the TCPprotocol in an effort to predict the performance of the variousversions that are being proposed. Such modeling and analysiscan also be used as a testbed for evaluating other variations,so that experimentation can be done for the few that arepromising. The analysis effort also reveals the reasons forwhich the various effects are observed in the experimentalwork. The models we develop are analytical and parametric, sothat we can quickly answer “what-if” questions or evaluate theeffect of modifying a particular protocol or network parameter.

In this paper we develop models and analyses for study-ing the bulk throughput of four versions of TCP, namely,OldTahoe (the original protocol from [7]), Tahoe, Reno, andNewReno [14], [6]. We assume a local network scenario, inwhich a host on a local area network (LAN) is transmittingbulk data to a mobile host connected to the LAN by a wirelesslink. Our models incorporate important aspects, such asslowstart andcongestion avoidance, coarse timers, fast retransmit,and fast recovery. Assuming an uncorrelated random packet-loss model on the wireless link, we obtain the data throughputas a function of the packet-error probability. Our results showhow the throughput degrades, for each version, with increasingpacket-loss probability. For a given loss probability, our resultsquantify the performance improvement provided by each ver-sion. We show that our analysis provides accurate quantitativeevaluation by comparing its results with a simulation basedon actual TCP code.

Prior research closest to our work is that of Mishraet al. [12]and Lakshman and Madhow [11]; additional related referencesare also given in these papers. Mishraet al. only analyze theoriginal protocol ([7]) and provide some supporting simulationand experimental results. They observe a cyclical structure inthe TCP transmission process and they identify and analyzethe Markov chain of congestion window sizes at loss instants.Both of these elements are key to our analysis as well, but weanalyze the newer protocols with the fast-retransmit feature.Lakshman and Madhow consider OldTahoe and Reno, andmodel and analyze one or more TCP connections whose flowis constrained by a common bottleneck link. The link has afinite buffer, and the bandwidth (i.e., the bottleneck bandwidth)round-trip-delay product for each connection is large. There is,thus, the issue of buffer overflow at the bottleneck link. Theseauthors study buffer sizing to obtain high TCP throughputsand obtain an approximation to the random loss probability sothat random loss does not constrain throughput. They do not,

1063–6692/98$10.00 1998 IEEE

486 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 6, NO. 4, AUGUST 1998

Fig. 1. A LAN host with a TCP connection to a mobile host.

however, include coarse timeouts in their analysis; avoidanceof coarse timeouts is, however, the key issue in the localnetwork environment, and the issue of keeping the round-trip“pipe” full is not as significant as in the wide-area situation.Thus, they do not quantify the benefit of the fast-retransmitfeature in reducing the frequency of coarse timeouts.

Our work has been motivated by the experimental workreported in [1] and [2]. These papers study various modifica-tions to the TCP protocol and report experimental results onthe improvements in performance observed when TCP is usedin situations such as that shown in Fig. 1. In [6] the authorspresent fragments of simulation sample paths that demonstratehow the various TCP versions differ in the ways they recoverfrom random packet losses. Our stochastic model captures theprobabilities of the various events occurring and, hence, allowsus to quantify how these different responses to loss eventsaffect the long-run throughput of the protocols.

As in [12] and [11], our analysis is also based on a studyof the cyclical evolution of the TCP transmission process.Working with a Markov model, we are able to analyze thecongestion window process as well as the duration of thecycle times. Thus, we can compute the long-run averagebulk throughput of the protocol. The newer versions of TCP,namely, Tahoe, Reno, and NewReno, attempt to avoid coarsetimeouts by trying to do a fast retransmit and fast recovery.Working with the complete distribution of the congestionwindow process at loss instants allows us to obtain theprobability that fast recovery succeeds. Unlike [11], our firstattempt is to analyze the model as exactly as possible; since ourassumptions allow us to work with Markov models throughout,we are able to capture many more details and the analysis isrelatively straightforward. Our model includes the duplicateacknowledgment (ACK) threshold used by the fast-retransmitfeature, and we can study the effect of reducing this threshold.A threshold of one is similar to one of the modifications toReno proposed in the TCP-Vegas proposals (see [3]).

This paper is organized as follows. In Section II we brieflydescribe the data transfer part of the TCP protocol, high-lighting the various aspects that are important to model.In Sections III and IV we describe the network scenario,and the modeling assumptions/simplifications. In Section Vthe detailed mathematical analysis is presented. Section VIcontains the numerical results and their discussion. Thoseprimarily interested in the results can skip Section V; thenotation and terminology used in Section VI are definedbefore Section V. Finally, some conclusions are summarizedin Section VII.

II. THE TCP PROTOCOL

We model only the data transfer part of TCP. Details of theTCP protocol can be found in the various Internet requestsfor comments (RFC’s); see also [14]. The versions of theTCP protocol that we model and analyze all assume the samereceiver process.

The TCP receiver: The TCP receiver accepts packets outof sequence number order, buffers them in a TCP buffer, anddelivers them to its TCP user in sequence. Since the receiverhas a finite resequencing buffer, it advertises a maximumwindow size at connection setup time, and the trans-mitter ensures that there is never more than this amount ofunacknowledged data outstanding. We assume that the userapplication at the TCP receiver can accept packets as soon asthe receiver can offer them in sequence and, hence, the receiverbuffer constraint is always just . The receiver returns anACK for every good packet that it receives. An ACK packetthat acknowledges the first receipt of an error-free in-sequencepacket will be called afirst ACK. The ACK’s arecumulative,i.e., an ACK carrying the sequence numberacknowledgesall data up to, and including, the sequence number . Ifthere is data in the resequencing buffer, the ACK’s from thereceiver will carry thenext expectedpacket number, which isthe first among the packets required to complete the sequenceof packets in the sequencing buffer. Thus, if a packet is lost(after a long sequence of good packets), then the transmitterkeeps getting ACK’s with the sequence number of the firstpacket lost, if some packets transmitted after the lost packetdo succeed in reaching the receiver. These are called duplicateACK’s.

The TCP transmitter: At all times , the transmittermaintains the following variables for each connection:

the lower window edge. All data numbered up toand including has been transmitted andACKed. is nondecreasing; the receipt of anACK with sequence number causesto jump to .the congestion window. The transmitter can sendpackets with the sequence numbers

. ; increases ordecreases as described below.the slow-start threshold. controls the incre-ments in as described below.

Retransmission timeout: The transmitter measures theround-trip times (RTT’s) ofsomeof the packets for whichit has transmitted and received ACK’s. These measurements

KUMAR: PERFORMANCE ANALYSIS OF VERSIONS OF TCP 487

are used to obtain a running estimate of the packet RTT onthe connection. Each time a new packet is transmitted, thetransmitter starts a timeout timer andresetsthe already runningtimeout timer, if any; i.e., there is a timeout only for thelast transmitted packet. The timer is set for aretransmissiontimeout (RTO)value that is derived from the RTT estimationprocedure. The TCP transmitter process measures time andsets timeouts only in multiples of atimer granularity; forexample, BSD-based systems have a timer granularity of 500ms. Further, there is a minimum timeout duration in mostimplementations, e.g., two timer ticks in BSD, implying anaverage minimum timeout of 750 ms. We will see in theanalysis thatcoarse timershave a significant impact on TCPperformance. For details on RTT estimation and the setting ofRTO values, see [5] or [14].

Window adaptation and timeout-based recovery: Thebasic algorithm is common to all TCP versions and wasoriginally developed and proposed by Van Jacobson [7]; ourdescription follows that of [11]. The normal evolution of theprocesses , , and is triggered by first ACK’s(see definition above) and timeouts as follows.

1) Slow start: If , each first ACK causesto be incremented by one.

2) Congestion Avoidance:If , each firstACK causes to be incremented by .

3) Timeout at epoch causes to be set to one,to be set to , and retransmissions

to begin from .

Packet loss recovery:If a packet is lost, andwill continue to be incremented until the first ACK for thepacket just before the lost packet is received. For a particularloss instance, let their final values be denoted byand

, respectively; we will call a loss window. Then thetransmitter will continue to send packets up to the sequencenumber . If some of the packets sent after the lostpacket get through, they will result in duplicate ACK’s, allcarrying the sequence number. The last packet transmitted(i.e., ) will have a RTO associated with it.

The TCP versions differ in the way they recover from loss.We provide some details here; in Section IV we will describehow we have modeled these recovery procedures.

TCP-OldTahoe (timeout recovery [7]): The transmitter con-tinues sending until packet number and then waitsfor a coarse timeout.

TCP-Tahoe (fast retransmit [6]): There is a transmitterparameter , a small positive integer; typically . Ifthe transmitter receives the th duplicate ACK at time(before the timer expires), then the transmitter behavesas if atimeout has occurred and begins retransmission, withand as given by the basic algorithm.

TCP-Reno (fast retransmit, fast (but conservative) recovery[14] ): Fast-retransmit is implemented, as in TCP-Tahoe, butthe subsequent recovery phase is different. Suppose thethduplicate ACK is received at the epoch. Loss recoverythen starts. Recalling the definitions of and above, the

transmitter sets

The addition of takes care of the fact that more packetshave successfully left the network. The Reno transmitter thenretransmitsonly the packet with sequence number, the Renophilosophy being conservative—if only one packet is lost thenthis retransmission will produce an ACK for all of the otherpackets, whereas if more packets are lost we had better besure that we are not really experiencing congestion loss. Forthe th duplicate ACK received, at say , until recoverycompletes

We explain the rest of the Reno recovery via an example,where we take . Suppose andpacket 15 is lost. The transmitter continues sending packets16, 17, 18, , 30; suppose packet 20 is also lost. Thereceiver returns ACK’s (all asking for packet 15) for packets16, 17, 18, 19, 21, , 29, 30. Note that the ACK for packet14 would have been thefirst ACK asking for packet 15.When the ACK for packet 18 is received (i.e., the thirdduplicate ACK), the transmitter sets ,and . is still 15; thus, packet 15 isretransmitted. Meanwhile, ACK’s for packets 19, 21, ,30 are also received and grows to . Since

, with , the transmitter is allowed to sendpackets 15–36; hence, retransmission of packet 15 is followedby transmission of packets 31–36. Receipt of packet 15 resultsin a first ACK asking for packet 201 (thus first-ACKing packets15, 16, 17, 18, 19) and jumps to 20. If packets 31–36succeed in getting through, then three duplicate ACK’s askingfor packet 20 also are obtained, and 20 is retransmitted. Thisresults in a first ACK that covers all of the packets up to packetnumber 36. At this time , the congestion window is reset asfollows and a new transmission cycle starts:

Thus, Reno slowly recovers the lost packets and there isa chance that, owing to insufficient duplicate ACK’s, therecovery stalls and a timeout has to be waited for. After atimeout, the basic timeout recovery algorithm is applied.

TCP-NewReno (fast retransmit, fast recovery [6]): Whenduplicate ACK’s are received, the first lost packet is resent,

but, unlike Reno, upon receipt of a partial ACK after thefirst retransmission, the next lost packet (as indicated by thepartial ACK number) is retransmitted. Thus, after waitingfor the first duplicate ACK’s, the remaining lost packetsare recovered in as many RTT’s. If less than duplicateACK’s are received, then a timeout is inevitable. Consider thefollowing example to see the difference with Reno. Supposethat after a loss, and , packets 7, 8, , 14 aresent, and packets 7, 8, and 11 are lost. The transmitter receivesthree duplicate ACK’s for packets 9, 10, and 12 (asking for

1This is called apartial ACK since all of the packets transmitted in theoriginal loss window were not ACKed.

488 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 6, NO. 4, AUGUST 1998

Fig. 2. Model for the TCP connection.X(t) is the number of TCP packets queued at the IS at timet.

packet 7). Fast retransmit is done (the same as in Reno), i.e.,, and packet 7 is sent. The two ACK’s

for packets 13 and 14 cause to become 9. Assuming thatpacket 7 now succeeds, its ACK (the first ACK asking for8) would cause to become 8; the transmitter can now sendpackets 15 and 16 also. NewReno would now resend packet8, whereas Reno would wait for three duplicate ACK’s; thesecannot come since only two more packets have been sent afterthe retransmission of packet 7. Thus, in case of multiple losses,Reno has a higher probability of resorting to a coarse timeout.

Spurious retransmissions:Consider TCP-OldTahoe. Theretransmission of the first lost packet may result in an ACKthat acknowledges all of the packets until the next packet lostin the loss window. This would cause the lower window edge

to advance; the congestion window would be increasedto two and the next lost packet and its successor would betransmitted,even if this successor packet had gotten throughin its first transmission. Thus, some of the good packets in theloss window may be retransmitted when retransmission starts;this phenomenon can be seen in the sample path fragmentsshown in [6].

III. N ETWORK SCENARIO AND MODEL

A. The Scenario

The scenario (Fig. 1) is motivated by the many recentexperimental studies of TCP performance over wireless mobilelinks; see, for example, [1], [2], [4], and [8]. In this paper wedo not consider the additional issue of mobility (see, e.g., [4]);hence, we refer to the wireless link as simply a “lossy” link.

B. The Network Model

Fig. 2 shows the schematic of a model for the system shownin Fig. 1. We model only one direction of flow of packetsfromthe LAN host to the mobile terminal. Since this is a localnetworking situation,propagation delays are assumed to benegligible. The transmission time of a TCP packet from theLAN host (resp., the lossy link) is assumed to be exponentiallydistributed with mean (resp., ); we take , i.e.,time is normalized to the mean packet transmission time onthe lossy link. During a bulk transfer over a TCP connection,we would expect that the packets would predominantly be of

a fixed length. Note, however, that a multiple-access protocoloperates on the LAN as well as on the lossy link (in particular,for code-division multiple access (CDMA) or cellular digitalpacket data (CDPD) (see [13, p. 455 and 519]). Thus, therandomness in packet transmission times in our model canbe taken to account for the variability in the time taken totransmit a head-of-the-line packet at the transmitter queues.The service time at the source host model can also be used tomodel the protocol processing time at the TCP transmitter. Theexponential assumption yields Markov processes and, hence,facilitates the analysis. The exponential assumption is usedonly for the computation of certain mean cycle times; wehave found (see [9]) that fairly crude approximations for thesesuffice.

The queue at the intermediate system (IS) is the queue ofpackets waiting to be transmitted onto the lossy link. Since weare primarily interested in the effect of random losses in thewireless link, we do not study buffer overflows at the IS, i.e.,we assume that the buffer at the IS is large enough to holdthe largest possible window worth of packets. We assume thatACK’s from the mobile receiver arrive instantaneously to theLAN host. Since ACK packets are relatively much smaller thandata packets (40-byte ACK’s versus 560–1500-byte packets),this is a reasonable assumption.

Packet loss model:With probability a packet transmittedby the IS transmitter is lost; the losses are independent, i.e.,we have a Bernoulli loss model. Since ACK packets aremuch smaller (40 bytes), they have a much smaller lossprobability and we ignore ACK loss in this analysis. Typically,wireless links are subject to fading phenomena, which resultsin burst losses, rather than the uncorrelated losses that wehave assumed here. This important extension to the presentanalysis has been done in [10].

IV. M ODELING ASSUMPTIONS FOR THETCP VERSIONS

In order to obtain analytically tractable models, we havemade some simplifications that we now discuss. It is importantto note that these simplifications do not alter the essentialnature of the protocols. We obtain parametrized analyticallytractable models that can quickly yield quantitative compar-isons between the various protocol versions and can be usedfor evaluating the effects of the various parameters.

KUMAR: PERFORMANCE ANALYSIS OF VERSIONS OF TCP 491

probability of fast retransmission. We analyze this Markovchain using the exact transition probabilities and do notassume the probabilistic congestion window increment modelmentioned in Section III-B. This approximate model is usedonly in the computation of the mean cycle times (see SectionV-A-2), for which it turns out (see [9]) that fairly crudeapproximations suffice.

1) Analysis of the Markov Chain : We first obtain thetransition probabilities for the Markov chain . Let

; we wish to determinefor . Define, for

the probability that, when loss occurs in a cycle, ata congestion window of , the next cycle directlystarts in the congestion avoidance phase.

Thus, is the probability that the next cycle begins in theslow-start phase.The various TCP versions are distinguishedby the values of .

We can easily obtain the following transition probabilitydistribution for , given that .Define , and . Note that “w.p.” denotes“with probability.” Given that (and ),and recalling the window increment behavior of TCP (startingwith a window of one, in slow start, packets need tosucceed for the window to reach; if the firstpackets succeed, and not all of the nextpackets succeed,

if the first packets succeed, andnot all of the next packets succeed, etc.), we have

for

for

(2)where

Observe that the transition probabilities defined above arethe “mixture” of two other transition probabilitiesand on the finite state space .and are given as follows (we again denote byfor ):

forfor

forforfor

for

Note that corresponds to packet loss followed by slowstart, and corresponds to packet loss followed by congestionavoidance. Finally, we define the vector

and then the transition probability matrix of is written as

where is the identity matrix.The matrices and are common to our models of all of

the TCP versions with the same values of and packeterror probability . For the various TCP versions, we calculate

as shown below.TCP-OldTahoe and TCP-Tahoe:If packet loss occurs at a

congestion window equal to , then the slow-start thresholdis set to and the congestion window is set to one.Thus, we have for

(3)

i.e., each cycle always begins in slow start.TCP-Reno: In our model of TCP-Reno suppose that the

first packet loss in a cycle occurs at the congestion window. Following this packet loss, the TCP transmitter does a fast

retransmit only if it receives duplicate ACK’s. Clearly, thisis impossible if , since the ACK’s must comeout of the packets that are transmitted following thelost packet. Hence, for

(4)

Next, we suppose that , and of these packetsdo manage to get through to the receiver, thus generating

duplicate ACK’s. The TCP-Reno transmitter retransmits,however, just the first packet that the receiver is missing, andsets and . If this isthe only lost packet in the window, then upon receipt of thispacket, the receiver will ACK all of the packets sent so far bythe transmitter, and fast recovery would succeed. If there aremultiple losses, however, then the receiver will only send anACK bearing the number of the next lost packet in the window;in this case coarse timeout will be avoided only if at leastduplicate ACK’s arrive forthe secondlost packet too. Since wehave assumed a relatively fast TCP transmitter, after the firstloss occurs and duplicate ACK’s have been received, butbefore the lost packet has been retransmitted, with a probabilityclose to one, the remaining packets in the windowwould already have been transmitted by the TCP transmitter.Thus, when the lost packet is retransmitted, duplicate ACK’sfor the next lost packet in the original window can only begenerated if at least more packet transmissions follow theretransmission of the first lost packet.

Recall that after the duplicate ACK’s have been receivedat the TCP transmitter, the congestion window .Now for each additional duplicate ACK received (after these

), is further incremented by one. Let the number ofgood packets transmitted (in the orginal window) after the firstlost packet be denoted by. Then, after the TCP transmitterretransmits the first lost packet, it can send at leastmore

KUMAR: PERFORMANCE ANALYSIS OF VERSIONS OF TCP 493

Fig. 5. Throughput of versions of TCP versus packet-loss probability; (Wmax = 6, � = 5�, K is the fast-retransmit threshold).

. It follows that

VI. NUMERICAL RESULTS AND DISCUSSION

Figs. 5–7 show calculations from (1), using the detailedanalyses developed above. Recall that all times are normalizedto the mean packet transmission times at the lossy link. Inthese figures timeout granularity is 70, minimum timeout is100, and the fast retransmit threshold ; Fig. 5has , Fig. 6 has , andFig. 7 has . Let us first put these numbers inperspective. With reference to the scenario in Fig. 1, considera wireless link of effective bit-rate 1.5 Mb/s; with 1400-byteTCP segments, this implies a packet transmission time of 7.5ms on the wireless link. Suppose that the LAN host, which ison a 10-Mb/s Ethernet, can source data at the rate of 7.5 Mb/s(five times the wireless link rate); the reduction in rate beingdue to TCP processing. Assuming the BSD implementation ofTCP, the timeout granularity is 500 ms; i.e., about 70 timesthe mean packet service time. The minimum timeout is two“ticks,” which implies an average of 750 ms, or 100 times themean packet transmission time on the lossy link. With 1400-B packets, of 6, 24, and 48 are equivalent to receiverbuffer sizes of about 8, 32, and 64 kB.

In these figures throughput is plotted against packet-lossprobability. Since time is normalized to the service time onthe lossy link, a throughput of one means that the lossy linkis always occupied. In each figure we plot an upper boundthat is an ideal throughput, not practically achievable by adistributed protocol. This is just the throughput of the closedqueueing network with customers, and two exponentialservers, at the ratesand , multiplied by the probability thata packet is good, i.e., ( ) is the losslessthroughput of the network with packets circulating in it, andis given by if ,and ). The bound represents the situationin which packets circulate in the network at the maximumpossible rate; the only overhead being that, on an average,

packets are transmitted to send one good packet.The numerical results in these figures are qualitatively

similar. Throughput degrades with increasing loss probability,but different versions are affected to different extents bythe increasing loss probability. TCP-OldTahoe is the worstperforming protocol, as it always suffers a coarse timeout ineach transmission cycle. The fast retransmit feature of all ofthe other versions reduces the probability of coarse timeout todifferent extents.

There is a precipitous drop in TCP-Tahoe, Reno, andNewReno performance for loss probabilities exceeding 10;for these probabilities fast retransmit does not succeed veryoften and when it fails, the entire coarse timeout granularityhas to be waited for. Observe that, as expected, for large,all TCP versions have identically poor performance.

494 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 6, NO. 4, AUGUST 1998

Fig. 6. Throughput of versions of TCP versus packet-loss probability; (Wmax = 24, � = 5�, K is the fast-retransmit threshold).

Fig. 7. Throughput of versions of TCP versus packet-loss probability (Wmax = 48, � = 5�, K is the fast-retransmit threshold).

496 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 6, NO. 4, AUGUST 1998

Fig. 10. Throughput of TCP-OldTahoe and TCP-Reno versus packet-loss probability; comparison of simulation and analysis;(Wmax = 24; � = 100�).

VII. CONCLUSION

In this paper we have studied models for TCP in a localnetworking scenario, i.e., propagation delays are negligible.One of the key issues that our analysis captures is the impactof coarse timeout values, and we are able to quantify theability of the fast-retransmit feature to avoid coarse timeouts.We summarize here the main observations from our analysisand the numerical results (the actual numbers are for theparameters we have used in Section VI).

With a minimum timeout of 100 times, the mean servicetime at the lossy link, OldTahoe performs relatively poorlyeven with loss probabilities as low as .001, its throughputdropping to 0.5 of the lossy link service rate when the lossprobability increases to .01. In [11] the authors also reportsevere degradation of OldTahoe performance with increasingrandom loss probability. The performance reported by them ismuch worse since they model a large round-trip propagationdelay (100 times the bottleneck service time in their numericalresults); hence, each time there is a loss, the window hasto be built up over several multiples of the RTT until thenext loss occurs. At a loss probability of .01, this yields anOldTahoe throughput of about 10% of bottleneck throughput,as compared to 0.5 for our local network scenario.

The fast-retransmit feature in Tahoe, Reno, and NewRenoimproves their performance significantly up to a loss prob-ability of .01, but there is a precipitous drop beyond thispoint. Coarse timers, and a large minimum timeout, is themain reason for this drop in throughput. Yet the throughput is

more than twice that of OldTahoe, up to an error probabilityof about .15, beyond which all versions provide worse than10% throughput. Since Reno is pessimistic in doing fastretransmission and, in case of multiple losses, needs additionalduplicate ACK’s for continuing fast recovery, its throughputsuffers for large packet-loss probabilities; for it is nobetter than Tahoe (with fast retransmit). These results of ourscorroborate the observations made by Fall and Floyd [6].

Setting the duplicate ACK threshold for fast retransmissionto one can be expected to double the throughput of the basicfast-retransmit protocols for the error probability range (.05,.2). This is related to one of the proposals in TCP-Vegas(see [3]).

In work reported in detail in [9] we have found that, forthe purpose of computing the throughput of the protocolswith the fast-retransmit feature, it is important to capturethe fast-retransmit probability well and it is sufficient touse crude approximations for the mean cycle times. For thefast-retransmission probability, we found it necessary to usethe complete distribution of the congestion window at lossinstants, whereas for the mean cycle times, an approximationbased on a simple fixed-point idea (also reported in [11])worked well.

The analysis reported in this paper assumes a Bernoullipacket-loss model (packet losses are independent with somegiven probability ). We have extended this analysis to fadingwireless channels by using a two-state Markov model forpacket errors; this work is reported in [10].