author's personal copy - university of california,...

17
This article was published in an Elsevier journal. The attached copy is furnished to the author for non-commercial research and education use, including for instruction at the author’s institution, sharing with colleagues and providing to institution administration. Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited. In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier’s archiving and manuscript policies are encouraged to visit: http://www.elsevier.com/copyright

Upload: others

Post on 12-Jun-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Author's personal copy - University of California, Davischuah/rubinet/paper/2007/...Author's personal copy as the physical layer (optical network) and the MPLS/IP layer, has its own

This article was published in an Elsevier journal. The attached copyis furnished to the author for non-commercial research and

education use, including for instruction at the author’s institution,sharing with colleagues and providing to institution administration.

Other uses, including reproduction and distribution, or selling orlicensing copies, or posting to personal, institutional or third party

websites are prohibited.

In most cases authors are permitted to post their version of thearticle (e.g. in Word or Tex form) to their personal website orinstitutional repository. Authors requiring further information

regarding Elsevier’s archiving and manuscript policies areencouraged to visit:

http://www.elsevier.com/copyright

Page 2: Author's personal copy - University of California, Davischuah/rubinet/paper/2007/...Author's personal copy as the physical layer (optical network) and the MPLS/IP layer, has its own

Autho

r's

pers

onal

co

py

On the analysis of overlay failure detection and recovery

Zhi Li a, Lihua Yuan b,*, Prasant Mohapatra c, Chen-Nee Chuah b

a Network Systems Engineering, AT&T, United Statesb Electrical and Computer Engineering, University of California, Davis, 2064 Kemper Hall, CA 95616-5294, United States

c Computer Science, University of California, Davis, CA, United States

Received 23 October 2006; received in revised form 22 March 2007; accepted 5 April 2007Available online 25 April 2007

Responsible Editor: I.F. Akyilidiz

Abstract

Application-layer overlay networks have been proposed as an alternative method to overcome IP-layer path anomaliesand provide users with improved routing services. Running at the application layer, overlay networks usually rely on prob-ing mechanisms for IP-path performance monitoring and failure detection. Their service performance is jointly determinedby their topology, parameters of probing mechanism and failure restoration methods. In this paper, we first define metricsto evaluate the performance of overlay networks in terms of failure detection and recovery, network stability and over-head. Second, we model the overlay-based failure detection and recovery process. Through extensive simulations, we inves-tigate how different IP-layer path failure characteristics and overlay topologies, detection and restoration parameters affectservice performance of overlay networks. In particular, we examine the tradeoffs among different overlay performance met-rics and the optimal performance conditions. Our study helps to understand overlay-based failure recovery and providespractical guidance to overlay network designers and administrators.Published by Elsevier B.V.

Keywords: Overlay networks; Failure recovery; Recovery performance; Overlay design

1. Introduction

The Internet suffers from various failures andanomalies, such as optical fiber cuts [1], maliciousattacks, and BGP misconfigurations [2]. Previousmeasurement results show that a significant amount

of routing pathologies prevent pairs of hosts fromconnectivity 1.5% to 3.3% of the time [3], and pathavailabilities range from 99.6% for servers to 94.4%for broadband hosts [4]. Some routing path failures,such as intra-domain routing failures, can recoverwithin milliseconds or seconds. However, inter-domain routing anomalies e.g., BGP failures, maytake up to 30 min to recover [3]. Physical-layer fail-ures, e.g., fiber cuts, may require days or even weeksto recover.

To provide reliable services to the applicationlayer, each lower layer of the protocol stack, such

1389-1286/$ - see front matter Published by Elsevier B.V.

doi:10.1016/j.comnet.2007.04.007

* Corresponding author. Tel.: +1 530 754 5385; fax: +1 530 7528428.

E-mail addresses: [email protected] (Z. Li), [email protected] (L. Yuan), [email protected] (P. Mohapatra),[email protected] (C.-N. Chuah).

Computer Networks 51 (2007) 3828–3843

www.elsevier.com/locate/comnet

Page 3: Author's personal copy - University of California, Davischuah/rubinet/paper/2007/...Author's personal copy as the physical layer (optical network) and the MPLS/IP layer, has its own

Autho

r's

pers

onal

co

py

as the physical layer (optical network) and theMPLS/IP layer, has its own failure detection, recov-ery and fault tolerance mechanism. When a failurehappens, multiple protocol layers may detect itand independently adopt their own mechanisms tobypass or recover from the failure. In general,lower-layer mechanisms react faster but higher-layermechanisms offer a finer recovery service in meetingthe requirements of different users and applications.All these failure resiliency mechanisms aims toreduce user-perceived end-to-end path failures.

Recently, application-layer overlay networks [5]have been proposed to quickly detect and recoverfrom lower-layer anomalies and improve user-per-ceived network resilience. Running at applicationlayer, overlay nodes can receive and forward trafficfor other nodes. Overlay nodes can send probingpackets to each other to measure the performanceof underlying path and detect the failure of underly-ing networks. When a lower-layer failure isdetected, overlay nodes can re-route traffic via analternative overlay-layer path (through other over-lay nodes) and circumvent the failure. Experimentsshow that overlay networks can effectively over-come lower-layer anomalies and improve user-per-ceived network resilience [5,6].

Past research on overlay network can be broadlyclassified into two broad categories – (1) applicationoverlays e.g., CAN [7], Chord [8] and Gnutella [9],which are formed by dynamic end-nodes joiningand leaving the network at will, and (2) infrastructureoverlays like RON [5], which has dedicated nodescommitted over a prolonged period of time. Conse-quently, node ‘‘failures’’, which are actually manifes-tation of node membership changes, dominate theobserved failures in applications overlays. In con-trast, failures in infrastructure overlays are domi-nated by real link failures, routing anomalies andmanifestation of transient failures like congestion.This paper focuses on infrastructure overlays fortheir capabilities to offer failure detection and recov-ery as a service to end-users. In addition, large-scaletestbeds like PlanetLab [10] have recruited many ded-icated server-class nodes around the world, makingbuilding large-scale infrastructure overlay networka realistic goal to many. In Section 2, we will discussin detail the system model and failure recovery mech-anisms used by infrastructure overlays.

Although there are a lot of work on overlay net-works, not much has been done on characterizingthe overlay network failure detection and recoverymechanism. This paper aims to fill this void through

the following. First, we propose a comprehensive setof metrics, including reduction of failure duration,system overhead and network stability, for compar-ing the performance of overlay networks in terms ofimproving network resiliency. Second, infrastructureoverlay networks have several key tunable parame-ters in the overlay failure detection and recovery pro-cess. For example, the frequency of probing presentsa fundamental tradeoff between network overheadand the time required to detect and recover a failure.A lower probing frequency incurs less overhead butmight require a longer time to detect and recover afailure. A larger probing frequency incurs moreoverhead but might allow the overlay network tobe more responsive to the failures. Making thingsmore complicated is the potential interactionbetween overlay failure recovery mechanism andthe failure characteristics at lower layers and theirown recovery mechanisms. For example, traffic con-gestion and transient network failures could producemany unnecessary overlay failure recovery eventsand may lead to network instability. We present amathematical model to study the tradeoffs involvingthese tunable parameters and captures these compli-cated interactions. Third, compared to physical orIP networks, overlay networks have great flexibilityin choosing their topologies since their links are log-ical. Our model can be extended to study the impactof topologies on the performance of failure recoveryas well. The major contributions of this paper are:

• We establish a set of meaningful metrics to mea-sure the performance of overlay failure detectionand recovery and present a mathematical modelto analyze them.

• We show that while it is meaningful to build ahighly connected overlay for failure detection, itis counterproductive to use it for failure recovery.Based on this observation, we propose to use adifferent topology with smaller node degree forfailure recovery.

• We study how the settings of different parameterswould affect the performance of an overlay net-work. Based on this study, we provide meaning-ful advice to overlay network designers andadministrators.

The rest of the paper is organized as follows. Wesummarize the related work in Section 5. In Section2, we review the overlay failure detection and recov-ery mechanisms and define metrics to evaluate over-lay network service performance. In Section 3, we

Z. Li et al. / Computer Networks 51 (2007) 3828–3843 3829

Page 4: Author's personal copy - University of California, Davischuah/rubinet/paper/2007/...Author's personal copy as the physical layer (optical network) and the MPLS/IP layer, has its own

Autho

r's

pers

onal

co

py

characterize and analyze the failure detection andrecovery mechanisms. We present simulation stud-ies and performance analysis in Section 4 and con-clude in Section 6.

2. System model

The notations used in the paper are summarizedin Table 1.

2.1. Overlay link failure detection method

Earlier study on a tier-one ISP shows that single-link failures dominates all failures observed [1]. Inthis paper, we assume that IP-link failures are inde-pendent of each other and they are uniformly distrib-uted over all IP-layer links. When an IP-layer linkfails, a limited ratio (QIf) of paths that pass throughthe failed IP-layer link will experience forwarding dis-ruption, of which the duration is determined by theIP-layer failure restoration mechanisms, timer setupand topologies [1]. This duration of forwarding dis-ruption is denoted as tf. The distribution of IP-layerpath failure duration is modeled as PI(tf).

Overlay networks, such as RON [5] and Akamai[6], usually adopt the following method to detectoverlay link (IP-layer path) anomalies and performoverlay-layer failure recovery. Each overlay nodeperiodically sends probing packets, which includesthe performance of its adjacent overlay links toother overlay nodes every probing interval (Tp

seconds). Upon receiving a probing packet, an over-lay node replies to the sender with an ACK packet.If an ACK packet is not received within a prede-fined time (RTT or longer), the packet sender willdeem the probing packet has timed out and contin-uously send additional k probing packets at intervalTt (fast retransmission interval). If all the k packetstime out, the source node will consider this overlay

link has failed. Thus, the source node employsapplication-layer path recovery mechanism (findingan alternative overlay path) based on its knowledgeof overlay network link state information. Wedefine the time gap between detecting a failure andfinding an overlay path to overcome the failure asOverlay Failure Recovery Time (Trt), which ismainly composed of a hold-off timer to avoid therace condition between IP layer and overlays. Afteran overlay link is detected down, overlay nodes willcontinue sending probing packets every Tp secondsto check if the connectivity is back.

The value of Tt is usually fixed to a value higherthan the maximal expected round trip delay andcorrelated packet loss interval to avoid transientcongestion caused consecutive loss [11]. As the cor-related packet loss interval in the Internet is usuallyaround one second, similar to the TCP SYN time-out value, Tt is usually between 1 and 3 s. For therest of our discussion, we assume the value of Tt is3 s. Same as RON [5], we set k at three. The probinginterval (Tp) and the average number of neighborsdetermine the probing overhead of each overlaynode. With a large value of Tp, overlay nodes can-not quickly detect IP-path failures, which will resultin a large amount of packet loss. However, a smallTp will result in higher probing and routing over-head as well as path instability.

2.2. Other failure detection algorithms

There are some previous work using more com-plicated failure detection schemes to achieve fasterfailure detection. We broadly classify these schemesinto three categories.

• Peer information sharing schemes [11,12] proposefor nodes to share failure information. However,these algorithms are proposed in the context of

Table 1Notations for analysis

Notion Explanation Notion Explanation

tf IP path failure duration (duration of forwarding disruption) Tp overlay network probing intervalTt Fast retransmission time interval Trt overlay failure recovery timePI(tf) probability density function (Pdf) of path failure with duration

tf based on IP recoveryPO(tf) Pdf of path failure with duration tf with overlay

recoveryPOd(t) Pdf of an IP-path failure detected by the overlay t time after it

happensQOd(tf) Overlay failure detection ratio for IP-path

failures with duration tf

QIf ratio of affected IP paths caused by an IP-layer link failure QOr overlay failure recovery ratio (ratio of findingalternate paths)

QOd average IP failure detection ratio at the overlay layer QOrl failure recovery ratio loss compared to bestperformance

3830 Z. Li et al. / Computer Networks 51 (2007) 3828–3843

Page 5: Author's personal copy - University of California, Davischuah/rubinet/paper/2007/...Author's personal copy as the physical layer (optical network) and the MPLS/IP layer, has its own

Autho

r's

pers

onal

co

py

application overlay networks in which failures aredominant by nodes joining and leaving the net-work. Sharing positive or negative (peer up ordown) information [11] could be beneficial forfailure detection since other nodes should observea node failure consistently regardless of where theobserving overlay node is located. In contrast, aninfrastructure overlay with dedicated server-classnodes experience significantly less failures thanlink failures. Two overlay links may or may notshare the failed lower-layer link. Without corre-sponding lower-layer topology information, theknowledge about the failure of one overlay linkcannot help to infer status of other overlay links.

• Application information sharing schemes proposeto use application packets as probing packetsand the corresponding ACKs or NACKs [13] tomonitor the link. Such scheme requires integra-tion of overlay network protocol and specificapplications and its performance depends heavilyon the traffic profile of the application.

• Cross-layer information sharing schemes proposefor lower layers to provide overlay node accessto their information like routing and/or notifyoverlay node for any failure they detected. How-ever, such explicit support from lower layersrequires modification at lower layers, which isnot always feasible. Since overlay network is auser-level process, sharing lower-layer informa-tion that are normally maintained at kernel levelmight pose a security risk.

While each of these complex schemes has its respec-tive advantages, this paper choose to focus on thefailure detection algorithm proposed by RON forits general applicability. The failure detection algo-rithm we considered can be achieved by any overlaynodes as long as they can send application-layerpackets. It does not require any explicit supportfrom lower layers and relies on any specific trafficpattern of the applications. Studying this scenariocan give us better understanding on tradeoffs andfundamental limits of overlay network itself.

2.3. Overlay network performance metrics

2.3.1. Average failure duration reduction

An important metric for overlay networks is tothe capability to quickly detect lower-layer path fail-ures and find alternative paths. We introduce thedefinition of Average Failure Duration Reduction

(AFDR) to evaluate the performance in terms of

bypassing IP-layer path failures and providing resil-ient routing services. It is defined as

AFDR ¼Z 1

0

tfP IðtfÞdt �Z 1

0

tfP OðtfÞdt: ð1Þ

In the above equation, PO(tf) is the distribution ofpath failure duration experienced by end-users whenpassing through overlay networks. As a result,R1

0tfP OðtfÞdt is the average path failure duration

(mean time to repair, MTTR) on overlay networks.In contrast,

R10 tf P IðtfÞdt is the MTTR directly on

the IP networks.

2.3.2. Overlay service overhead

Overlay networks cannot provide service withoutany cost. As stated above, overlay networks canonly detect failures and retrieve overlay link perfor-mance by periodically sending and receiving prob-ing packets. As overlay networks usually providerouting service based on link state routing proto-cols, the total routing service overhead (O) is notonly comprised of probing traffic overhead (OP),but also the routing traffic overhead of sendingand receiving overlay link state information (OR).

2.3.3. Overlay routing service instability

When IP or lower-layer failures happen, both theIP layer and the overlay layer will perform their ownrecovery mechanisms to bypass the failures. If theIP layer recovers faster, overlay-layer recovery isunnecessary and results in undesirable path oscilla-tions. In consequence, these oscillations lead to traf-fic delay variations, out-of-order packet delivery,and reduced end-to-end throughput. To evaluatethe path stability of service paths, we use the num-ber of user path switches per IP failure, denotedNs, which includes the switches between IP pathsand overlay paths as well as the switches just amongoverlay paths.

In summary, the goal of overlay networks is toreduce the average failure duration and provide sta-ble data forwarding paths while incurring accept-able overhead.

3. Analysis of overlay failure detection and recovery

In this section, we model the failure detection andrecovery process in overlay networks. Based on thismodel, we formulate performance metrics andinvestigate how different parameters, such as thevalue of Tp and Trt, can affect different overlay net-works service performance metrics.

Z. Li et al. / Computer Networks 51 (2007) 3828–3843 3831

Page 6: Author's personal copy - University of California, Davischuah/rubinet/paper/2007/...Author's personal copy as the physical layer (optical network) and the MPLS/IP layer, has its own

Autho

r's

pers

onal

co

py

3.1. Failure detection

Fig. 1 depicts an example of the overlay-layerfailure detection process. In this figure, node Aprobes the overlay link connecting it to node B.Node A sends the first probing packet at time 0and receives an ACK message from node B. Sup-pose the IP-layer path between node A and nodeB fails at To. Node A will not detect the failureevent right away. At Tp, A sends the second prob-ing packet. At Tp + Tt, the timer for the secondprobing packet has expired, and A then sends anadditional k � 1 probing packets. If the IP-layerpath failure lasts longer than Tp � To + kTt, Ainfers that the overlay link has failed and tries tofind an alternative overlay path to B.

As IP-layer path failure durations are variable, ifa failure (or a transient traffic congestion) is recov-ered before A’s second probing packet times out,the anomaly cannot be detected by node A at all.The relationship between failure occurrence time(To after sending out the latest probing packet)and the minimal detectable failure duration isdescribed in Fig. 2. The X-axis is the failure occur-rence time while Y-axis is the minimum detectablefailure duration. Y-axis value can be expressed asf(x) = Tp + kTt � x. From the figure, we canobserve that if a failure occurs at time 0 after theprevious probing packet, the detectable failuresdurations should be at least Tp + kTt. However, ifa failure occurs at Tp, it could still be detectableeven though it only lasts for kTt.

Given an IP-path failure lasting for duration tf,the probability that it is detected (QOd(tf)) can beexpressed by Eq. (2). All the IP-path failures thatlast longer than Tp + kTt can definitely be detectedat the overlay layer while those last shorter thankTt will not be detected at all.

QOdðtfÞ ¼0 if tf < kT t;tf�kT t

T pif kT t 6 tf 6 T p þ kT t;

1 if tf > T p þ kT t:

8><>: ð2Þ

The average detectable failure ratio at the overlaylayer ðQOdÞ can be expressed as

QOd ¼Z 1

0

QOdðtfÞ � P IðtfÞdtf : ð3Þ

The dashed curve in Fig. 3(b) is an example dis-tribution of IP-path failure duration. Based on Eq.(2) (described in Fig. 3(a)), the corresponding distri-bution of detected IP-path failure durations can bedescribed by the solid curve in Fig. 3(b). The firstpart of the distribution curve, A–B, is determinedby the value of Tp and kTt. The value of kTt deter-mines the location of A while the value of Tp deter-mines the slope of A–B. The smaller value of kTt

and Tp will help overlay networks detect more IPpath failures.

As a failure can start at anytime (between 0 andTp) after the previous successful probe, there will besome detection delay between the failure occurrencetime and detection time. The distribution of failureswith respect to the detection delay is described inEq. (4), which reflects the minimal possible recoverydelay via overlays.

P OdðtfÞ ¼R1

tf

P IðxÞQOdðxÞT p

dx if kT t 6 tf 6 T p þ kT t;

0 else:

(

ð4Þ

3.2. Failure recovery

Overlay nodes initiate failure recovery and try tofind alternative overlay-layer paths to bypass theirdetectable failures based on their knowledge ofFig. 1. Failure detection process.

Fig. 2. Occurrence time vs. minimum detectable duration.

3832 Z. Li et al. / Computer Networks 51 (2007) 3828–3843

Page 7: Author's personal copy - University of California, Davischuah/rubinet/paper/2007/...Author's personal copy as the physical layer (optical network) and the MPLS/IP layer, has its own

Autho

r's

pers

onal

co

py

global overlay link state information. Similar to otherlink-state based routing protocols (such as OSPF), itis necessary that probing and routing state updateevents of each overlay node are not synchronized.However, as shown in the following analysis, thisasynchronous behavior may decrease overlay serviceperformance and path stability to some extent.

When an overlay node detects a failure on aneighboring link, it will try to re-route throughother nodes to reach its destination. The searchfor alternative routes is based on the local informa-tion about of other overlay links. Since overlaynodes do not synchronize their probing activities,some nodes will require more time to detect overlaylink failures than other nodes, even though overlaylink failures caused by the same IP-layer failureevent happen at the same time in reality. Such asyn-chronized detection may cause nodes to have incor-rect link state information, resulting in sub-optimalroute decision and negatively affect the performanceof failure recovery and network stability.

In the following, we consider a simple overlaytopology (Fig. 4) and use it to illustrate our discus-sions on the impact of such asynchronous failuredetection on the performance of overlay failurerecovery. To simplify the analysis, we ignore thepath propagation delay between overlay nodes.

Fig. 5 depicts a typical scenario of such asynchro-nous failure detection. Assume IP link F–B fails attime t and overlay node A detects the resulted fail-

ure of overlay link A–B at time Tp + kTt (0 6t 6 Tp). We denote Trt the time for A to find analternative path after detecting the failure. Anotheroverlay link C–B might also fail because it passesthrough F–B. If A does not have correct informa-tion about C–B at time Tp + kTt + Trt, it may makesuboptimal decision for failure recovery. Althoughoverlay link failures that are caused by the sameIP failure start at the same time, their failuredurations could be different since they are deter-mined by the underlying IP-layer failure recoverymechanisms,. If C–B remains failed at time Tp +kTt + Trt, i.e., the failure duration of C–B (denotedas tf) is longer than Tp � t + kTt + Trt, A shouldideally be notified in time and avoid using C–Bfor failure recovery. Note that the probability thatthe failure duration of C–B is larger than Tp � t +kTt + Trt can be found as

R1T pþkT tþT rt�t P IðxÞdx.

Since an overlay node requires at least kTt timeto confirm a link failure, C can only notify A aboutthe failure in time if C sends out one probing packetbefore Tp + Trt. Given a probing interval of Tp, thefirst probing packet from C to B after the IP-layerfailure event is uniformly distributed within theinterval of [t,t + Tp]. Therefore, at the time of mak-ing routing decision, the probability of A can cor-rectly identify C–B as failed is

T pþT rt�tT p

(if t P Trt)or 1 (if t < Trt).

Denote Qfg as the probability of false negatives –a failed overlay links is considered good for failurerecovery purposes. Qfg can be found as

Fig. 3. Distribution of detectable IP path failure duration. (a) Detection ratio and (b) failure duration.

Fig. 4. Example overlay network.

Fig. 5. Asynchronous failure detection and recovery.

Z. Li et al. / Computer Networks 51 (2007) 3828–3843 3833

Page 8: Author's personal copy - University of California, Davischuah/rubinet/paper/2007/...Author's personal copy as the physical layer (optical network) and the MPLS/IP layer, has its own

Autho

r's

pers

onal

co

py

QT rtfg ¼

Z T p

T rt

1

T p

1� T p þ T rt � tT p

� �

�Z 1

T p�tþkT tþT rt

P IðxÞdxdt

¼Z T p

T rt

t � T rt

T 2p

Z 1

T p�tþkT tþT rt

P IðxÞdxdt: ð5Þ

Note that if the failure duration of C–B is shorterthan Tp � t + kTt + Trt, A will not notice the fail-ure either. However, we do not consider this as falsenegatives since C–B has recovered at the time Aneeds to make routing decisions. Although A failsto know that C–B actually failed for a short periodof time, the information it has at the time of makingdecisions is accurate.

Based on above result, the probability of havingcorrect information about a failed link (QT rt

ff ), (con-sider the failed link C–B as failed), can be found as

QT rtff ¼

Z T p

0

1

T p

Z 1

T p�tþkT tþT rt

P IðxÞdxdt � QT rtfg : ð6Þ

Note that Fig. 5 and our above discussions onlyillustrate the simplest scenario of such delayed con-vergence. In reality, such staggered convergenceevents may have cascading effects and incur longtime service path instability among different overlaypaths.

As overlay nodes perform failure recovery basedon overlay link performance probing results, tran-sient failures or traffic congestion may provide over-lay nodes with incorrect information of overlay linkperformance. If failures are recovered (or conges-tion disappears) at the lower-layer before overlay-layer failure recovery finishes (or update messagesare sent), the overlay link failures should not becounted as failed when a node performs overlay-layer failure recovery. Otherwise, we call them false

positives. False positives can cause the followingundesirable effects: (1) overlay nodes send outredundant performance update messages; (2) over-lay nodes perform unnecessary failure recoverycausing routing service oscillation; (3) overlay nodeshave incorrect overlay link state information andthe probability of bypassing the failed links isreduced. This probability is described in the follow-ing equation:

QT rtgf ¼

Z T p

0

1

T p

Z T p�tþkT tþT rt

kT t

P IðxÞQOdðxÞdxdt: ð7Þ

Specifically, this is the probability of another over-lay link failures that is recovered before an overlay

node performs recovery activity for a failed overlaylink (good links are deemed as failed links).

Based on above result, for the following ratio:

QT rtgg ¼

Z T p

0

1

T p

Z T p�tþkT tþT rt

0

P IðxÞdxdt � QT rtgf ; ð8Þ

a good overlay link (such as C–B) will be consideredas a good link (correct information) by anothernode (such as node A) when it searches for an alter-nate overlay path.

Suppose the IP link failure ratio is QIf (IP pathsaffected by an underlying IP-layer failure event).Then, Qgð¼ ð1� QIf Þ þ QIf QT rt

gg Þ is the ratio of oper-ational overlay links for which an overlay node alsohas the correct information of those link perfor-mance. Let L be the average number of overlayhops overlay paths pass through (or the numberof intermediate overlay nodes). Compared to theideal case, overlay nodes can fail (or incur delay)while selecting each link for failure recovery byeither misidentifying a failed overlay link as a goodone or vice versa. Thus, the failure recovery ratioloss (QOrl) can be derived as

QOrl ¼ 1�Qg

Qg þ QIf QT rtgf

!L

�Qg

Qg þ QIf QT rtfg

!L

:

ð9Þ

As shown in [2,5], when L = 2, overlay networkscan achieve good performance by overcomingaround 50% of IP-layer path failures. In addition,in large and well-connected IP networks like theInternet, the value of QIf will not be very large.Based on these facts, we can conclude that the fail-ure recovery ratio loss will not be very large most ofthe time.

Suppose the ideal (best) failure recovery ratio ofan overlay network is Qrideal. The value will be thefailure recovery ratio if each overlay node has accu-rate global overlay link state information. However,as shown above, the value of Tp, kTt and Trt willaffect the accuracy of information observed by eachoverlay node. In Eq. (9), we have defined QOrl as theoverlay failure recovery ratio loss compared to theideal performance. The actual failure recovery ratiocan be defined as

QOr ¼ ð1� QOrlÞQrireal: ð10Þ

3834 Z. Li et al. / Computer Networks 51 (2007) 3828–3843

Page 9: Author's personal copy - University of California, Davischuah/rubinet/paper/2007/...Author's personal copy as the physical layer (optical network) and the MPLS/IP layer, has its own

Autho

r's

pers

onal

co

py

3.3. Impact of overlay topology

As overlay networks are built at the applicationlayer, the administrator can determine whether tobuild one overlay link between any pair of overlaynodes. We can use various topologies to connectthe overlay nodes, based on which the overlay trafficcan be forwarded to bypass the IP path failures.

Overlay overhead is composed of two parts: pathprobing overhead and routing state update over-head. Based on path probing activity, each overlaynode can detect its adjacent overlay link perfor-mance as well as IP-path failure events. This partof the active probing overhead is determined bythe overall size of the corresponding overlay net-works. We can also further reduce the overheadby passively listening to the passing overlay or IPtraffic to infer the performance information. Rout-ing state update overhead is determined by the aver-age number of adjacent overlay nodes in thecorresponding overlay topologies.

In this paper, we assume that the overlay nodesperform active probing to retrieve the overlay pathperformance information. Probing overhead (Op)can be expressed by Eq. (11):

Op /S � D

T p

; ð11Þ

where S is the size of probing packet and D is theaverage number of adjacent overlay nodes in theoverlay topology.

Routing overhead (Or) is strongly affected by thedesign and goal of the routing protocol. In RON [5],in addition to link availability, updated link perfor-mance information are sent together with everyprobing packets. The routing update overhead cantherefore be expressed as

Or /N � ðH þ P � DÞ

T p

; ð12Þ

where H is the header size of routing packet, P con-tains the information describing path to each peerand N is the size of overlay networks. If the routingprotocol only provides failure recovery, link stateadvertisement (LSA) need to be sent only whenthere is a change of link status (up or down). In suchcase, routing overhead can be significantly reduced.

As the overlay data forwarding paths are basedon top of overlay topologies, it is obvious that over-lay topologies determine its best failure recoveryratio (Qrireal, so as to the practical failure recoveryratio, QOr).

The practical value of ideal failure recovery ratioan overlay network can provide is determined by thesize of the overlay network, the underlying size ofthe IP network as well the topology of the IP net-work. To show different node degree’s impact onfailure recovery ratio, we investigate a 20-node over-lay network on top of an 100-node IP network,which is connected by a grid topology. We varythe average node degree of the overlay network. Ifthe node degree is less than 19 (full-mesh case), eachoverlay node randomly chooses overlay neighbors.The average ideal failure recovery ratio is shownin Fig. 6. The different curves show the ideal failurerecovery ratios on top of different number of con-current IP link failures. From this figure, we canobserve that the ideal failure recovery ratio is notimproved after the average node degree is abovefour or five. This means that some additional over-lay links are not helpful in terms of failure recovery.On the other side, we can see the higher node degreeis needed when there are large amount of concurrentIP link failure events.

The corresponding per-node overlay routing over-head is shown in Fig. 7. From the figure, we can seethe overlay routing overhead varies a lot under differ-ent values of average node degree, which is even moresevere if we do not include the node probing over-head (Op). Considering the higher node degree doesnot necessarily mean higher failure recovery ratio,it is possible for us to reduce overhead withoutdegrading overlay service performance.

3.4. Mean time to repair (MTTR)

Based on above derivation of QOd (failure detec-tion ratio for an IP path failure with duration as t

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

2 4 6 8 10 12 14 16 18

Idea

l Fai

lure

Rec

over

y R

atio

Num. of Neighbors

1 IP Link Fails5 IP Link Fails

10 IP Link Fails25 IP Link Fails50 IP Link Fails

Fig. 6. Node degree vs. ideal failure recovery ratio.

Z. Li et al. / Computer Networks 51 (2007) 3828–3843 3835

Page 10: Author's personal copy - University of California, Davischuah/rubinet/paper/2007/...Author's personal copy as the physical layer (optical network) and the MPLS/IP layer, has its own

Autho

r's

pers

onal

co

py

second), POd (the probability of detecting a failure t

seconds after it happens) and QOr (failure recoveryratio), we can obtain the analytical result of pathfailure duration on top of overlays (PO(t)) as definedas

P OðtfÞ ¼P IðtfÞ if tf < kT t;

P IðtfÞð1� QOdðtfÞQOrÞþP Odðtf � T rtÞQOr if tf P kT t:

8><>:

ð13Þ

From Eq. (13), we can see that the overlay networkscannot provide help for failures that last shorter thankTt and therefore cannot be detected. For fail-ures that last longer than kTt, with probabilityPI(tf)(1 � QOd(tf)QOr) they are not recoverable. Withoverlay providing failure recovery services, the prob-ability of a failure lasts tf is POd(tf � Trt)QOr.

3.5. Average failure duration reduction (AFDR)

Based on the above results, for an overlay linkfailure with a duration of tf at the IP layer, the aver-age time between failure occurrence time and failuredetected time is

DtðtfÞ ¼R t

kT txdx if tf 6 T p þ kT t;R T pþkT t

kT txdx if tf > T p þ kT t:

(ð14Þ

AFDR can be obtained by comparing thosedetectable and recoverable failures’ IP path failuredurations and the durations on top of overlays.That is, Dt(t) plus overlay failure recovery time(mainly overlay hold-off timer). The value of AFDRcan be described by the following equation:

Z 1

0

P IðtÞQdðtÞQorðt � ðT rt þ tDtÞÞdt; ð15Þ

where Trt + Dtt is the failure duration on top ofoverlay for detectable IP-path failure.

3.6. Path stability

Overlay networks respond to IP path failures byre-routing through other overlay nodes. In additionto extra overhead, path switches also incur end-to-end delay variations, out-of-order packet deliveryand reduced throughput. Unlike the IP-layer, wheresingle link failure dominates, an overlay networkcan have multiple correlated link failures causedby a single IP failure. As illustrated in Fig. 4, a fail-ure at the IP link F–B will cause two overlay linkfailures at A–B and C–B.

Since the probing timers on different overlaynodes are not, and should not be, synchronized, cor-related overlay link failures are detected at differenttimes. Consequently, overlay nodes may haveinconsistent and inaccurate link state informationfor recovery purposes. Again, consider an IP failureat link F–B in Fig. 4, if A detects the failure ofoverlay link A–B first, it may decide to re-routethrough A–C–B. This path switch is howevermeaningless. When C detects the overlay failureon link C–B later, A need to switch again toA–D–B. The overlay link C–B, in this particularcase, only incurs additional path switches anddelays the re-convergence and the failure recoveryprocess.

The durations of correlated IP-path failuresmight also be different since the IP layer take differ-ent time to recover different IP path. In Fig. 4, theIP layer could find path A–D–E–B before it findsC–A–D–E–B. Therefore, the overlay node C, inorder to reach B, could be moving from overlaypath C–A–D–B to overlay path C–A–B to C–B.Eventually, C might return to use IP-layer connec-tivity between C and B since overlay-based routingis more expensive. During the entire process, the IP-layer path between C and B does not change. Butthe end-user will experience path oscillations.

In the following, we analyze overlay path switchesbased on Fig. 8. The curve is the IP-layer failuredistribution without using overlay networks. TheY-axis can be seen as the probability of the failureduration taking on a corresponding X-axis value.

Suppose an IP-layer failure is detected at time t(t 2 [kTt, Tp + kTt]). After a short Hold-off Timer

0

1

2

3

4

5

6

7

8

9

2 4 6 8 10 12 14 16 18

Per

Nod

e R

outin

g O

verh

ead

(Kb/

s)

Num. of Neighbors

Tp=12Tp=16Tp=18

Tp=12 (w/o Op)Tp=16 (w/o Op)Tp=18 (w/o Op)

Fig. 7. Node degree vs. overhead.

3836 Z. Li et al. / Computer Networks 51 (2007) 3828–3843

Page 11: Author's personal copy - University of California, Davischuah/rubinet/paper/2007/...Author's personal copy as the physical layer (optical network) and the MPLS/IP layer, has its own

Autho

r's

pers

onal

co

py(Trt to avoid the race condition between two layers),if the failure is recoverable (with probability ofQOrl), the traffic will be redirected to overlay paths.After this, the overlay node will continue to probethe original IP path performance every Tp seconds.If the IP path failure is recovered by the IP path, thetraffic will be redirected to original IP path after Tp.The number of switches within time gap Tp can beexpressed in the following equation:

2QOrl �Z kT tþT p

kT t

1

T p

Z tþT rtþT p

tþT rt

P IðxÞdxdt: ð16Þ

Similarly, at time t + Trt + 2Tp, some IP path fail-ures can also be recovered and overlay traffic willbe directed back to the original IP paths again.The additional number of switches with time gap2Tp can be expressed as

2QOrl �Z kT tþT p

kT t

1

T p

Z tþT rtþ2T p

tþT rtþT p

P IðxÞdxdt: ð17Þ

Overall, the number of switches per overlay linkfailure ðNo

s Þ can be described in the followingequation:

Nos ¼ 2QOrl

Z kT tþT p

kT t

X1n¼1

1

T p

�Z tþT rtþnT p

tþT rtþðn�1ÞT p

P IðxÞdxdt

¼ 2QOrl

Z kT tþT p

kT t

1

T p

Z 1

tþT rt

P IðxÞdxdt: ð18Þ

As we defined in Section 2, an IP failure could incurmultiple overlay failures and the ratio of affected IP-path is QIf. For a full-mesh overlay network with N

nodes, the total number of affected overlay links isN2QIf. Consequently, the number of path switchesper IP failure (Ns) is

N s ¼ N 2QIf N os : ð19Þ

4. Simulation study

In this section, we perform simulation studies tovalidate our analysis and study the tradeoffs amongdifferent overlay performance metrics. Unfortu-nately, we cannot evaluate our study on a real over-lay network like PlanetLab [10] for two reasons.First, actual failures on the Internet is highly ran-dom and experiments based on this will not berepeatable. Second, injecting controlled IP-layerfailures into the Internet is difficult and impractical.

4.1. Simulation setup

4.1.1. Network model

In our simulation, we use a two-level power-lawtopology with 1000 nodes and 4000 edges generatedby BRITE [14] with the default parameters. Thelower level is based on the Waxman model withparameters a = 0.15 and b = 0.2. The higher levelis based on the Barabasi-Albert model withoutrewiring. At the overlay layer, we randomly select50 nodes from the 1000 IP-layer nodes as overlaynodes and construct the overlay network. We usethe real Internet end-to-end delay values fromKing’s dataset [15] to model the overlay node-to-node delay. Both the IP and overlay layers useshortest-path routing protocol. The overlay nodesdeploy a RON-like overlay probing protocol forlink monitoring and failure detection. To performfailure recovery and re-routing, the overlay nodesuse an OSPF-like link state routing protocol.

We focus on overlay networks with fixed numberof nodes, randomly chosen but fixed node locations,flexible overlay topology and timer mechanisms.This is due to the practical consideration that over-lay administrators, e.g., PlanetLab, often rely onvoluntarily donated servers and cannot determinethe number of nodes and their placements. Theycan however, at their discretion, form the desiredoverlay topology and set their timer values. It is stillpossible for administrator to choose the number ofoverlay nodes and their placements if they have alarge pool of candidates. However, this issue is notthe focus of this paper.

4.1.2. Failure model

Failures are uniformly distributed across differ-ent IP links. The starting time of each IP-link failureis uniformly distributed over the simulation time.Since multiple IP paths (overlay links) might sharethe same failed IP link, one IP failure might trigger

Fig. 8. Path switches and failure distribution.

Z. Li et al. / Computer Networks 51 (2007) 3828–3843 3837

Page 12: Author's personal copy - University of California, Davischuah/rubinet/paper/2007/...Author's personal copy as the physical layer (optical network) and the MPLS/IP layer, has its own

Autho

r's

pers

onal

co

py

multiple overlay link failures. Note that even foroverlay link failures triggered by the same IP failureevents, the end-to-end observed failure duration canbe very different. Unless otherwise specified, the IP-path failure duration (or TTR) used in our simula-tion has a distribution as displayed in Eq. (20).

P ðtÞ ¼0:067� e�0:0289�t if t 6 31:95;

1� 19t�0:85 if t > 31:95:

�ð20Þ

The following description explains how we deter-mined this model. Based on large scale connectivitytraces, Dahlin et al. [16] observed that 30% of IPpath unavailabilities have duration longer than30 s, and their failure duration is well modeled asP 0(t) = 1 � 19t�0.85. The duration distribution offailures lasting less than 30 s is not presented in[16]. Assuming this portion of failure durations areexponentially distributed with mean kx, and basedon the constraint of their ratio (Eq. (21)) and assum-ing a continuous boundary condition (Eq. (22))Z 30

0

akx

e�t

kx dt ¼ 0:70; ð21Þ

1� 19t�0:85jt¼31:95þ ¼akx

e�t

kx jt¼31:95�; ð22Þ

we get a = 2.32 and kx = 34.6.This distribution is heavy-tailed and has a mean

of infinity due to the t > 31.95 portion [16]. To avoidbeing biased by a few outliers with large failureduration, we cut off the distribution at t = 100. Inaddition to this realistic WAN failure model, wealso simulate synthetic failure durations with expo-nential distribution of mean k.

4.2. Failure detection

An overlay network must be able to detect an IP-layer link failure before it can provide any recoveryservice. Therefore, under a given constraint of over-head, a good overlay design should detect as manyfailures as possible.

In Fig. 9(a) and (b), we study the effect of probingtimer value when using a full-mesh topology forfailure detection. Failure detection ratio decreaseslinearly with an increasing Tp. Assuming an exponen-tially distributed failure duration with mean of 60 s,the default timer setting of RON will allow us todetect about 80% of the failure events. For the realis-tic WAN failure model, we can detect about 60% ofthe failure events. Fig. 9(b) looks at failure detectionfrom the perspective of Time-to-Detect (TTD). Over-lay probing mechanism is able to detect failures with

a TTD ranging from kTt to Tp + kTt. Using a smallerTp can reduce the range of the TTD distribution.

In Fig. 10, we study the failure detection ratiounder different combinations of node degree (D)and Tp value. It can be observed in Fig. 10 that a smal-ler Tp or a larger connection degree (D) can detectmore failures. However, as discussed in Section 3.3,the overhead is proportional to D/Tp. Therefore,given a constraint on overhead, one can choose thevalue of D or Tp to optimize the detection ratio. Inthe following, we will use L = D/Tp to measure thelevel of probing overhead. The default setting, full-mesh, Tp = 12 s in a 50-node overlay has L = 4.083.

One can notice from Fig. 10 that the detectionratio has a steeper curve vs. connection degree D

in contrast to Tp. Therefore, a larger Tp and D

combination is the optimized value. This can beconfirmed in Fig. 11. At different levels of overhead,a larger Tp (and D) results in higher failure detectionratio. Therefore, for a maximum failure detectionratio, an overlay administrator should build a

Fig. 9. Impact of setting Tp. (a) Tp vs. detection ratio and (b) Tp

vs. TTD.

3838 Z. Li et al. / Computer Networks 51 (2007) 3828–3843

Page 13: Author's personal copy - University of California, Davischuah/rubinet/paper/2007/...Author's personal copy as the physical layer (optical network) and the MPLS/IP layer, has its own

Autho

r's

pers

onal

co

pyfull-mesh (largest D) and set the Tp large enough(moving towards the right) so that it satisfies theoverhead constraint. When acceptable overhead islarge, one can afford to have a smaller Tp and largerD (moving towards the top left corner) for largerfailure detection ratio.

4.3. Failure recovery

Once an overlay monitoring session detects afailure, it will announce the failure events to itsneighbors through Overlay-layer Link-State Adver-tisements (OLSAs). In the mean time, it will performfailure recovery by looking for an alternative route.As discussed in the previous section, it is best for theoverlay network to monitor overlay links based ontopologies with higher node degree for failure detec-tion purpose. However, the end goal of overlay net-work is not only to detect failures but also to providefailure recovery services.

As presented in Fig. 6, increasing node degreeproviding negligible improvement on the ideal fail-

ure recovery ratio when D P 4, especially if failuresinvolve only one IP link. In Section 3.6, we discussedthat additional overlay links may actually delay net-work convergence, since overlay networks tend toobserve multiple correlated link failures. Conse-quently, a topology with higher node degree maynot be the better choice for failure recovery pur-poses, which, our simulation results below confirms.

In the following part, we call the full-mesh topol-ogy used for failure detection as the probing graph.We use a subgraph of the probing graph, calledrecovery graph for failure recovery.1 The recoverygraph contains all the overlay nodes with a smalleraverage node degree (DR). Based on these twographs, OLSA will be propagated only if a detectedoverlay link failure also affects recovery graph.However, for each detected failure, overlay nodesalways try to re-route the failure (finding alternateoverlay paths) through the recovery graph.

Fig. 12(a) depicts the effect of Tp on TTR basedon a recovery graph of node degree 4. Most failuresare recovered within hkTt, Tp + kTti. This is similarto Tp’s effect on TTD. However, a careful compar-ison of Fig. 12(a) and 9(b) reveals the differencesof slope. The distributions of TTD tend to be highernear the left size while the distributions of TTR tendto be higher near the right side. This is due to thefact that overlay network cannot always recoverfrom a failure immediately upon detection. Instead,successful recovery is often delayed until each of thecorrelated overlay failures is detected and each cor-responding OLSA is propagated.

Fig. 13(a) shows that increasing Tp will increaseMTTR, or decrease AFDR. This is because a largerTp not only reduces the failure detection ratio butalso delays the start of the failure recovery process.On the positive side, a larger Tp value makes theoverlay network less responsive to short-lived fail-ures. By not responding to short-lived failures, theharmful effect on AFDR is minimum and networkis more stable (as shown in Fig. 14(a)).

Figs. 12(b), 13(b) and 14(b) present the impact ofintroducing a hold-off timer. Hold-off timer essen-tially delays the recovery process for failuresdetected earlier, accumulates failures (OLSAs) overTrt time, and makes one combined route recalcula-tion. A larger Trt will allow the overlay node toaccumulate more correlated OLSAs and reducesre-calculation. This can effectively improve network

4 6 8 10 12 14 16 18 20

0 5 10 15 20

25 30 35

40 45

50 0

0.1 0.2 0.3 0.4 0.5 0.6 0.7

Detection Ratio

Tp

D

Detection Ratio

Fig. 10. Tp and D on detection ratio.

Fig. 11. Probing overhead.

1 The detail motivation of separating these two graphs will beexpanded in the following paragraphs.

Z. Li et al. / Computer Networks 51 (2007) 3828–3843 3839

Page 14: Author's personal copy - University of California, Davischuah/rubinet/paper/2007/...Author's personal copy as the physical layer (optical network) and the MPLS/IP layer, has its own

Autho

r's

pers

onal

co

pystability, as can be observed in Fig. 14(a). On thenegative side, introducing a hold-off timer clearlymoves the distribution of TTR to the right andincreases the TTR (Fig. 12(b)).

In Fig. 13(a), we note that using a recovery graphwith smaller node degree actually results in smallerMTTR (or larger AFDR). This is because anincrease in node degree helps little in ideal recoveryratio but increases the chance of delayed re-conver-gence. In addition, Fig. 14(a) shows that a smallernode degree incurs fewer path changes, thusimproving network stability. Similar results can beobserved from Figs. 13(b) and 14(b). Therefore,increasing the node degree is counterproductivefor failure recovery.

4.4. Suggestions and discussions

Based on the analysis and simulation results, webelieve the following techniques can be adopted todesign overlay networks with greater performance.

• The node degree (D) have opposite impacts onfailure detection and failure recovery. For failuredetection, a larger node degree is preferred. Incontrast, a smaller node degree is preferred forfailure recovery. Using a full-mesh graph for fail-ure detection and a graph with smaller nodedegree for failure recovery can exploit the bestof both approaches. Practically, we believe thatlittle modification is needed to achieve this goal.In addition, end nodes/applications may takethe responsibility of detecting path failureswhile the overlay networks are only in chargeof providing resilient end-to-end forwardingpaths.

• The timer values present a fundamental tradeoffbetween network stability, overhead, and failurerecovery performance. The probing timer (Tp)can improve failure detection and recovery atthe price of probing overhead and stability.Using a hold-off timer (Trt), which is not cur-rently implemented in popular overlay networks,

Fig. 12. Timer on TTR distribution. (a) Trt vs. TTR and (b) Trt

vs. TTR.

Fig. 13. Timer on recovery performance. (a) Tp vs. AFDR and(b) Trt vs. AFDR.

3840 Z. Li et al. / Computer Networks 51 (2007) 3828–3843

Page 15: Author's personal copy - University of California, Davischuah/rubinet/paper/2007/...Author's personal copy as the physical layer (optical network) and the MPLS/IP layer, has its own

Autho

r's

pers

onal

co

pycan improve the network stability while mini-mally affect failure recovery performance.

• Several people have proposed to build overlaynetwork with topology-awareness [17,18]. Thiscould fundamentally change the scenario of over-lay failure detection and recovery. With theknowledge of lower-layer topology, one couldmake meaningful inference about the status ofone link from the status of other links. If topol-ogy-aware overlay becomes a reality, the failuredetection algorithms based on information shar-ing [11] could speed up overlay failure detectionand recovery significantly. Topology-awarenessand information sharing can also relieve theoverlay from doing an O(n · k) (n is number ofoverlay nodes and k is average number of peers)probing and significantly reduce the overhead.We plan to investigate this approach thoroughlyin the future work.

5. Related work

Zhuang et al. [11] investigate the tradeoffs of dif-ferent overlay/P2P node failure detection algorithmsin terms of overhead, packet loss ratio and failuredetection ratio. In addition, the paper also recom-mends several mechanisms to design optimal nodefailure detection methods. The same topic is alsodiscussed in [19], in which the authors focus on ana-lytical models and propose a self-tuning method.Instead of node failure detection, we focus on theissue of overlay link failure detection and recoveryin the paper. Moreover, we focus on the impact ofIP-layer failure characteristics on probing intervalsetup and explore the tradeoff between failure recov-ery performance and overhead.

Some work has been done on setting up optimalhello message intervals in OSPF network environ-ment. Goyal et al. [20] investigate the impact oftopologies and network congestion on optimalHelloInterval for OSPF network through sim-ulation. Basu et al. [21] perform experimental studyof the stability of OSPF in terms of convergencetime, routing load and number of routing flaps. In[22], the authors use analytical methods to studythe effects of traffic overload on OSPF andBGP by quantifying the stability and robustnessproperties.

Qiu et al. [23] studied the vertical interaction

between selfish overlay network and lower-layertraffic engineering mechanisms. Based on the net-work traffic pattern, traffic engineering mechanism,e.g., OSPF and MPLS optimization, aims toachieve optimal network performance by adjustingrouting at its respective layer. On the other hand,overlay nodes aim to find the best route for them-selves and can affect the traffic demands observedby lower-layer. Qiu et al. show that the interplayof selfish overlay and traffic engineering can resultin a system performance worse than using eitherone of them. Keralapura et al. [24] discuss the pos-sible interaction between the IP layer and overlaynetworks that may affect traffic matrix estimationand load balancing, or lead to oscillatory race con-ditions between different overlay networks. Liuet al. [25] model the interaction between an overlaynetwork and traffic engineering work as a two-players game. Our work is complementary to theprior studies and provides new insights on usingoverlay to provide failure recovery services to IPnetworks.

Fig. 14. Timer on network stability. (a) Tp vs. Ns and (b) Trt vs.Ns.

Z. Li et al. / Computer Networks 51 (2007) 3828–3843 3841

Page 16: Author's personal copy - University of California, Davischuah/rubinet/paper/2007/...Author's personal copy as the physical layer (optical network) and the MPLS/IP layer, has its own

Autho

r's

pers

onal

co

py

6. Conclusion

In this paper, we perform analysis and simulationstudies to investigate failure detection and recoveryprocess in overlay networks. In particular, we studyhow the different parameters (Tp, Trt and D) impactthe performance of the overlay. These parametersare important factors in the tradeoffs between per-formance, probing overhead, and penalty (e.g.,oscillation of user perceived performance). A well-designed overlay network should set the parameterscarefully to meet its goals and constraints. Sincenode degree has opposite effects on failure detectionand recovery, we made a novel proposal that oneshould use a topology with high node degree forfailure detection and another topology with smallnode degree for failure recovery. Our analysis andresults provide important guidelines to design over-lay networks and to understand their performance.

References

[1] A. Markopoulou, G. Iannaconne, S. Bhattacharrya, C.N.Chuah, C. Diot, Characterization of failures in an IPbackbone, in: Proc. IEEE INFOCOM, 2004, pp. 2307–2317.

[2] R. Mahajan, D. Wetherall, T. Anderson, UnderstandingBGP Misconfiguration, in: Proc. ACM SIGCOMM, 2002,pp. 3–17.

[3] V.E. Paxson, Measurements and analysis of end-to-endInternet dynamics, Ph.D. thesis, UC Berkeley (April 1997).

[4] K.P. Gummadi, H.V. Madhyastha, S.D. Gribble, H.M.Levy, D. Wetherall, Improving the reliability of internetpaths with one-hop source routing, in: Proc. USENIX/ACMSymposium on Operating Systems Design and Implementa-tion, 2004, pp. 183–198.

[5] D.G. Andersen, H. Balakrishnan, M.F. Kaashoek, R.Morris, Resilient overlay network, in: Proc. ACM Sympo-sium on Operating Systems Principles, 2001, pp. 131–145.

[6] Akamai Corporation, http://www.akamai.com.[7] S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker,

A scalable content-addressable network, in: Proc. ACMSIGCOMM, 2001, pp. 161–172.

[8] I. Stoica, R. Morris, D. Karger, M.F. Kaashoek, H.Balakrishnan, Chord: A scalable peer-to-peer lookup servicefor internet applications, in: Proc. ACM SIGCOMM, 2001,149–160.

[9] Gnutella, http://www.gnutella.com.[10] Planet-lab Network Testbed, http://www.planet-lab.org.[11] S.Q. Zhuang, D. Geels, I. Stoica, R.H. Katz, On failure

detection algorithms in overlay networks, in: Proc. IEEEINFOCOM, 2005, pp. 2112–2123.

[12] S.Banerjee, S. Jee, B.Bhattacharjee, C.Kommareddy, Resil-ient multicast using overlays, in: Proc. ACM SIGMETRICS,2003, pp. 102–113.

[13] Y. Amir, C. Danilov, S. Goose, D. Hedqvist, A. Terzis, Anoverlay architecture for high quality VoIP streams, IEEETransactions on Multimedia, 2006.

[14] A. Medina, A. Lakhina, I. Matta, J. Byers, BRITE, http://www.cs.bu.edu/brite/, 2002.

[15] K.P. Gummadi, S. Saroiu, S.D. Gribble, King: Estimatinglatency between arbitrary internet end hosts, in: Proc.Internet Measurement Workshop, 2002, pp. 5–18.

[16] M. Dahlin, B.B.V. Chandra, L. Gao, A. Nayate, End-to-endWAN service availability, IEEE/ACM Trans. Networking.

[17] S. Ratnasamy, M. Handley, R. Karp, S. Shenker, Topolog-ically-aware overlay construction and server selection, in:Proc. IEEE INFOCOM, 2002, pp. 1190–1199.

[18] J. Han, D. Watson, F. Jahanian, Topology aware overlaynetworks, in: Proc. IEEE INFOCOM, 2005, pp. 2554–2565.

[19] R. Mahajan, M. Castro, A. Rowstron, Controlling the costof reliability in peer-to-peer overlays, in: Proc. InternationalWorkshop on Peer-to-Peer Systems, 2003, pp. 21–32.

[20] M. Goyal, K.K. Ramakrishnan, W. Feng, Achieving fasterfailure detection in OSPF networks, in: Proc. InternationalConference on Communications, 2003, pp. 296–300.

[21] A. Basu, J.G. Riecke, Stability issues in OSPF routing, in:Proc. ACM SIGCOMM, 2002, pp. 225–236.

[22] A. Shaikh, L. Kalampoukas, R. Dube, A. Varma, Routingstability in congested networks: Experimentation and anal-ysis, in: Proc. ACM SIGCOMM, 2000, pp. 163–174.

[23] L. Qiu, Y.R. Yang, Y. Zhang, S. Shenker, On selfish routingin internet-like environments, in: Proc. ACM SIGCOMM,2003, pp. 151–162.

[24] R. Keralapura, N. Taft, C.-N. Chuah, G. Iannaccone, CanISPs take the heat from overlay networks?, in: Proc. ACMHotNets, 2004, pp. 27–29.

[25] Y. Liu, H. Zhang, W. Gong, D. Towsley, On the interactionbetween overlay routing and traffic engineering, in: Proc.IEEE INFOCOM, 2005, pp. 2543–2553.

Zhi Li is currently a Senior Member ofTechnical Staff at AT&T labs, AT&TServices. He received his Ph.D. (2005)and Masters (2003) in Computer Science,University of California at Davis, Mas-ters (2000) in Computer Engineeringat Tsinghua University, China. Hisresearch interests include video over IP,Quality-of-Service, Multicasting, overlaynetworks, traffic engineering andModeling.

Lihua Yuan is currently a Ph.D Candi-date in the Department of Electrical andComputer Engineering at the Universityof California, Davis. He received hisBachelor’s degree in Electrical andElectronics Engineering from NanyangTechnological University (Singapore)and Master’s degree in Electrical andComputer Engineering from NationalUniversity of Singaore (Singapore). Hisresearch interests are in the area of net-

work management, management and security.

3842 Z. Li et al. / Computer Networks 51 (2007) 3828–3843

Page 17: Author's personal copy - University of California, Davischuah/rubinet/paper/2007/...Author's personal copy as the physical layer (optical network) and the MPLS/IP layer, has its own

Autho

r's

pers

onal

co

py

Prasant Mohapatra is currently a Pro-fessor in the Department of ComputerScience at the University of California,Davis. In the past, he was on the facultyat Iowa State University and MichiganState University. He has also held Visit-ing Scientist positions at Intel Corpora-tion, Panasonic Technologies, Instituteof Infocomm Research (I2R), Singapore,and National ICT Australia (NICTA).He received his Ph.D. in Computer

Engineering from the Pennsylvania State University in 1993. Hewas/is on the editorial board of the IEEE Transactions on com-puters, IEEE Transaction on Parallel and Distributed Systems,ACM WINET, and Ad Hoc Networks. He has been on the pro-gram/organizational committees of several international confer-ences. He was the Program Vice-Chair of INFOCOM 2004, andthe Program Co-Chair of the First IEEE International Conferenceon Sensor and Ad Hoc Communications and Networks (SECON2004). He has been a Guest Editor for IEEE Network, IEEETransactions on Mobile Computing, and the IEEE Computer.

His research interests are in the areas of wireless networks,sensor networks, Internet protocols and QoS. His research hasbeen funded though grants from the National Science Founda-tion, Intel Corporation, Siemens, Panasonic Technologies, Hew-lett Packard, and EMC Corporation.

Chen-Nee Chuah (SM’06) is currently anAssociate Professor in the Electrical andComputer Engineering Department atthe University of California, Davis(UCD). She received her B.S. in Electri-cal Engineering from Rutgers Universityin 1995, and her M.S. and Ph.D. inElectrical Engineering and ComputerSciences from the University of Califor-nia, Berkeley in 1997 and 2001, respec-tively. Before joining UCD, she held a

visiting researcher position at Sprint Advanced TechnologyLaboratories. Her research interests are in the area of computernetworking and distributed systems, Internet measurements,overlay/peer-to-peer systems, network security, wireless/mobilenetworking, and opportunistic communications. She received theNational Science Foundation CAREER Award in 2003 and theUC Davis College of Engineering Outstanding Junior FacultyAward in 2004. She has served as the Technical Program Co-Chair for the ACM Mobicom 2004 Workshop on Vehicular AdHoc Networks (VANET), and the Vice-Chair for IEEE Globe-com 2006. She has also served on the technical program com-mittee of several ACM- and IEEE-sponsored conferences.

Z. Li et al. / Computer Networks 51 (2007) 3828–3843 3843