

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 4, NO. 4, APRIL 1993


Optimal Resilient Distributed Algorithms for Ring Election

M. Y. Chan and F. Y. L. Chin

Abstract- This paper considers the problem of electing a leader in a dynamic ring in which processors are permitted to fail and recover during election. Θ(n log n + kr) messages, when counting only messages sent by functional processors, are shown to be necessary and sufficient for dynamic ring election, where kr is the number of processor recoveries experienced.

Index Terms-Distributed election, processor failures and recoveries, unidirectional rings.

Manuscript received June 26, 1989; revised June 20, 1991 and August 2, 1992. The authors are with the Department of Computer Science, The University of Hong Kong, Hong Kong. IEEE Log Number 9206276.

I. INTRODUCTION

One of the most studied problems in the area of distributed algorithms is distributed leader election. Many papers have been written about distributed leader election, especially on rings. To cite just a few references, consider [12], [3], [2], [9], [10], [4], [6], [15], [16], [7], [11], [17], [18], [1], [14], [8]. All of these papers deal with the static ring, with the exception of Goldreich and Shrira's treatment of election in rings with communication link failures [8]. The problem of considering rings with processor failures and recoveries during election provides a complement to [8], and was first suggested by Filman and Friedman [5] as a problem worthy of research. One of the main assumptions of this problem is that, when a processor leaves the ring (fails), the ring is patched around its place. This property allows for some rather interesting solutions.

This paper considers the problem of electing a leader in a dynamic ring in which processors are permitted to fail and recover during election. Θ(n log n + kr) messages, when counting only messages sent by functional processors, are shown to be necessary and sufficient for dynamic ring election, where kr is the number of processor recoveries experienced.

II. THE MODEL FOR DYNAMIC RINGS

The objective is to devise an algorithm, to be run on each processor, which will distinguish one of the functional processors as leader. We outline in greater detail the assumptions of our model:

We consider a system of n independent processors arranged and connected in a circular fashion by n point-to-point unidirectional communication links. Initially, they are all in the "sleep" state (Fig. 1). Each processor is distinguished by a unique identification number. Furthermore, we assume that only comparisons of identity numbers can be made, and the algorithm is not aware of the domain or range to which identities belong.

As assumed in [3], [4], [7], and [9], processors may start, or "wake up" to, the algorithm, i.e., get into the "active/relay" state (Fig. 1), either spontaneously at any arbitrary time of their own free will, or upon receipt of a message of the algorithm. Election begins when at least one processor awakens spontaneously.

The network is also assumed to provide both "sequential" and "guaranteed" communications, meaning that messages sent across a link will eventually be received, received in the order sent, and received as sent. In other words, communication is reliable and only processors are faulty.

When processors fail, they get into the "failed" state (Fig. 1). We consider only "clean" failures, i.e., "failed" processors simply stop participating in the election protocol and do not behave maliciously. In fact, "clean" failures also imply that the ring structure will not be disrupted by processor faults, as messages simply pass through or around "failed" processors. This assumption is now common for ring networks and is made possible by providing a bypass switch for each processor [13].

For added flexibility, we assume that processors may fail ("active/relay" state → "failed" state), or recover after failing ("failed" state → "sleep" state), at any time, as long as eventually there is at least one functional processor in the ring. There are no limits to the number of times a processor may fail and then recover during election. Thus, the number of




Fig. 1. State transition diagram of a processor.

processor failures kf and the number of recoveries kr can be larger than n. Note, however, that 0 ≤ kf - kr ≤ n.

A recovered processor ("sleep" state) starts to reparticipate in election only upon receiving a message of the algorithm, starting the algorithm entirely afresh and forgetting everything done prior to failure. Thus, a recovered processor must adopt a wait-and-see attitude and is forced to blend into the ongoing algorithm rather than being allowed to start a new election again. In one atomic step, a processor can receive a message, perform some local computations and then send a message. By definition, a processor cannot fail while executing an atomic step. This assumption is realistic in the sense that, in real life, messages received and consumed by processors which fail after receiving but before sending will eventually be re-sent in the absence of an acknowledgment of receipt.

III. LOWER BOUND ON MESSAGE COMPLEXITY

Message complexity is one of the key performance metrics for distributed algorithms. Here, message complexity measures the total number of messages sent by functional ("active/relay") processors as a function of n, the size of the ring, kf, the number of processor failures, and kr, the number of processor recoveries during election. Messages which bypass "failed" processors are straightforward and do not incur any overhead upon transmission or receipt. Thus, such messages are not counted in determining message complexity.

Lemma 1: Any election algorithm on a dynamic ring of n processors requires at least max{c_{n-1}, kr} messages, where c_n is the minimum number of messages required to elect a leader in a static ring of n processors and kr is the number of processor recoveries experienced during election.

Proof: The proof is divided into two parts. The first part involves c_{n-1} and is straightforward. Consider a dynamic ring which has a processor P that experiences kr recoveries and is in the "failed" state whenever a message comes along. Assume P never wakes up spontaneously. Since messages are never actually received by P (they just bypass P), P will effectively never "wake up" to the election. Since all other processors never fail, they receive and send messages as if they were in a static ring of n - 1 processors. Thus, any election algorithm that works on this dynamic ring requires at least c_{n-1} messages.

The following scenario proves the lower bound of kr messages. Consider any correct election algorithm A and a dynamic ring with the following behavior. Suppose some processor P starts algorithm A by sending a message M, and then fails and recovers. Suppose also that all other processors have failed before receiving M. Eventually, message M will be relayed back to processor P. The point to notice is that upon receiving M, P cannot become leader, because conceivably there could be another processor Q which, like P, sent out a message, failed and recovered in time to receive back its own message, with Q totally unaware of P's message (i.e., P's message passed Q while Q was failed) and P totally unaware of Q's message; if P were to become leader, so would Q. Hence, upon receiving M, processor P must send out some message M' if algorithm A is to be correct. Suppose that, subsequent to sending M', P once again fails and recovers and receives back M'. Upon receiving M', another message must be issued by P, and so on. Hence, with kr processor recoveries, at least kr messages are sent in the worst case. □

Theorem 1: Any election algorithm on a dynamic ring of n processors requires at least Ω(n log n + kr) messages, where kr is the number of processor recoveries experienced during election.

Proof: From [18], it is proved that c_n = Ω(n log n). As max{(n-1) log(n-1), kr} = Ω(n log n + kr), the theorem is proved. □

IV. ALGORITHMS FOR ELECTION IN DYNAMIC RINGS

Our approach to finding suitable election algorithms will be to extend past solutions for static rings to produce dynamic solutions. The algorithms of Chang and Roberts [3] and Peterson [16] are the targets for this exercise.

A. Modified Chang and Roberts' Algorithm

Let us consider the algorithm of Chang and Roberts [3]. The maximum-identity processor is to win the election. Upon initial wakeup, each processor sends its identity to its neighbor. Upon receiving an identity greater than its own, a processor will continue sending the identity; lesser-valued identities are not forwarded, i.e., eliminated. Upon receiving back its own identity, a processor will realize that its identity is maximum (otherwise, its identity would have been stopped) and know that it has been selected as leader.
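As an illustration, the election rule above can be simulated on a reliable (static) ring. This sketch is our reconstruction, not the authors' code; the function name `chang_roberts` is ours.

```python
from collections import deque

def chang_roberts(ids):
    """Simulate Chang and Roberts' election on a static unidirectional ring.

    ids: unique identities listed in ring order; processor i sends to (i+1) % n.
    Returns (leader_identity, total_messages_sent).
    """
    n = len(ids)
    sent = 0
    queue = deque()
    for i in range(n):                      # initial wakeup: everyone sends its identity
        queue.append(((i + 1) % n, ids[i]))
        sent += 1
    leader = None
    while queue:
        dest, nid = queue.popleft()
        if nid == ids[dest]:                # own identity returned: it is the maximum
            leader = nid
        elif nid > ids[dest]:               # forward larger identities; swallow smaller ones
            queue.append(((dest + 1) % n, nid))
            sent += 1
    return leader, sent
```

On a ring such as [3, 5, 1, 9, 4, 8, 7, 2], the identity 9 makes a full tour and wins, while every smaller identity is absorbed at the first larger processor it meets.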

Notice that the above algorithm will not work correctly in the case where all identities except the maximum have been eliminated and the maximum-identity processor has failed. For example, if P9, the maximum-identity processor, failed just after its identity had eliminated all others, the message 9 would circulate around the ring indefinitely. However, the algorithm can easily be fixed to produce a solution in spite of processor failures and recoveries. The key point is to have


Fig. 2. Worst case example for Algorithm A.

each processor remember the maximum identity it has sent out and, should it receive this maximum identity again, realize that the processor having this identity has probably failed and thus declare itself elected.

ALGORITHM A
/* The "active/relay" (functional) processor with maximum identity will be elected.
   ID - the identity of this processor
   TID - temporary identity
   NTID - newly received identity */
if initial wakeup then TID := ID
else /* after recovery */ receive(TID); send(TID);
do forever
   receive(NTID);
   if NTID = TID then ELECTED;
   if NTID > TID then
      TID := NTID; send(TID);
enddo

With this new Algorithm A, P3 will adopt the new identity 9 when message 9 is relayed through it. Thus, when message 9 bypasses the "failed" processor P8, P3, holding the new identity, will be elected.
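This scenario can be replayed in a toy simulation of Algorithm A. This is our illustrative reconstruction: the failure schedule (a processor failing cleanly right after sending the maximum identity) and all names are assumptions, not the authors' code.

```python
from collections import deque

def algorithm_a(ids, fails_after_sending_max=frozenset()):
    """Toy run of Algorithm A on a unidirectional ring.

    Each processor keeps TID, the largest identity it has sent so far.
    A processor whose identity is in fails_after_sending_max fails right
    after it first sends the maximum identity; messages bypass failed
    processors. Returns (identity of elected processor, its adopted TID).
    """
    n = len(ids)
    top = max(ids)
    tid = {i: ids[i] for i in range(n)}        # TID after the initial wakeup send
    failed = set()
    queue = deque()
    for i in range(n):                          # initial wakeup: send own identity
        queue.append(((i + 1) % n, ids[i]))
        if ids[i] == top and top in fails_after_sending_max:
            failed.add(i)                       # "clean" failure just after sending
    while queue:
        dest, nid = queue.popleft()
        while dest in failed:                   # bypass switch skips failed processors
            dest = (dest + 1) % n
        if nid == tid[dest]:                    # got back the maximum it sent: elected
            return ids[dest], tid[dest]
        if nid > tid[dest]:                     # adopt and forward a larger identity
            tid[dest] = nid
            queue.append(((dest + 1) % n, nid))
            if nid == top and ids[dest] in fails_after_sending_max:
                failed.add(dest)
    return None
```

With ids [3, 5, 1, 9] and P9 failing after sending 9, P3 is elected under the adopted identity 9, matching the discussion above; with no failures, P9 itself is elected.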

In Algorithm A, each processor, during each of its sustained periods of functionality, issues at most n messages. With kr processor recoveries, at most n(n + kr) messages can be incurred, and this message complexity is best possible, as the following example shows.

Example (Fig. 2):

Processor with value n sends n and fails
Processor with value n - 1 sends n - 1, n and fails
Processor with value n - 2 sends n - 2, n - 1, n and fails
...
Processor with value 2 sends 2, 3, 4, ..., n and fails
Processor with value 1 sends 1, 2, ..., n
Repeat until kr recoveries:
   Processor with value 1 fails and recovers
   Processor with value 1 receives and sends 1, 2, 3, ..., n

This gives Θ(n² + krn) messages.

A. Modified Peterson’s Algorithm Peterson’s algorithm is a bit more complicated. In the following,

a revised version of Peterson’s algorithm is described. First of all, processors are deemed to be either in state “relay” or “active.” Initially all are “active.” The task of the “relay” processor is to simply relay messages, i.e., any message that reaches a “relay” processor is immediately forwarded. The algorithm proceeds in phases. The goal is to reduce the number of “active” processors by at least half during each phase. Each “active” processor starts a phase by sending its

identity ID to its neighbor. Upon receiving the first message NIDl of the phase, the “active” processor simply passes NIDl to its neighbor. Upon receiving the second message NID2 of the phase, the “active” processor will make a decision as to whether it will remain “active” for the next phase. The criterion for remaining “active” is that NIDl = max(ID,NIDl,NID2). Given this criterion, for each consecutive pair of “active” processors, at most one of them will remain “active” for the next phase. The number of “active” processors will thus be dutifully cut down by at least half as the algorithm proceeds from one phase to another until finally there is only one “active” processor left. This “active” processor is the leader. (Note that the maximum-identity processor will not necessarily win the election.)
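The phase structure can be sketched at the level of "active" identities only, with relays and message passing abstracted away. This is our illustrative reconstruction for the failure-free case; the function name is ours.

```python
def peterson_phases(ids):
    """Failure-free phase simulation of the revised Peterson election.

    ids: unique identities of the "active" processors in ring order.
    Per phase, processor i survives iff the first identity it receives
    (NID1, from the preceding "active" processor) equals
    max(ID, NID1, NID2). Returns (leader_identity, number_of_phases).
    """
    active = list(ids)
    phases = 0
    while len(active) > 1:
        k = len(active)
        # NID1 is the identity of the preceding active processor,
        # NID2 the one before that (indices wrap around the ring).
        active = [active[i] for i in range(k)
                  if active[(i - 1) % k] == max(active[i],
                                                active[(i - 1) % k],
                                                active[(i - 2) % k])]
        phases += 1
    return active[0], phases
```

Note that the leader need not hold the maximum identity: on [3, 5, 1, 9, 4, 8, 7, 2] the survivors after one phase are [1, 4, 7], and the leader after a second phase is 1.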

The failure of "relay" processors does not jeopardize the correctness of the above algorithm. However, failure of all "active" processors causes a leader not to be elected, as only an "active" processor can become leader. To remedy this, we seek to maintain at least one "active" processor. Notice that if an "active" processor were to fail, some processor in the ring would receive more than two messages of the same phase. Without failures, each processor should receive only two messages per phase. This observation leads us to a simple way of detecting the failure of an "active" processor. When a processor detects the failure of an "active" processor (i.e., receives more than two messages of the same phase), it becomes "active" for the next phase to compensate for the loss of the "active" processor and to ensure at least one "active" processor. The following algorithm embodies this idea.

ALGORITHM B
/* ID - the identity of this processor
   PHASE - the current phase number
   TID - temporary identity
   TPHASE - temporary phase number
   NIDi - ith identity received
   ACT-FLAG - true if processor is "active" */
PHASE := 0;
if after recovery then goto loop; /* upon recovery, become "relay" */
active:
   /* become "active" for the next phase */
   ACT-FLAG := true; PHASE := PHASE + 1; I := 1;
   send(PHASE, ID);
loop:
   repeat receive(TPHASE, TID) until TPHASE ≥ PHASE;
   if TPHASE > PHASE then
      /* become "relay" in new phase */
      ACT-FLAG := false; PHASE := TPHASE; I := 1;
   /* Ith message of this phase has been received */
   case I of
      1: NID1 := TID;
         if NID1 = ID and ACT-FLAG then ELECTED;
         send(TPHASE, TID);
      2: NID2 := TID;
         if NID1 = max(ID, NID1, NID2) and ACT-FLAG then goto active;
      3: if not ACT-FLAG then send(TPHASE, TID);
         goto active;
   endcase;
   I := I + 1;
   goto loop;


Fig. 3. Worst case example for Algorithm B.

Upon the failure of an "active" processor, another may be introduced at the next phase. In particular, the number of "active" processors can remain the same from one phase to the next if all "active" processors failed and were detected and replaced by "relay" processors. Thus, given at most kf failures, there can be log n + kf phases. At each phase, each "active" or "relay" processor can send up to two messages before failing, starting the next phase, or becoming leader. Hence, the message complexity of the algorithm is O(n log n + kfn). The following example shows that this message complexity is tight.

Example (Fig. 3):

The algorithm executes without failures until 3 "active" processors are left: P, Q, R.
Repeat until kf failures:
   The 3 "active" processors all fail, and 3 "relay" processors take over and become "active."
   The 3 "failed" processors recover as "relay."

This gives Θ(n log n + kfn) messages.

C. Message-Optimal Algorithm

The main problem with Algorithm B is that the number of phases can be as high as log n + kf, which leads to an O(n log n + kfn) message complexity, because the number of "active" processors does not necessarily decrease from one phase to the next. If we could somehow retain the active-processor-reduction property of Peterson's original algorithm and reduce the number of "active" processors by a constant fraction each phase, we would have Θ(log n) phases, whereby O(n log n + kr) message complexity could be achieved. To this end, we have the following Algorithm C.

It would be ideal if we could create an "active" processor upon failure only when an "active" processor which would have remained "active" at the next phase fails. To some extent, this can be accomplished. A processor ("active" or "relay"), with identity ID, in receipt of three identities NID1, NID2 and NID3 of the same phase would not only know that "active" processor NID1 has failed, but would also know whether or not processor NID1 would have been "active" for the next phase (i.e., whether NID2 = max(NID1, NID2, NID3)), and act accordingly. Only if processor NID1 would have remained "active" had it not failed can processor ID become "active" in its place for the next phase. Thus, the number of "active" processors will certainly decrease from one phase to the next.

However, if processor NID1 should have become "relay," what should processor ID do? Suppose processor ID were to send nothing after it received NID3. In the case where processor NID2 were also to fail, processor ID would be in receipt of a fourth identity NID4. Likewise, looking at NID2, NID3 and NID4, processor ID could take over for the "failed" processor NID2. But suppose processor ID, too, had failed before receiving NID4. Its following neighbor would only see NID1, NID2 and NID4 (because processor ID sent nothing after receiving NID3), and would possibly attempt to take over for processor NID1 on the basis of these values, entirely unaware of the failure of processor NID2. It could be the case that processor NID2 would have been "active" had it not failed, but no processor at this stage can properly take over for processor NID2. Hence, perhaps processor ID should have passed NID3 along in the first place; but, in this case, there is the danger of incurring far too many messages.

We propose the following solution. Processor ID, instead of sending NID3, sends an "INACTIVE" message to indicate to its neighbor that an "active" processor has become "relay." "INACTIVE" messages circulate around the ring, with each processor limited to sending one per phase. Moreover, we require each "active" processor to send an "INACTIVE" message upon becoming "relay." The idea is to allow an "active" processor to be created in place of a failed "active" processor only if either the failed "active" processor would have remained "active" had it not failed, or the failed "active" processor precedes an "active"-turned-"relay" processor. The former situation can be detected by any "relay" processor through the reception of three identities NID1, NID2 and NID3 with NID2 = max(NID1, NID2, NID3). The latter situation can be detected by an "active"-turned-"relay" processor with the help of "INACTIVE" messages: receiving an identity after having sent an "INACTIVE" message (in a particular phase) indicates the failure of an "active" processor which precedes an "active"-turned-"relay" processor.

In summary, a processor ID will be "active" in the next phase if it satisfies any one of the following conditions, where NID1, NID2, NID3 (in this order) are the three identities of the same phase received by processor ID:

A) processor ID is "active" and NID1 = max(ID, NID1, NID2);
B) processor ID is "relay" and NID2 = max(NID1, NID2, NID3);
C) processor ID receives an identity after having sent an "INACTIVE" message out.

Under normal circumstances with no failures, a processor will send two identity messages of the same phase, possibly followed by an "INACTIVE" message of this same phase, followed by two identity messages of the next phase, possibly followed by an "INACTIVE" message of this next phase, and so on.
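The three conditions above can be stated compactly as a predicate. This is an illustrative encoding; the parameter names are ours, and `None` marks an identity not yet received.

```python
def stays_active(state, my_id, nid1, nid2, nid3=None,
                 received_id_after_sending_inactive=False):
    """Decide whether processor my_id is "active" in the next phase.

    state: "active" or "relay"; nid1, nid2, nid3: identities of the same
    phase received in this order (nid3 may be None if not yet received).
    """
    # Condition A: an "active" processor whose first received identity
    # is the maximum of ID, NID1, NID2.
    if state == "active" and nid1 == max(my_id, nid1, nid2):
        return True
    # Condition B: a "relay" processor seeing three identities with
    # NID2 = max(NID1, NID2, NID3).
    if state == "relay" and nid3 is not None and nid2 == max(nid1, nid2, nid3):
        return True
    # Condition C: an identity arrived after this processor sent an
    # "INACTIVE" message in this phase.
    return received_id_after_sending_inactive
```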

The following lemma shows that at least one processor will remain “active” in any particular phase.

Lemma 2: Given at least one “active” processor in a particular phase, the application of the above three conditions will ensure that there is at least one “active” processor for the next phase.

Proof: Without loss of generality, assume ID2 is the maximum identity among all the "active" processors in a particular phase. Since each processor ("active," "relay," or "failed") will receive at least the identities of the two "active" processors preceding it, the "active" processor, say processor ID1, immediately following processor ID2 will, if it does not fail, remain "active" because of condition A. Should processor ID1 fail, it would have sent out three identities, ID1, ID2, and ID3, to the processor immediately following it, where ID3 is the identity of the "active" processor immediately preceding processor ID2. Even if some processors in between fail, these three identities will pass around the ring (in order) undisrupted unless there exists a processor which, upon receipt of one of these three identities, either changes to "relay," or changes to or remains "active." The lemma is proved if some processor changes to or remains "active" for the next phase. Otherwise, there are three cases to consider:

1) A processor changes to "relay" upon receiving identity ID1. In this case, an "INACTIVE" message would be issued. Eventually, some processor will receive identity ID2 after having passed along this "INACTIVE" message and will become "active" because of condition C.
2) A processor changes to "relay" upon receiving identity ID2. In this case, an "INACTIVE" message would be issued. Eventually, some processor will receive identity ID3 after having passed along this "INACTIVE" message and will become "active" because of condition C.
3) A processor receives all three identities ID1, ID2 and ID3 without sending an "INACTIVE" message. In this case, the processor must be a "relay" processor and will become "active" because of condition B. □

The following lemma shows that the number of "active" processors is reduced by at least one-third from one phase to the next.

Lemma 3: The application of the above three conditions will ensure that the number of "active" processors is reduced by at least one-third from one phase to the next.

Proof: It has been proven in [4] that condition A, under normal circumstances with no failures, ensures that no two adjacent "active" processors can remain "active" for the next phase. Thus, the number of "active" processors is reduced by at least half in the event of no failures.

However, with failures, it is possible for adjacent "active" processors to remain "active" or be substituted by two other processors in the next phase. Conditions B and C apply when "active" processors fail. In both cases, some processor ID which would not be "active" under normal circumstances becomes "active." However, these situations occur only when the "active" processor immediately preceding processor ID fails. Because of such a failure, there cannot be more than two adjacent processors that remain "active" or are substituted by some other processors in the next phase. □

Fig. 4 gives an example to show that the bound of Lemma 3 is best possible.
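The no-failure half of the argument (condition A) can be checked mechanically. The sketch below is not the paper's code: it assumes distinct integer identities on a failure-free ring and applies the NID2 = max(NID1, NID2, NID3) test from Algorithm C, verifying that no two ring-adjacent processors survive a phase, so the "active" set at least halves.

```python
# Sketch of one failure-free phase under condition A.  Each "active"
# processor i sent its own identity (NID1), then received the identity of
# the nearest preceding active processor (NID2) and the one before that
# (NID3); it stays "active" only if NID2 is the maximum of the three.
def survives(ids, i):
    m = len(ids)
    nid1, nid2, nid3 = ids[i], ids[(i - 1) % m], ids[(i - 2) % m]
    return nid2 == max(nid1, nid2, nid3)

ids = [5, 12, 3, 8, 10, 1, 7, 2]          # hypothetical ring of identities
alive = [i for i in range(len(ids)) if survives(ids, i)]

# No two surviving positions are ring-adjacent, hence at most half survive.
for a, b in zip(alive, alive[1:] + alive[:1]):
    assert (b - a) % len(ids) != 1 or len(alive) == 1
assert 1 <= len(alive) <= len(ids) // 2
```

The adjacency check mirrors the proof: a survivor at position i needs its predecessor's identity to beat its own, while a survivor at i+1 needs the opposite, so both cannot hold.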

ALGORITHM C
/* ID     - the identity of this processor
   PHASE  - the current phase number
   TID    - temporary identity
   TPHASE - temporary phase number
   NIDi   - the ith identity sent
   I      - the number of messages of this phase sent
*/
PHASE := 0;
if after recovery then goto loop;  /* upon recovery, become "relay" */

active:  /* become "active" for the next phase */
  NID1 := ID; PHASE := PHASE + 1; I := 1;
  send(PHASE, NID1);

loop:
  repeat receive(TPHASE, TID) until TPHASE ≥ PHASE;
  if TID = "INACTIVE" then
    /* I = 3 indicates that "INACTIVE" has been sent for this phase */
    if I ≠ 3 then send(PHASE, "INACTIVE");
    I := 3; goto loop;
  if TPHASE > PHASE then  /* become "relay" */
    NID1 := TID; PHASE := TPHASE; I := 1;
    send(PHASE, NID1); goto loop;
  case I of
    1: NID2 := TID;
       if NID2 = NID1 then ELECTED;
       send(PHASE, TID);
    2: /* conditions (A) and (B) in text */
       NID3 := TID;
       if NID2 = max(NID1, NID2, NID3) then goto active;
       else send(PHASE, "INACTIVE");
    3: /* condition (C) in text */
       goto active;
  endcase;
  I := I + 1; goto loop;

Fig. 4. Example for Lemma 3.

Thus, given at least two "active" processors in a particular phase, Algorithm C (whose state transition diagram is given in Fig. 5) ensures that at least one processor will be in the "active" state for the next phase, and moreover that the number of "active" processors will be reduced by at least one-third for the next phase, so at most log_{1.5} n phases will pass before one "active" processor remains. With only one "active" processor, only one identity message will circulate. The first processor to receive this identity after having sent it will be the leader.

At each phase, each "active" or "relay" processor can send up to three messages before failing, starting the next phase, or becoming the leader. With k_r processor recoveries and log_{1.5} n phases, 3n log_{1.5} n + 3k_r messages may be incurred. The length of a message for Algorithm C is O(log log n) bits plus the number of bits needed to encode a processor identity.
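The arithmetic behind this count can be sketched as a small upper-bound calculator (an illustration, not part of the paper; `message_bound` is a hypothetical helper name):

```python
# Upper bound on messages: at most ceil(log_{1.5} n) phases, each of the
# n processors sending at most 3 messages per phase, plus 3 messages per
# processor recovery (k_r recoveries).
import math

def message_bound(n, k_r=0):
    phases = math.ceil(math.log(n, 1.5))
    return 3 * n * phases + 3 * k_r

# n = 2: log_{1.5}(2) is about 1.71, so 2 phases and 3*2*2 = 12 messages.
assert message_bound(2) == 12
```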

Theorem 2: There exists an algorithm to elect a leader in a unidirectional dynamic ring of n processors using O(n log n + k_r) messages, where k_r is the number of processor recoveries experienced during election.

Fig. 5. State transition diagram for Algorithm C. Note: receive(tphase, "INACTIVE") where tphase > phase is not possible, as an "INACTIVE" message can only be sent following a non-"INACTIVE" message of the same phase. Moreover, no action is taken on messages that do not fit the descriptions for the state transitions.

V. CONCLUDING REMARKS

We have given a resilient election algorithm, optimal in terms of the number of messages, for a ring network in which processors can fail and recover during election without restarting the algorithm. Besides message complexity, another important measure for distributed algorithms is time complexity. Analysis of time complexity depends very much on the assumptions made, and some of these assumptions may not be considered very realistic. For this reason, we will not go into rigorous detail on time complexity. For example, one set of assumptions might be that each nonfailed processor can receive, process, and send a message to the next nonfailed processor, all in one time unit. The implication is that it takes no time for messages to bypass failed processors. If this is the case, it is possible for up to n messages to arrive at a processor at the same time. For instance, if all processors P_i, i = 1, ..., n - 1, fail immediately after sending out a message, processor P_n will end up receiving all n - 1 messages at the end of the time unit. Thus, it is necessary that each processor have sufficient local memory to store up to n messages. With such assumptions, using an analysis similar to that of Peterson's algorithm [16], the time complexity of Algorithm C would be O(n + k_r). On the other hand, assuming that one time unit is also necessary for a message to bypass a failed processor, Algorithm C would take O((k_r + 1)n) time units.
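The buffering worst case described above can be illustrated with a toy delivery loop (an illustration under the zero-time-bypass assumption; the variable names are hypothetical):

```python
# Processors P_1..P_{n-1} each send one message and fail immediately;
# with zero-time bypass of failed processors, every message reaches the
# sole surviving processor P_n within the same time unit, so P_n must
# buffer n - 1 messages.
n = 8
failed = set(range(1, n))            # P_1 .. P_{n-1} fail after sending
inbox = {i: [] for i in range(1, n + 1)}

for sender in range(1, n):
    dest = sender % n + 1            # next processor on the ring
    while dest in failed:            # zero-time bypass of failed processors
        dest = dest % n + 1
    inbox[dest].append(sender)

assert len(inbox[n]) == n - 1        # P_n buffers all n - 1 messages
```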

Finally, as one of the main benefits of distributed systems is fault tolerance, or resilience to failures, the study of other resilient distributed algorithms in different networks with failures would be interesting and should receive more attention.

REFERENCES

[1] H. L. Bodlaender and J. van Leeuwen, "New upperbounds for decentralized extrema-finding in a ring of processors," Tech. Rep. RUU-CS-85-15, Comput. Sci. Dep., Rijksuniversiteit Utrecht, Netherlands, 1985.

[2] J. E. Burns, “A formal model for message passing systems,” Tech. Rep. 91, Comput. Sci. Dep., Indiana Univ., Bloomington, IN, 1980.

[3] E. Chang and R. Roberts, “An improved algorithm for decentralized extrema-finding in circular configurations of processes,” Commun. ACM, vol. 22, pp. 281-283, 1979.

[4] D. Dolev, M. Klawe, and M. Rodeh, "An O(n log n) unidirectional distributed algorithm for extrema finding in a circle," J. Algorithms, vol. 3, pp. 245-260, 1982.

[5] R. E. Filman and D. P. Friedman, Coordinated Computing: Tools and Techniques for Distributed Software. New York: McGraw-Hill, 1984.

[6] W. R. Franklin, "On an improved algorithm for decentralized extrema finding in circular configurations of processors," Commun. ACM, vol. 25, pp. 336-337, 1982.

[7] G. N. Frederickson and N. A. Lynch, "The impact of synchronous communication on the problem of electing a leader in a ring," in Proc. 16th Annu. ACM Symp. Theory Comput., Washington, DC, 1984, pp. 493-503.

[8] O. Goldreich and L. Shrira, "The effects of link failure on computations in asynchronous rings," in Proc. ACM Symp. Principles Distributed Comput., Calgary, Alta., Canada, Aug. 1986, pp. 174-185.

[9] D. S. Hirschberg and J. B. Sinclair, "Decentralized extrema-finding in circular configurations of processors," Commun. ACM, vol. 23, pp. 627-628, 1980.

[10] A. Itai and M. Rodeh, "Symmetry breaking in distributive networks," in Proc. 22nd IEEE Symp. Foundations Comput. Sci., Oct. 1981, pp. 150-158.

[11] E. Korach, D. Rotem, and N. Santoro, "Distributed election in a circle without a global sense of orientation," Int. J. Comput. Math., vol. 14, 1984.

[12] G. LeLann, "Distributed systems - Towards a formal approach," in Information Processing 77. New York: Elsevier Science, 1977, pp. 155-160.

[13] J. Martin, Local Area Networks - Architectures and Implementations. Englewood Cliffs, NJ: Prentice-Hall, 1989.

[14] S. Moran, M. Shalom, and S. Zaks, "A 1.44... n log n algorithm for distributed leader finding in bidirectional rings of processors," Tech. Rep. 389, Comput. Sci. Dep., Technion, Nov. 1985.

[15] J. Pachl, E. Korach, and D. Rotem, "Lower bounds for distributed maximum-finding algorithms," J. ACM, vol. 31, pp. 905-918, 1984.

[16] G. L. Peterson, "An O(n log n) unidirectional algorithm for the circular extrema problem," ACM Trans. Programming Languages Syst., vol. 4, pp. 758-762, 1982.

[17] D. Rotem, E. Korach, and N. Santoro, "Analysis of a distributed algorithm for extrema finding in a ring," Tech. Rep. SCS-TR-61, School of Comput. Sci., Carleton Univ., Aug. 1984.

[18] P. M. B. Vitanyi, "Distributed election in an Archimedean ring of processors," in Proc. 16th Annu. ACM Symp. Theory Comput., Washington, DC, 1984, pp. 542-547.
