Lecture 4: Elections, Reset

Anish Arora

CSE 763

Notes include material from Dr. Jeff Brumfield

Reading Material

• Hector Garcia-Molina, "Elections in a Distributed Computing System", IEEE Transactions on Computers, Vol. C-31, No. 1, January 1982, pp. 48-59

• E. W. Dijkstra, "Self-Stabilizing Systems in Spite of Distributed Control," Communications of the ACM, Vol. 17, No. 11, 1974, pp. 643-644

• A. Arora, M. Gouda, "Distributed Reset", IEEE Transactions on Computers, Vol. 43, No. 9, September 1994, pp. 1026-1038

• Chapters 10 and 12 in Paul Sivilotti's book

Election in Distributed Systems

Problem: Select a unique site from a set of candidate sites

The selection scheme must not require a coordinator or leader

Applications

• Selection of a coordinator for mutual exclusion, deadlock detection, two-phase commit, etc.

• Selection of sites for location of replicated objects

• Selection of a site to assume the duties of a failed server

Election Versus Mutual Exclusion

Similarities

• Both election algorithms and mutual exclusion algorithms select one site from a set of candidate sites

• Both types of algorithms must function correctly in the presence of failures

Differences

• In an election, fairness may not be important. In mutual exclusion, every site should eventually be selected

• Every site must know the identity of the site that wins an election. Other sites do not need to know the site selected by a mutual exclusion algorithm

Types of Election Algorithms

Probabilistic

• Each competing site is equally likely to win the election

Static Priority

• Each site has a unique predefined priority

• The site having the highest priority should win the election

Dynamic Priority

• Each site has a priority that varies over time

• The site having the highest priority at the beginning of the election should win the election

A Probabilistic Algorithm

Algorithm (to be carried out by each process)

• Generate a random integer b, uniformly distributed in the interval [0, N-1], where N is the number of processes

• Send the selected value to every other process

• When the values $b_i$ for $i = 0, \dots, N-1$ (where $b_i$ is the value chosen by process i) have been received from all other processes, compute

$$k := \left( \sum_{i=0}^{N-1} b_i \right) \bmod N$$

Process k wins the election
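A minimal Python sketch of one run of the algorithm above, assuming (as the slides do) that processes neither fail nor send inconsistent values. The function name `run_election` and the use of Python's `random` module as the uniform source are illustrative choices, not part of the original algorithm. Because every process ends up holding the same vector of values, every process computes the same k.

```python
import random

def run_election(n, seed=None):
    """Simulate one run of the probabilistic election among processes 0..n-1.

    Assumes no failures and reliable, consistent message delivery.
    Returns (values, decisions, messages).
    """
    rng = random.Random(seed)
    # Each process i draws b_i uniformly from [0, n-1]
    values = [rng.randrange(n) for _ in range(n)]
    # Each process sends its value to the other n-1 processes
    messages = n * (n - 1)
    # Afterwards every process holds the full vector and computes k locally
    decisions = {i: sum(values) % n for i in range(n)}
    return values, decisions, messages

values, decisions, messages = run_election(5, seed=42)
print(values, decisions, messages)
```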

A Probabilistic Algorithm

Assumptions

• Processes participating in the election are known a priori and are numbered 0, 1, …, N-1

• Processes do not fail or send inconsistent information

Analysis

• The number of messages required for an election is $N^2 - N$, since each of the N processes sends its value to the other N-1 processes

• If a process follows this algorithm, its probability of winning is 1/N, regardless of the values selected by the other processes

• All processes determine the same winner
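The 1/N claim holds because, once the other processes' values (with sum s) are fixed, the map b ↦ (b + s) mod N is a bijection on [0, N-1]; a uniformly chosen b therefore makes every winner equally likely. The small exhaustive check below illustrates this for process 0; the helper names are hypothetical.

```python
from itertools import product

def winner(values, n):
    # k := (sum of all b_i) mod n
    return sum(values) % n

def check_one_over_n(n):
    """Exhaustively check the 1/N claim for process 0 and small n.

    For every combination of values the other n-1 processes might pick,
    letting b_0 range over [0, n-1] must produce each winner 0..n-1 exactly
    once; a uniform b_0 then gives every process (process 0 included)
    winning probability exactly 1/n.
    """
    for others in product(range(n), repeat=n - 1):
        outcomes = [winner((b0,) + others, n) for b0 in range(n)]
        if sorted(outcomes) != list(range(n)):
            return False
    return True

print(check_one_over_n(4))  # True
```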

Variant of the Probabilistic Algorithm

Unknown number of participants N

• Generate N-1 values, one in each of the intervals [0,N-1], [0,N-2], …, [0,1]

• Exchange values with other participants

• When the number of participants is determined, use the appropriate set of values as in the previous algorithm
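One possible reading of this variant, sketched in Python. It assumes that N acts as an upper bound `max_n` on the number of participants, and that once the participant set is known the participants number themselves 0..n-1 by sorting their identifiers; both are assumptions for illustration, as are the function names.

```python
import random

def draw_values(max_n, rng):
    """Draw one value for each possible participant count m = max_n, ..., 2.

    The value for count m is uniform on [0, m-1], giving max_n - 1 values.
    """
    return {m: rng.randrange(m) for m in range(max_n, 1, -1)}

def elect_unknown_n(participant_ids, max_n, seed=None):
    rng = random.Random(seed)
    # Every participant draws its whole vector before knowing how many others exist
    drawn = {pid: draw_values(max_n, rng) for pid in participant_ids}
    # After the exchange, the participant set (and hence n) is known to everyone
    n = len(participant_ids)
    if not 1 <= n <= max_n:
        raise ValueError("number of participants must be between 1 and max_n")
    if n == 1:
        return participant_ids[0]
    # Assumed convention: number participants 0..n-1 in sorted id order
    ranked = sorted(participant_ids)
    # Use only the values drawn for count n, then proceed as before
    k = sum(drawn[pid][n] for pid in ranked) % n
    return ranked[k]

print(elect_unknown_n(["carol", "alice", "bob"], max_n=5, seed=3))
```

Drawing a value for every possible count up front means no second round of messages is needed once n is learned.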

The Bully Algorithm

Assumptions

• Each process is assigned a unique priority number

• The highest-priority active process should always win the election

• Every process knows of the existence of every other process and its priority number

• Processes may fail during the election

• A failed process may subsequently recover

The Bully Algorithm

The Algorithm

Send an election message to each higher-priority process
Delay for time T
If no responses received then
      take over as leader
      inform each lower-priority process of the change
Else (* response received *)
      delay for time T'
      If "I am leader" message received then
            record this fact
      Else restart the algorithm

The Bully Algorithm

Run this algorithm if

1. we receive no response from the leader

2. we receive an election message from a lower priority process

3. we have just recovered from failure

Analysis

$O(N^2)$ messages may be required
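The quadratic message count can be made concrete with a message-counting sketch in Python. It assumes priorities equal process ids 0..N-1, that no process fails or recovers while this election runs, and it abstracts away the timeouts T and T' (an unanswered message simply means the receiver is down); `bully_election` is a hypothetical name. With every process up and the lowest-priority process initiating, the count works out to $N^2 - 1$.

```python
from collections import deque

def bully_election(n, alive, initiator):
    """Message-counting sketch of the Bully algorithm.

    Processes are 0..n-1 and a higher id means higher priority. `alive` is the
    set of processes that are up; nobody fails during this run, and the
    timeouts T and T' are abstracted away (a silent process is simply dead).
    Returns (new_leader, total_messages).
    """
    messages = 0
    leader = None
    started = set()                  # processes that have already run the algorithm
    pending = deque([initiator])
    while pending:
        p = pending.popleft()
        if p in started:
            continue
        started.add(p)
        higher = range(p + 1, n)
        messages += len(higher)                  # election messages (some to dead processes)
        responders = [q for q in higher if q in alive]
        messages += len(responders)              # replies from live higher processes
        if not responders:
            leader = p                           # no reply within T: take over
            messages += p                        # "I am leader" to every lower process
        else:
            pending.extend(responders)           # each responder runs its own election
    return leader, messages

print(bully_election(4, {0, 1, 2, 3}, initiator=0))  # (3, 15)
print(bully_election(4, {0, 1, 2}, initiator=0))     # (2, 11)
```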

The Bully Algorithm

Self Stabilization

• A system is self-stabilizing if, regardless of its initial state, it is guaranteed to arrive at a legitimate state in a finite number of steps

• If a failure occurs in a self-stabilizing system, the system will correct itself without any form of outside intervention

Assumptions

• Each site has a unique site number

• Sites can communicate directly with neighboring sites

• Each site maintains knowledge of its functioning neighbors

Objectives

• The functioning site having the highest site number is the leader

• Every functioning site knows the identity of the leader

• Every functioning site knows a functioning path to the leader

Perturbations

A perturbation in the system can be caused by a failure, a recovery from a failure, or an enhancement or reconfiguration of the system

Possible perturbations to a system:

• A site can fail or be removed from the system

• A site can recover from failure or be added to the system

• A communications link can fail or be removed from the system

• A communications link can recover from failure or be added to the system

• A variable in a site's local memory can be changed

Arora and Gouda’s Algorithm

Each site maintains three variables:

• leader - the identity of the site believed to be the leader

• parent - the identity of the next node in a path to the leader

• dist - the distance to the leader, measured in number of links

Algorithm Structure

This version of the algorithm assumes that a site's local variables cannot be corrupted

begin
      (our leader < self) or (we can't communicate with parent)
            → our leader := self; our parent := self
[]    (parent's leader ≠ our leader)
            → our leader := parent's leader
[]    (a neighbor's leader > our leader)
            → our leader := neighbor's leader; our parent := neighbor
end

Simplified Algorithm

begin
      (leader.i < i) or (parent.i ∉ neighbor.i ∪ {i})
            → leader.i, parent.i := i, i
[]    parent.i = j and j ∈ neighbor.i and leader.i ≠ leader.j
            → leader.i := leader.j
[]    j ∈ neighbor.i and leader.i < leader.j
            → leader.i, parent.i := leader.j, j
end
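A Python simulation of the simplified algorithm, assuming uncorrupted starting values (leader.i = parent.i = i), a connected network that does not change during the run, and a central scheduler that executes one randomly chosen enabled action at a time. The network is a dict mapping each site to its set of neighbors; the function name and representation are illustrative. When no action remains enabled, every site should hold the highest site number as its leader.

```python
import random

def simplified_election(neighbors, seed=0):
    """Simulate the simplified algorithm under a random central scheduler.

    neighbors maps each site id to the set of its functioning neighbors
    (undirected, connected graph). Variables start uncorrupted:
    leader.i = parent.i = i.
    """
    rng = random.Random(seed)
    leader = {i: i for i in neighbors}
    parent = {i: i for i in neighbors}

    def enabled(i):
        actions = []
        p = parent[i]
        # Action 1: leader.i < i, or parent.i is neither a neighbor nor i itself
        if leader[i] < i or (p != i and p not in neighbors[i]):
            actions.append(("reset", i))
        # Action 2: parent.i = j is a neighbor whose leader differs from ours
        if p in neighbors[i] and leader[i] != leader[p]:
            actions.append(("copy", p))
        # Action 3: some neighbor j believes in a larger leader
        for j in neighbors[i]:
            if leader[i] < leader[j]:
                actions.append(("adopt", j))
        return actions

    while True:
        choices = [(i, act) for i in neighbors for act in enabled(i)]
        if not choices:          # no action enabled: the system is stable
            return leader, parent
        i, (kind, j) = rng.choice(choices)
        if kind == "reset":
            leader[i], parent[i] = i, i
        elif kind == "copy":
            leader[i] = leader[j]
        else:
            leader[i], parent[i] = leader[j], j

ring = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
leader, parent = simplified_election(ring, seed=7)
print(leader, parent)
```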

Example

Example (cont)

Formation of cycles

• The corruption of a site's local variables can produce a cycle in the parent graph

• The algorithm must be extended to automatically break cycles

• Let K be an upper bound on the number of sites in the system
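To see why the simplified algorithm cannot recover from corrupted variables, the Python sketch below evaluates its three guards in a hypothetical corrupted state: two neighboring sites name each other as parent, and both believe in a leader 9 that does not exist. No guard holds at either site, so the parent cycle and the fake leader persist forever.

```python
def simplified_guards(i, leader, parent, neighbors):
    """Return which of the simplified algorithm's three guards hold at site i."""
    p = parent[i]
    g1 = leader[i] < i or (p != i and p not in neighbors[i])
    g2 = p in neighbors[i] and leader[i] != leader[p]
    g3 = any(leader[i] < leader[j] for j in neighbors[i])
    return g1, g2, g3

# Hypothetical corrupted state: a parent cycle between sites 0 and 1,
# both believing in a nonexistent leader 9.
neighbors = {0: {1}, 1: {0}}
leader = {0: 9, 1: 9}
parent = {0: 1, 1: 0}
print([simplified_guards(i, leader, parent, neighbors) for i in neighbors])
# [(False, False, False), (False, False, False)]
```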

Complete Algorithm

begin
      (leader.i < i)
      or (parent.i = i and (leader.i ≠ i or dist.i ≠ 0))
      or (parent.i ∉ neighbor.i ∪ {i})
      or (dist.i ≥ K)
            → leader.i, parent.i, dist.i := i, i, 0
[]    parent.i = j and j ∈ neighbor.i and dist.j < K and (leader.i ≠ leader.j or dist.i ≠ dist.j+1)
            → leader.i, dist.i := leader.j, dist.j+1
[]    leader.i < leader.j and j ∈ neighbor.i and dist.j < K
            → leader.i, parent.i, dist.i := leader.j, j, dist.j+1
end
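The complete algorithm can be simulated the same way. This Python sketch starts from an arbitrary, possibly corrupted, assignment of leader, parent and dist, and lets a random central scheduler (one instance of a minimally fair scheduler) execute enabled actions until none remain. The example state, the step limit `max_steps`, and the function name are assumptions for illustration. In the final state every site should name the highest site number as leader, and dist should grow by exactly one along every parent link.

```python
import random

def complete_election(neighbors, K, leader, parent, dist, seed=0, max_steps=100_000):
    """Run the complete algorithm from an arbitrary (possibly corrupted) state.

    K must be an upper bound on the number of sites. The three dicts are
    updated in place; a random central scheduler picks one enabled action
    per step.
    """
    rng = random.Random(seed)

    def enabled(i):
        p = parent[i]
        acts = []
        # Action 1: reset to being a root
        if (leader[i] < i
                or (p == i and (leader[i] != i or dist[i] != 0))
                or (p != i and p not in neighbors[i])
                or dist[i] >= K):
            acts.append(("reset", i))
        # Action 2: follow the parent's leader and distance
        if (p in neighbors[i] and dist[p] < K
                and (leader[i] != leader[p] or dist[i] != dist[p] + 1)):
            acts.append(("copy", p))
        # Action 3: join a neighbor that knows a larger leader
        for j in neighbors[i]:
            if leader[i] < leader[j] and dist[j] < K:
                acts.append(("adopt", j))
        return acts

    for _ in range(max_steps):
        choices = [(i, a) for i in neighbors for a in enabled(i)]
        if not choices:          # no action enabled: legitimate state reached
            return
        i, (kind, j) = rng.choice(choices)
        if kind == "reset":
            leader[i], parent[i], dist[i] = i, i, 0
        elif kind == "copy":
            leader[i], dist[i] = leader[j], dist[j] + 1
        else:
            leader[i], parent[i], dist[i] = leader[j], j, dist[j] + 1
    raise RuntimeError("did not stabilize within max_steps")

# Line network 0 - 1 - 2, corrupted: a parent cycle between 0 and 1,
# and every site believes in a nonexistent leader 9.
neighbors = {0: {1}, 1: {0, 2}, 2: {1}}
leader = {0: 9, 1: 9, 2: 9}
parent = {0: 1, 1: 0, 2: 1}
dist = {0: 1, 1: 2, 2: 3}
complete_election(neighbors, 4, leader, parent, dist)
print(leader, parent, dist)
```

The dist bound is what breaks the cycle: copying around the cycle keeps increasing dist until some site reaches K and resets.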

Fairness

• Minimal: If some program action is enabled, then some enabled action is executed

• Weak: If some program action is continuously enabled, then that program action is eventually executed

• Process: If some actions of a process are continuously enabled, then some enabled action of that process is eventually executed

• Strong: If some program action is infinitely often enabled, then that program action is infinitely often executed

Hyperfairness, extreme fairness, …

Reference: "Fairness", by Nissim Francez, Springer-Verlag, 1986

Fairness

Theorem: The Arora-Gouda protocol is correct under minimal fairness

Corollary: The Arora-Gouda protocol is correct under weak fairness, process fairness, …

1. Fake leader values disappear:

Fake leader values (leader values that do not belong to any functioning site) of minimum distance "disappear":

1. These minimum distances are non-decreasing

2. These minimum distances eventually increase

3. K is an upper bound for these distances

Fairness

2. The process with the highest priority elects itself as leader by executing its first action: Let k be the highest-priority functioning process. Unless leader.k = k and dist.k = 0 and parent.k = k already holds, by (1), (2), (3) the leader value k will disappear, and leader.k < k will be continuously enabled until the first action of k is executed

3. By induction on d (the distance of a process from process k), argue that all processes at distance d will eventually "correctly join" the tree rooted at k: Assuming that the tree up to depth d-1 is correctly formed, the second or the third action of a process at distance d is continuously enabled unless the process correctly joins the tree
