Lecture 4: Elections, Reset
Anish Arora
CSE 763
Notes include material from Dr. Jeff Brumfield
Reading Material
• Hector Garcia-Molina, "Elections in a Distributed Computing System", IEEE Transactions on Computers, Vol. C-31, No. 1, January 1982, pp. 48-59
• E. W. Dijkstra, "Self-stabilizing Systems in Spite of Distributed Control", Communications of the ACM, Vol. 17, No. 11, November 1974, pp. 643-644
• A. Arora, M. Gouda, "Distributed Reset", IEEE Transactions on Computers, Vol. 43, No. 9, September 1994, pp. 1026-1038
• Chapters 10 and 12 in Paul Sivilotti’s book
Election in Distributed Systems
Problem
• Select a unique site from a set of candidate sites
• Selection scheme must not require a coordinator or leader
Applications
• Selection of a coordinator for mutual exclusion, deadlock detection, two-phase commit, etc.
• Selection of sites for location of replicated objects
• Selection of a site to assume the duties of a failed server
Election Versus Mutual Exclusion
Similarities
• Both election algorithms and mutual exclusion algorithms select one site from a set of candidate sites
• Both types of algorithms must function correctly in the presence of failures
Differences
• In an election, fairness may not be important. In mutual exclusion, every site should eventually be selected
• Every site must know the identity of the site that wins an election. Other sites do not need to know the site selected by a mutual exclusion algorithm
Types of Election Algorithms
Probabilistic
• Each competing site is equally likely to win the election
Static Priority
• Each site has a unique predefined priority
• The site having the highest priority should win the election
Dynamic Priority
• Each site has a priority that varies over time
• The site having the highest priority at the beginning of the election should win the election
A Probabilistic Algorithm
Algorithm (to be carried out by each process)
• Generate a random integer, b, uniformly distributed in the interval [0, N-1], where N is the number of processes
• Send the selected value to every other process
• When the values b_i for i = 0,…,N-1 have been received from all other processes, compute
k := (Σ i : 0 ≤ i ≤ N-1 : b_i) mod N
Process k wins the election
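The three steps above can be sketched as a small single-machine simulation. This is only an illustration: the function name run_election, the inbox structure, and the use of Python's random module are assumptions, and message passing is modeled by writing directly into each receiver's inbox.

```python
import random

def run_election(n, rng):
    """Simulate one probabilistic election among processes 0..n-1."""
    # inbox[j][i] holds the value that process j has for process i
    inbox = [dict() for _ in range(n)]
    messages = 0
    for i in range(n):
        b_i = rng.randrange(n)      # uniform on [0, n-1]
        inbox[i][i] = b_i           # a process knows its own value
        for j in range(n):
            if j != i:
                inbox[j][i] = b_i   # "send" b_i to process j
                messages += 1
    # Each process computes k independently from the values it holds.
    winners = [sum(inbox[j].values()) % n for j in range(n)]
    return winners, messages

winners, messages = run_election(5, random.Random(0))
print(winners, messages)
```

Every entry of winners is the same process number, because all processes sum the same N values.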
A Probabilistic Algorithm
Assumptions
• Processes participating in the election are known a priori and are numbered 0,1,…,N-1
• Processes do not fail or send inconsistent information
Analysis
• The number of messages required in an election is N² - N
• If a process follows this algorithm, its probability of winning is 1/N, regardless of the values selected by other processes
• All processes determine the same winner
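The 1/N claim can be checked exhaustively for a small N. The sketch below fixes the other processes' values in every possible way and counts how many of one process's N choices make it the winner. The helper name win_probability is hypothetical.

```python
from fractions import Fraction
from itertools import product

def win_probability(n, me, others):
    """Probability that process `me` wins when the other n-1 values are
    fixed to `others` and `me` draws uniformly from [0, n-1]."""
    wins = sum(1 for b in range(n) if (b + sum(others)) % n == me)
    return Fraction(wins, n)

n = 4
# Try every process and every combination of the other processes' values.
uniform = all(win_probability(n, me, others) == Fraction(1, n)
              for me in range(n)
              for others in product(range(n), repeat=n - 1))
print(uniform)
```

The reason is that for a fixed sum S of the other values, b ↦ (b + S) mod N is a bijection on [0, N-1], so exactly one choice of b selects any given process.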
Variant of the Probabilistic Algorithm
Unknown number of participants N
• Generate N-1 values in the intervals [0,N-1], [0,N-2], … , [0,1]
• Exchange values with other participants
• When number of participants is determined, use appropriate set of values as in previous algorithm
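One possible reading of this variant is sketched below. It assumes that N is the maximum possible number of participants, that a group of m participants uses the values drawn from [0, m-1], and that the winner is the k-th participant in sorted id order. The slide does not spell out these details, so the names draw_values and elect and the tie to sorted order are assumptions.

```python
import random

def draw_values(n_max, rng):
    """One value per possible group size m = 2..n_max, uniform on [0, m-1]."""
    return {m: rng.randrange(m) for m in range(2, n_max + 1)}

def elect(participants, draws):
    """participants: sorted list of ids that turned out to take part.
    draws: id -> dict produced by draw_values for that participant."""
    m = len(participants)
    if m == 1:
        return participants[0]
    # Use only the values that were drawn for a group of size m.
    k = sum(draws[p][m] for p in participants) % m
    return participants[k]   # assumption: k indexes the sorted id list

ids = [2, 5, 7]
draws = {p: draw_values(8, random.Random(p)) for p in ids}
print(elect(ids, draws))
```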
The Bully Algorithm
Assumptions
• Each process is assigned a unique priority number
• The highest priority active process should always win the election
• Every process knows of the existence of every other process and its priority number
• Processes may fail during the election
• A failed process may subsequently recover
The Bully Algorithm
The Algorithm
Send election message to each higher priority process
Delay for time T
If no responses received then
    take over as leader
    inform each lower priority process of change
Else (* response received *)
    delay for time T’
    If “I am leader” message received
        record this fact
    Else restart the algorithm
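A minimal sketch of the control flow, under simplifying assumptions: the set of live processes is fixed for the duration of the election, the timeouts T and T’ are replaced by a direct check of who is alive, and failures during the election are not modeled. The function name bully_election and its parameters are hypothetical.

```python
def bully_election(initiator, alive, priority):
    """Return the process that ends up as leader when `initiator`
    (which must be alive) starts an election.

    alive    : set of process ids that are currently up
    priority : dict mapping every known process id to its unique priority
    """
    # Send an election message to each higher-priority process.
    higher = [p for p in priority if priority[p] > priority[initiator]]
    # Delay for time T: only live processes answer before the timeout.
    responders = [p for p in higher if p in alive]
    if not responders:
        # No responses: take over as leader and inform lower processes.
        return initiator
    # A response was received. Every responder runs the algorithm itself
    # and they all reach the same result, so this sketch follows just one
    # of them (the lowest-priority responder).
    next_up = min(responders, key=priority.get)
    return bully_election(next_up, alive, priority)

priority = {p: p for p in range(1, 6)}
print(bully_election(1, {1, 2, 4}, priority))
```

With processes 3 and 5 down, the election started by process 1 passes through 2 and ends at 4, the highest-priority live process.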
The Bully Algorithm
Run this algorithm if
1. we receive no response from the leader
2. we receive an election message from a lower priority process
3. we have just recovered from failure
Analysis
O(N²) messages may be required
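The quadratic bound can be seen by counting messages in a bad case: all N processes are up and the lowest-priority one starts the election. The sketch assumes priorities 1..N, no failures, that each process runs the algorithm at most once, and that election messages, responses, and the final leader announcements are all counted. The name count_messages is hypothetical.

```python
def count_messages(n):
    """Messages exchanged when the lowest-priority of n live processes
    (priorities 1..n) starts an election and nobody fails."""
    started = set()
    pending = [1]
    messages = 0
    while pending:
        p = pending.pop()
        if p in started:
            continue
        started.add(p)
        higher = range(p + 1, n + 1)
        messages += len(higher)     # election messages sent by p
        messages += len(higher)     # one response per live higher process
        pending.extend(higher)      # each of them runs the algorithm too
        if not higher:
            messages += n - 1       # new leader informs all lower processes
    return messages

for n in (2, 4, 8, 16):
    print(n, count_messages(n))
```

Under these assumptions the count is N(N-1)/2 election messages, the same number of responses, and N-1 announcements, which totals N² - 1.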
Self Stabilization
• A system is self-stabilizing if, regardless of its initial state, it is guaranteed to arrive at a legitimate state in a finite number of steps
• If a failure occurs in a self-stabilizing system, the system will correct itself without any form of outside intervention
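As an illustration of the definition, the sketch below simulates the K-state ring from the Dijkstra paper in the reading list. A state is legitimate when exactly one machine holds a privilege. The ring size, the choice K = 5, and the random central scheduler are assumptions made for this example.

```python
import random

def privileged(states):
    """Indices of machines that hold a privilege in the K-state ring."""
    n = len(states)
    result = [0] if states[0] == states[n - 1] else []
    result += [i for i in range(1, n) if states[i] != states[i - 1]]
    return result

def step(states, i, k):
    """Machine i makes its move (it must currently be privileged)."""
    if i == 0:
        states[0] = (states[0] + 1) % k
    else:
        states[i] = states[i - 1]

def stabilize(states, k, rng, max_steps=10_000):
    """Run randomly chosen privileged machines until one privilege remains."""
    steps = 0
    while len(privileged(states)) != 1 and steps < max_steps:
        step(states, rng.choice(privileged(states)), k)
        steps += 1
    return steps

states = [3, 1, 4, 1, 0]            # arbitrary initial state
stabilize(states, 5, random.Random(1))
print(states, privileged(states))
```

No outside intervention is needed: the machines' own moves drive any initial state to a legitimate one, and every later move keeps exactly one privilege in the ring.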
Assumptions
• Each site has a unique site number
• Sites can communicate directly with neighboring sites
• Each site maintains knowledge of its functioning neighbors
Objectives
• The functioning site having the highest site number is the leader
• Every functioning site knows the identity of the leader
• Every functioning site knows a functioning path to the leader
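The three objectives can be written as a predicate over the system state. The sketch below assumes that each site stores a leader value and a parent pointer (the next hop toward the leader), that links are undirected, and that the functioning sites are connected. The function name is_legitimate and the data layout are hypothetical.

```python
def is_legitimate(up, links, leader, parent):
    """up: set of functioning site numbers.
    links: set of frozenset({a, b}) functioning links.
    leader, parent: dicts mapping each site to its local value."""
    top = max(up)                     # highest functioning site number
    for site in up:
        if leader[site] != top:       # every site must know the leader
            return False
        # Walk parent pointers; a valid path has at most len(up)-1 hops.
        current = site
        for _ in range(len(up)):
            if current == top:
                break
            nxt = parent[current]
            if nxt not in up or frozenset({current, nxt}) not in links:
                return False
            current = nxt
        else:
            return False              # never reached the leader: a cycle
    return True

up = {1, 2, 3}
links = {frozenset({1, 2}), frozenset({2, 3})}
print(is_legitimate(up, links, {1: 3, 2: 3, 3: 3}, {1: 2, 2: 3, 3: 3}))
```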
Perturbations
A perturbation in the system can be caused by a failure, a recovery from a failure, or an enhancement or reconfiguration of the system
Possible perturbations to a system:
• A site can fail or be removed from the system
• A site can recover from failure or be added to the system
• A communications link can fail or be removed from the system
• A communications link can recover from failure or be added to the system
• A variable in a site's local memory can be changed
Arora and Gouda’s Algorithm
Each site maintains three variables:
• leader - the identity of the site believed to be the leader
• parent - the identity of the next node in a path to the leader
• dist - the distance to the leader, measured in number of links
Algorithm Structure
This version of the algorithm assumes that a site's local variables cannot be corrupted
begin
    (our leader < self) or (we can’t communicate with parent)
        → our leader := self; our parent := self
[]  (parent’s leader ≠ our leader)
        → our leader := parent’s leader
[]  (a neighbor’s leader > our leader)
        → our leader := neighbor’s leader; our parent := neighbor
end
Simplified Algorithm
begin
    (leader.i < i) or (parent.i ∉ neighbor.i ∪ {i})
        → leader.i, parent.i := i, i
[]  parent.i = j and j ∈ neighbor.i and leader.i ≠ leader.j
        → leader.i := leader.j
[]  j ∈ neighbor.i and leader.i < leader.j
        → leader.i, parent.i := leader.j, j
end
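The simplified algorithm can be exercised with a small simulation in which one enabled action is chosen at random and executed at each step. The topology, the starting state (every site is its own leader and parent), and the names enabled_moves and run are assumptions made for illustration.

```python
import random

def enabled_moves(i, nbrs, leader, parent):
    """Return the (leader, parent) updates that site i may perform."""
    moves = []
    p = parent[i]
    if leader[i] < i or p not in nbrs[i] | {i}:
        moves.append((i, i))                  # become own leader
    if p in nbrs[i] and leader[i] != leader[p]:
        moves.append((leader[p], p))          # copy the parent's leader
    for j in nbrs[i]:
        if leader[i] < leader[j]:
            moves.append((leader[j], j))      # adopt a better neighbor
    return moves

def run(nbrs, leader, parent, rng, max_steps=10_000):
    """Execute random enabled actions; return True once none is enabled."""
    for _ in range(max_steps):
        choices = [(i, m) for i in nbrs
                   for m in enabled_moves(i, nbrs, leader, parent)]
        if not choices:
            return True
        i, (new_leader, new_parent) = rng.choice(choices)
        leader[i], parent[i] = new_leader, new_parent
    return False

nbrs = {1: {2}, 2: {1, 3, 5}, 3: {2, 4}, 4: {3}, 5: {2}}
leader = {i: i for i in nbrs}
parent = {i: i for i in nbrs}
run(nbrs, leader, parent, random.Random(0))
print(leader, parent)
```

When no action is enabled, every site in this connected topology names site 5 as leader, since a site with a smaller leader value than a neighbor would still have the third action enabled.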
Example (figures not included)
Formation of cycles
• The corruption of a site's local variables can produce a cycle in the parent graph
• The algorithm must be extended to automatically break cycles
• Let K be an upper bound on the number of sites in the system
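A concrete corrupted state shows why the simplified algorithm is not enough. In the sketch below, three fully connected sites all hold the leader value 9, which belongs to no site, and their parent pointers form a cycle. The state and the helper name simplified_enabled are made up for this example.

```python
def simplified_enabled(i, nbrs, leader, parent):
    """True if any action of the simplified algorithm is enabled at site i."""
    p = parent[i]
    return (leader[i] < i
            or p not in nbrs[i] | {i}
            or (p in nbrs[i] and leader[i] != leader[p])
            or any(leader[i] < leader[j] for j in nbrs[i]))

nbrs = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
leader = {1: 9, 2: 9, 3: 9}     # 9 is not a site: a corrupted "fake" leader
parent = {1: 2, 2: 3, 3: 1}     # parent pointers form the cycle 1 -> 2 -> 3 -> 1
stuck = not any(simplified_enabled(i, nbrs, leader, parent) for i in nbrs)
print(stuck)                    # prints True
```

No guard is true at any site, so the system stays in this illegitimate state forever. The dist variable and the bound K give the algorithm a way to detect such a cycle.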
Complete Algorithm
begin
    (leader.i < i) or
    (parent.i = i and (leader.i ≠ i or dist.i ≠ 0)) or
    (parent.i ∉ neighbor.i ∪ {i}) or
    (dist.i ≥ K)
        → leader.i, parent.i, dist.i := i, i, 0
[]  parent.i = j and j ∈ neighbor.i and dist.j < K and
    (leader.i ≠ leader.j or dist.i ≠ dist.j+1)
        → leader.i, dist.i := leader.j, dist.j+1
[]  leader.i < leader.j and j ∈ neighbor.i and dist.j < K
        → leader.i, parent.i, dist.i := leader.j, j, dist.j+1
end
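The complete algorithm can be tried on a corrupted starting state: three fully connected sites that all hold the fake leader value 9 with parent pointers in a cycle. The sketch transcribes the three guarded actions above and picks one enabled action at random per step. The starting state, K = 3, and the names moves and run are assumptions for illustration.

```python
import random

def moves(i, nbrs, leader, parent, dist, K):
    """Return the (leader, parent, dist) updates enabled at site i."""
    out = []
    p = parent[i]
    if (leader[i] < i
            or (p == i and (leader[i] != i or dist[i] != 0))
            or p not in nbrs[i] | {i}
            or dist[i] >= K):
        out.append((i, i, 0))                          # first action: reset
    if (p in nbrs[i] and dist[p] < K
            and (leader[i] != leader[p] or dist[i] != dist[p] + 1)):
        out.append((leader[p], p, dist[p] + 1))        # second: follow parent
    for j in nbrs[i]:
        if leader[i] < leader[j] and dist[j] < K:
            out.append((leader[j], j, dist[j] + 1))    # third: better neighbor
    return out

def run(nbrs, leader, parent, dist, K, rng, max_steps=100_000):
    """Execute random enabled actions; return True once none is enabled."""
    for _ in range(max_steps):
        choices = [(i, m) for i in nbrs
                   for m in moves(i, nbrs, leader, parent, dist, K)]
        if not choices:
            return True
        i, m = rng.choice(choices)
        leader[i], parent[i], dist[i] = m
    return False

nbrs = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
leader = {1: 9, 2: 9, 3: 9}      # fake leader value
parent = {1: 2, 2: 3, 3: 1}      # cycle in the parent graph
dist = {1: 0, 2: 1, 3: 2}
run(nbrs, leader, parent, dist, 3, random.Random(0))
print(leader, parent, dist)
```

In the cycle, each site's dist is forced to be one more than its parent's, so the distances attached to the fake value keep growing until they reach K and the first action resets the site.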
Fairness
• Minimal: If some program action is enabled, then some enabled action is executed
• Weak: If some program action is continuously enabled, then that program action is eventually executed
• Process: If some process actions are continuously enabled, then some enabled action of the process is eventually executed
• Strong: If some program action is infinitely often enabled, then that program action is infinitely often executed
Hyperfairness, extreme fairness, …
Reference: “Fairness”, by Nissim Francez, Springer-Verlag, 1986
Fairness
Theorem: The Arora-Gouda protocol is correct under minimal fairness
Corollary: The Arora-Gouda protocol is correct under weak fairness, process fairness, …
1. Fake leader values disappear:
   Fake leader values of minimum distance “disappear”:
   (1) These values are non-decreasing
   (2) These values eventually increase
   (3) K is an upper bound for these values
Fairness
2. Process with highest priority elects itself as leader, by executing its first action:
   • Let the highest priority up process be k
   • Unless leader.k = k and dist.k = 0 and parent.k = k holds, by (1), (2), (3) the leader value k will disappear, and leader.k < k will be continuously enabled until the first action of k is executed
3. By induction on d, the distance of a process from process k, argue that all processes at distance d will eventually “correctly join” the tree rooted at k:
   • Assuming that the tree up to depth d-1 is correctly formed, the second or the third action of a process at distance d is continuously enabled unless the process correctly joins the tree