leader election
Post on 24-Feb-2016
LEADER ELECTION
CS 271 1
Election Algorithms
• Many distributed algorithms need one process to act as coordinator
  – Doesn’t matter which process does the job, just need to pick one
• Election algorithms: techniques to pick a unique coordinator (aka leader election)
• Types of election algorithms: Bully and Ring algorithms
Bully Algorithm
• Each process has a unique numerical ID
• Processes know the IDs and addresses of all other processes
• Communication is assumed reliable
• Key idea: select the process with the highest ID
• A process initiates an election if it has just recovered from failure or if the coordinator has failed
• 3 message types: Election, OK, I won
• Processes can initiate elections simultaneously
  – Need a consistent result
Bully Algorithm Details
• Any process P can initiate an election
• P sends Election messages to all processes with higher IDs and awaits OK messages
• If no OK message arrives, P becomes coordinator and sends I won to all processes with lower IDs
• If P receives an OK, it drops out and waits for I won
• If a process receives an Election message, it returns OK and starts an election
• If a process receives I won, then the sender is the coordinator
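The bully rules can be sketched as a small synchronous simulation. This is only an illustration under assumptions that are not in the slides: a crashed process simply never answers, messages are recorded in a log instead of being sent over a network, and the names `bully_election`, `all_ids`, and `alive` are hypothetical. The sample run assumes eight processes numbered 0 to 7 with process 7 crashed.

```python
def bully_election(initiator, all_ids, alive):
    """Simulate a bully election; returns (coordinator, message log)."""
    log = []                 # entries are (sender, receiver, message type)
    queue = [initiator]      # processes that still have to hold an election
    started = set()
    coordinator = None
    while queue:
        p = queue.pop(0)
        if p in started:     # each process holds at most one election here
            continue
        started.add(p)
        got_ok = False
        for q in sorted(i for i in all_ids if i > p):
            log.append((p, q, "Election"))
            if q in alive:   # a live higher process answers OK ...
                log.append((q, p, "OK"))
                got_ok = True
                queue.append(q)   # ... and starts its own election
        if not got_ok:       # nobody higher answered: p wins
            coordinator = p
            for q in sorted(i for i in all_ids if i < p):
                log.append((p, q, "I won"))
    return coordinator, log


# Assumed setup: IDs 0..7, process 7 (the old coordinator) has crashed.
coordinator, log = bully_election(4, range(8), alive=set(range(7)))
print(coordinator)
for entry in log:
    print(entry)
```

Only a process with no live higher-ID peer can finish without an OK, so the winner is always the highest live ID regardless of who starts the election.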
Bully Algorithm Example
a) Process 4 holds an election
b) Processes 5 and 6 respond, telling 4 to stop
c) Now 5 and 6 each hold an election
Bully Algorithm Example
d) Process 6 tells 5 to stop
e) Process 6 wins and tells everyone
Simple Ring-based Election
• Processes have unique IDs and are arranged in a logical ring
• Each process knows its neighbors
• Select the process with the highest ID as leader
• Begin an election if just recovered or if the coordinator has failed
• Send Election to the closest downstream node that is alive
  – Sequentially poll each successor until a live node is found
• Each process tags its ID on the message
• Initiator picks the node with the highest ID and sends a coordinator message
• Multiple elections can be in progress; no harm.
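A minimal sketch of the ring election, assuming a single election in progress and an initiator that stays alive throughout. The function name `ring_election` and its parameters are hypothetical; hops are counted instead of sending real messages.

```python
def ring_election(initiator, ring, alive):
    """ring: list of IDs in ring order. Returns (leader, messages sent)."""
    n = len(ring)

    def next_alive(idx):
        # Sequentially poll each successor until a live node is found.
        for step in range(1, n + 1):
            j = (idx + step) % n
            if ring[j] in alive:
                return j
        raise RuntimeError("no live process in the ring")

    def circulate(start, visit):
        # Pass one message around the ring until it returns to `start`;
        # `visit` is called at every live node other than the initiator.
        hops = 1
        idx = next_alive(start)
        while idx != start:
            visit(ring[idx])
            idx = next_alive(idx)
            hops += 1
        return hops

    start = ring.index(initiator)
    ids = [initiator]
    messages = circulate(start, ids.append)          # Election: collect IDs
    leader = max(ids)                                # initiator picks highest ID
    messages += circulate(start, lambda pid: None)   # coordinator message
    return leader, messages


# Assumed setup: ring 0..7 in order, process 7 crashed, process 2 initiates.
leader, messages = ring_election(2, list(range(8)), alive=set(range(7)))
print(leader, messages)
```

With n processes of which one has failed, each of the two rounds visits the n-1 live nodes, so the sketch sends 2(n-1) messages.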
Ring Algorithm Example
[Two figure slides; figures not preserved in this transcript]
Comparison
• Assume n processes and one election in progress
• Bully algorithm
  – Worst case: initiator is the node with the lowest ID
    • Triggers n-2 elections at higher-ranked nodes: O(n^2) messages
  – Best case: immediate election: n-2 messages
• Ring
  – 2(n-1) messages always
Highlights of Leader Election
• Basic idea: each process has a unique process-id.
• Once the leader is discovered to have died, elect the process with the highest (or lowest) process-id.
BROADCAST PROTOCOLS
Broadcast Protocols
• Why broadcast protocols?
  – Data replication
  – Highly available servers
  – Cluster management
  – Distributed logging
  – ……
• Sometimes a message is received but delivered later, to satisfy ordering requirements.
Ordering properties: FIFO (Cornell)
• FIFO or sender-ordered multicast: fbcast
Messages are delivered in the order they were sent (by any single sender)
[Figure: timelines for processes p, q, r, s; messages a, e]
Ordering properties: FIFO
[Figure: timelines for processes p, q, r, s; messages a, b, c, d, e]
Delivery of c to p is delayed until after b is delivered
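One common way to get FIFO delivery is a per-sender sequence number with a hold-back buffer at each receiver. The sketch below assumes that scheme; the class name `FifoReceiver`, the sender label "q", and the sequence numbers are hypothetical, since the transcript does not say who sent b and c.

```python
from collections import defaultdict


class FifoReceiver:
    """Delivers each sender's messages in the order that sender sent them."""

    def __init__(self):
        self.next_seq = defaultdict(int)   # next expected seq number per sender
        self.held = defaultdict(dict)      # sender -> {seq: payload} held back
        self.delivered = []                # payloads in delivery order

    def receive(self, sender, seq, payload):
        # "Received" only means the message arrived; it may be held back.
        self.held[sender][seq] = payload
        # Deliver as many consecutive messages from this sender as possible.
        while self.next_seq[sender] in self.held[sender]:
            s = self.next_seq[sender]
            self.delivered.append(self.held[sender].pop(s))
            self.next_seq[sender] = s + 1


# Assumed scenario: b (seq 0) and c (seq 1) come from the same sender "q",
# and c reaches p first.
p = FifoReceiver()
p.receive("q", 1, "c")     # held back: b has not been delivered yet
print(p.delivered)
p.receive("q", 0, "b")     # b is delivered, which releases c
print(p.delivered)
```

Messages from different senders are tracked independently, so this buffer imposes no order between them.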
Limitations of FIFO Broadcast
Scenario:
• User A broadcasts a message to a mailing list
• B delivers that message
• B broadcasts a reply
• C delivers B’s response without A’s original message, and so misinterprets it
Ordering properties: Causal
• Causal or happens-before ordering: cbcast
If send(a) → send(b) then deliver(a) occurs before deliver(b) at common destinations
[Figure: timelines for processes p, q, r, s; messages a, b]
Ordering properties: Causal
[Figure: timelines for processes p, q, r, s; messages a, b, c]
Delivery of c to p is delayed until after b is delivered
Ordering properties: Causal
[Figure: timelines for processes p, q, r, s; messages a, b, c, e]
Delivery of c to p is delayed until after b is delivered
e is sent (causally) after b
Ordering properties: Causal
[Figure: timelines for processes p, q, r, s; messages a, b, c, d, e]
Delivery of c to p is delayed until after b is delivered
Delivery of e to r is delayed until after b and c are delivered
Limitation of Causal Broadcast
Causal broadcast does not impose any order on unrelated messages.
Two replicas can deliver operations/requests in different orders.
Ordering properties: Total
• Total or locally total multicast: atomic bcast
Messages are delivered in the same order to all recipients (including the sender)
[Figure: timelines for processes p, q, r, s; messages a, b, c, d, e]
All deliver a, b, c, d, then e
Simple Causal broadcast protocol
• Each broadcast message carries all causally preceding messages
• Before delivery, ensure causality by delivering any missed causally preceding messages.
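The two bullets above can be sketched as follows. This is a toy model under assumptions: every packet carries the sender's full delivery history, message ids are unique strings, and the class name `SimpleCausalProcess` is hypothetical. It replays the mailing-list scenario with A, B, and C.

```python
class SimpleCausalProcess:
    """Each broadcast carries all causally preceding messages."""

    def __init__(self):
        self.history = []    # (msg_id, payload) pairs in delivery order
        self.seen = set()    # ids already delivered, to skip duplicates

    def broadcast(self, msg_id, payload):
        # Piggyback everything delivered so far, then the new message.
        packet = list(self.history) + [(msg_id, payload)]
        self._deliver(packet)          # the sender delivers its own message
        return packet

    def receive(self, packet):
        self._deliver(packet)

    def _deliver(self, packet):
        # Deliver any missed predecessors first; they come earlier in the packet.
        for msg_id, payload in packet:
            if msg_id not in self.seen:
                self.seen.add(msg_id)
                self.history.append((msg_id, payload))


A, B, C = SimpleCausalProcess(), SimpleCausalProcess(), SimpleCausalProcess()
pkt_a = A.broadcast("a", "original post")
B.receive(pkt_a)
pkt_b = B.broadcast("b", "reply")   # carries a as well
C.receive(pkt_b)                    # C gets the reply first, but a comes with it
C.receive(pkt_a)                    # late copy of a is ignored
print([msg_id for msg_id, _ in C.history])
```

The cost is that packets grow with the history, which is the motivation for the vector-timestamp approach.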
Isis Causal Broadcast
• Each process maintains a time vector VT of size n.
• Initially VT[i] = 0.
• When p sends a new message m: VT[p]++
• Each message is piggybacked with VTm, which is the current VT of the sender.
• When p delivers a message, p updates its vector: for k in 1..n:
  – VTp[k] = max{ VTp[k], VTm[k] }.
Isis Causal Order
• Requirement for delivery at node j:
  – VTsender[sender] = VTreceiver[sender]+1
    • This is the next message from sender
  – VTsender[k] ≤ VTreceiver[k] for all k not sender
    • Receiver has received all causally preceding messages
[Figure: sender and receiver timelines labeled VTsender and VTreceiver]
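The delivery requirement can be sketched with a hold-back list that is rescanned after every arrival. The class name `CausalProcess`, 0-based process ids, and the in-memory message tuples are assumptions made for illustration; this is not the Isis implementation.

```python
class CausalProcess:
    def __init__(self, pid, n):
        self.pid = pid
        self.vt = [0] * n        # vector timestamp VT, initially all zero
        self.held = []           # received but not yet deliverable
        self.delivered = []

    def send(self, payload):
        self.vt[self.pid] += 1                   # VT[p]++
        self.delivered.append(payload)           # sender delivers its own message
        return (self.pid, list(self.vt), payload)  # piggyback a copy of VT as VTm

    def _deliverable(self, sender, vtm):
        # Next message from sender, and nothing causally earlier is missing.
        return (vtm[sender] == self.vt[sender] + 1 and
                all(vtm[k] <= self.vt[k]
                    for k in range(len(vtm)) if k != sender))

    def receive(self, msg):
        self.held.append(msg)
        progress = True
        while progress:          # one delivery may unblock held messages
            progress = False
            for m in list(self.held):
                sender, vtm, payload = m
                if self._deliverable(sender, vtm):
                    self.held.remove(m)
                    self.delivered.append(payload)
                    self.vt = [max(a, b) for a, b in zip(self.vt, vtm)]
                    progress = True


p0, p1, p2 = (CausalProcess(i, 3) for i in range(3))
a = p0.send("a")
p1.receive(a)
b = p1.send("b")          # b is causally after a
p2.receive(b)             # held back: p2 has not delivered a
p2.receive(a)             # a is delivered, then b
print(p2.delivered)
```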
Total order
• Different classes of total order broadcast:
  – Fixed sequencer
  – Moving sequencer using token
  – Distributed agreement using timestamps
Using Sequencer (Amoeba)
• Delivery algorithm similar to FIFO except for using a special “sequencer” to order messages
• Sender attaches unique id i to each message m and sends <m,i> to the sequencer as well as to all destinations
• Sequencer maintains sequence number S (consecutive and increasing) and broadcasts <i, S> to all destinations.
• Message(k) is delivered
  – if all messages(j) (0 ≤ j < k) are received
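A minimal sketch of the fixed-sequencer scheme, assuming sequence numbers start at 0 and that a receiver delivers message k once it holds both the payload and the sequencer's order for it and has delivered everything before k. The class and method names (`Sequencer`, `SequencedReceiver`, `assign`, `on_message`, `on_order`) are hypothetical and do not reflect Amoeba's actual interface.

```python
class Sequencer:
    def __init__(self):
        self.next_s = 0                 # consecutive, increasing S

    def assign(self, msg_id):
        s = self.next_s
        self.next_s += 1
        return (msg_id, s)              # the <i, S> pair sent to all destinations


class SequencedReceiver:
    def __init__(self):
        self.payloads = {}              # msg_id -> payload, from <m, i>
        self.order = {}                 # S -> msg_id, from <i, S>
        self.next_deliver = 0
        self.delivered = []

    def on_message(self, payload, msg_id):
        self.payloads[msg_id] = payload
        self._try_deliver()

    def on_order(self, msg_id, s):
        self.order[s] = msg_id
        self._try_deliver()

    def _try_deliver(self):
        # Deliver strictly in sequence-number order, with no gaps.
        while (self.next_deliver in self.order and
               self.order[self.next_deliver] in self.payloads):
            msg_id = self.order.pop(self.next_deliver)
            self.delivered.append(self.payloads.pop(msg_id))
            self.next_deliver += 1


seq = Sequencer()
orders = [seq.assign("x1"), seq.assign("y1")]
r1, r2 = SequencedReceiver(), SequencedReceiver()
# r1 sees the payloads in the opposite order from the sequencer's decision.
r1.on_message("from-y", "y1")
r1.on_message("from-x", "x1")
for msg_id, s in orders:
    r1.on_order(msg_id, s)
# r2 sees the order announcements (reversed) before any payload.
for msg_id, s in reversed(orders):
    r2.on_order(msg_id, s)
r2.on_message("from-x", "x1")
r2.on_message("from-y", "y1")
print(r1.delivered, r2.delivered)
```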
Distributed Total Order Protocol (ISIS)
• Processes collectively agree on sequence numbers (priority) in three rounds
• Sender sends message <m, id> to all receivers;
• Receivers suggest priority (sequence number) and reply to sender with proposed priority;
• Sender collects all proposed priorities; decides on final priority (breaking ties with process ids), and resends the agreed final priority for message m
• Receivers deliver message m according to decided final priority
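The three rounds can be sketched as below. Several details are assumptions filled in for illustration: a priority is a pair (sequence number, proposer's process id) so that ties break on process id, the sender picks the maximum proposal as the final priority, and each process keeps a hold-back queue from which it delivers the lowest-priority entry only once that entry is marked final. The class name `IsisProcess` and the driver code are hypothetical.

```python
class IsisProcess:
    def __init__(self, pid):
        self.pid = pid
        self.counter = 0     # largest sequence number proposed or agreed so far
        self.queue = {}      # msg_id -> [priority, is_final, payload]
        self.delivered = []

    def propose(self, msg_id, payload):
        # Round 2: suggest a priority and hold the message as undeliverable.
        self.counter += 1
        priority = (self.counter, self.pid)
        self.queue[msg_id] = [priority, False, payload]
        return priority

    def finalize(self, msg_id, priority):
        # Round 3: record the agreed priority, then deliver from the head.
        self.counter = max(self.counter, priority[0])
        entry = self.queue[msg_id]
        entry[0], entry[1] = priority, True
        while self.queue:
            head = min(self.queue, key=lambda k: self.queue[k][0])
            if not self.queue[head][1]:
                break        # head still awaits its final priority
            self.delivered.append(self.queue.pop(head)[2])


procs = {pid: IsisProcess(pid) for pid in (1, 2, 3, 4)}
# Two concurrent messages reach the processes in different orders.
arrival = {1: ["m1", "m2"], 2: ["m1", "m2"], 3: ["m2", "m1"], 4: ["m2", "m1"]}
proposals = {"m1": [], "m2": []}
for pid, order in arrival.items():
    for msg_id in order:
        proposals[msg_id].append(procs[pid].propose(msg_id, msg_id))
# Each sender decides on the maximum proposed priority.
final = {msg_id: max(ps) for msg_id, ps in proposals.items()}
for pid, order in arrival.items():
    for msg_id in order:
        procs[pid].finalize(msg_id, final[msg_id])
for pid, proc in procs.items():
    print(pid, proc.delivered)
```

Because a final priority is never smaller than any proposal for that message, a message waiting at the head of a queue can only move later, so no process delivers past it prematurely.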
ISIS algorithm for total ordering
[Figure: group g: P1, P2, P3, P4; arrows labeled 1 Message, 2 Proposed Seq, 3 Agreed Seq]