Chapter 12: Consensus (Fault Tolerance)

Upload: johnathan-robinson

Post on 04-Jan-2016



Page 2: Reliable Systems

Distributed processing creates faster systems by exploiting parallelism, but it can also improve reliability by replicating a computation on several processors.

A reliable system can be:

Fail-safe, if one or more failures do not cause damage to the system or to its users, and/or

Fault-tolerant, if it continues to fulfill its requirements even if there are one or more failures.

Page 3: Typical Architectures for a Reliable System

[Figure: typical architectures for a reliable system]

Page 4: The Problem Statement

A group of Byzantine armies is surrounding an enemy city. The balance of force is such that if all armies attack together, they can capture the city; otherwise, they must all retreat to avoid defeat. The generals of the armies have reliable messengers who successfully deliver any message sent from one general to another. However, some of the generals may be traitors endeavoring to bring about the defeat of the Byzantine armies.

Devise an algorithm so that all loyal generals come to a consensus on a plan.

The final decision should be almost the same as a majority vote of their initial choices; if the vote is tied, the final decision is to retreat.

Page 5: The Problem Statement (Cont.)

In distributed systems, the generals are the nodes and the messengers model the communication channels.

Generals may fail (be traitors), but the messengers are assumed to be reliable.

Models for node failures:

Crash failures: A traitor (failed node) simply stops sending messages at some arbitrary point during the execution of the algorithm.

Byzantine failures: A traitor can send arbitrary messages, not just the messages required by the algorithm.

Page 6: Consensus – One-round Algorithm

The values for planType are A for attack and R for retreat.

Each general chooses a plan, sends its plan to the other generals, and receives their plans.

The final plan is the majority vote among all plans: the general's own plan and the plans received from the others.
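The one-round rule can be sketched in a few lines of Python. This is an illustrative sketch, not the slide's own pseudocode: the names `majority` and `one_round_decision` are hypothetical, a missing message is modelled as `None`, and ties are resolved in favour of retreat as the later slides state.

```python
from collections import Counter

ATTACK, RETREAT = "A", "R"  # the two values of planType

def majority(plans):
    """Majority vote over the plans that arrived; a tie means retreat."""
    votes = Counter(p for p in plans if p is not None)
    return ATTACK if votes[ATTACK] > votes[RETREAT] else RETREAT

def one_round_decision(own_plan, received_plans):
    """Final plan of one general: its own plan plus the plans it received."""
    return majority([own_plan, *received_plans])

print(one_round_decision(ATTACK, [ATTACK, RETREAT]))  # A
print(one_round_decision(RETREAT, [ATTACK, None]))    # R
```

The second call shows the tie case: one vote each for A and R, with the third message missing, yields retreat.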

Page 7: Messages Sent in a One-round Algorithm

Generals Zoe and Leo are loyal; Basil is a traitor.

Basil and Zoe choose to attack; Leo chooses to retreat.

Messages are exchanged, but Basil crashes after sending an attack message to Leo. Zoe receives no message from Basil.

Zoe's vote is tied (her own A against Leo's R), so she could have chosen either plan. Ties are resolved in favour of retreat, the common-sense choice, so Zoe decides to retreat. Leo decides to attack by majority voting: no consensus.

If a general crashes, it can cause the remaining loyal generals to fail to come to a consensus.
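The scenario above can be replayed as a small Python sketch. The `delivered` map is a hypothetical encoding of which first-round messages actually arrived before Basil crashed; the initial plans and the tie-breaking rule are the ones given on the slide.

```python
from collections import Counter

def majority(plans):
    """Majority vote; a tie is resolved in favour of retreat."""
    votes = Counter(plans)
    return "A" if votes["A"] > votes["R"] else "R"

initial = {"Basil": "A", "Zoe": "A", "Leo": "R"}

# Hypothetical encoding: delivered[sender] is the set of generals who
# actually received the sender's plan. Basil crashed after reaching Leo only.
delivered = {
    "Basil": {"Leo"},
    "Zoe": {"Basil", "Leo"},
    "Leo": {"Basil", "Zoe"},
}

def decide(general):
    received = [initial[s] for s, targets in delivered.items() if general in targets]
    return majority([initial[general], *received])

decisions = {g: decide(g) for g in ("Zoe", "Leo")}
print(decisions)  # {'Zoe': 'R', 'Leo': 'A'}
```

Zoe votes on A and R only (a tie, hence retreat), while Leo votes on R, A, A (attack), so the two loyal generals disagree.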

Page 8: The Byzantine Generals Algorithm

The one-round algorithm does not take into account the fact that certain generals are loyal. Leo should somehow be able to attribute more weight to the plan received from the loyal Zoe than to the one received from the traitor Basil.

In a distributed system, an individual node cannot know the identities of the traitors directly; rather, the algorithm must ensure that the plans sent by the traitors cannot cause the loyal generals to fail to reach consensus.

Page 9: Algorithm in Brief

The Byzantine Generals algorithm sends messages twice:

In the first round, each general sends its own plan.

In the second round, each general sends what it received from the other generals.

Loyal generals relay exactly what they received, so that if there are enough loyal generals, they can reach consensus.

Page 10: Byzantine Generals Algorithm

The first round sends and receives plans. At the end of it, each general has the plan of every other general.

In the second round, each general sends these plans on to the other generals (never back to the general a plan came from) and receives their reports in return.
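The two rounds can be sketched as a single-process Python simulation. This is a minimal sketch under stated assumptions rather than the algorithm listing from the slide: the function `byzantine_generals`, the `send` hook that models a traitor's behaviour, and the use of `None` for a missing message are all hypothetical choices. Each general first takes a majority per other general (the direct plan plus the relayed reports), then a majority over those results and its own plan.

```python
from collections import Counter

def majority(plans):
    """Majority of the plans that arrived; tie means retreat; None if no votes."""
    votes = Counter(p for p in plans if p is not None)
    if not votes:
        return None
    return "A" if votes["A"] > votes["R"] else "R"

def byzantine_generals(initial, loyal, send):
    """Return the final plan of every loyal general.

    initial: dict general -> own plan ("A" or "R")
    loyal:   set of loyal generals
    send:    hypothetical hook send(sender, receiver, about, value) giving what
             a traitor transmits instead of `value` (None = no message)
    """
    generals = list(initial)

    def transmit(sender, receiver, about, value):
        # Loyal generals pass on exactly what they hold; traitors use the hook.
        return value if sender in loyal else send(sender, receiver, about, value)

    # Round 1: plan[g][s] is what g received as s's own plan.
    plan = {g: {s: transmit(s, g, s, initial[s]) for s in generals if s != g}
            for g in generals}

    # Round 2: reported[g][s][r] is what reporter r told g about s's plan.
    reported = {g: {s: {r: transmit(r, g, s, plan[r][s])
                        for r in generals if r not in (g, s)}
                    for s in generals if s != g}
                for g in generals}

    final = {}
    for g in (x for x in generals if x in loyal):
        per_general = [majority([plan[g][s], *reported[g][s].values()])
                       for s in generals if s != g]
        final[g] = majority([initial[g], *per_general])
    return final

# Usage: three loyal generals choose A; the traitor Zoe always says R.
initial = {"John": "A", "Basil": "A", "Leo": "A", "Zoe": "R"}
always_retreat = lambda sender, receiver, about, value: "R"
print(byzantine_generals(initial, {"John", "Basil", "Leo"}, always_retreat))
# {'John': 'A', 'Basil': 'A', 'Leo': 'A'}
```

Modelling the traitor as a callback keeps the loyal code path identical for crash and Byzantine failures: a crash is a hook that returns `None` from some point on, and a Byzantine traitor is a hook that returns arbitrary values.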

Page 11: Two Loyal, One Traitor – Crash Failure

This is the same scenario as for the one-round algorithm: Basil (the traitor) crashes after sending his first-round message to Leo but before sending it to Zoe.

The second column holds the first-round plans. Zoe gets Leo's plan and nothing from the crashed Basil; Leo has all the plans.

The third and fourth columns hold the second-round plans. Zoe gets no report from Basil (crashed) and Basil's A from Leo; Leo does not send his own plan (R) again. Leo gets no report from Basil, and none from Zoe, whose own plan was already sent in the first round.

Majority voting: Basil has crashed; Zoe decides to attack; Leo decides to attack.

The two loyal generals reach consensus.
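The two tables described above can be written down and voted on directly. In this Python sketch the table layout (`zoe_table`, `leo_table`) is a hypothetical encoding of the slide's columns, with `None` marking a message that never arrived because Basil had crashed.

```python
from collections import Counter

def majority(plans):
    """Majority of the plans that arrived; a tie means retreat."""
    votes = Counter(p for p in plans if p is not None)
    return "A" if votes["A"] > votes["R"] else "R"

# Each row: general -> (plan received in round 1, reports received in round 2).
zoe_table = {
    "Basil": (None, ["A"]),  # nothing from Basil; Leo relays Basil's A
    "Leo": ("R", [None]),    # Leo's own R; no report from the crashed Basil
}
leo_table = {
    "Basil": ("A", [None]),  # Basil's A; Zoe has nothing from Basil to relay
    "Zoe": ("A", [None]),    # Zoe's A; no report from the crashed Basil
}

def final_plan(own_plan, table):
    """Vote per row first, then over the row results plus the general's own plan."""
    per_general = [majority([plan, *reports]) for plan, reports in table.values()]
    return majority([own_plan, *per_general])

print(final_plan("A", zoe_table), final_plan("R", leo_table))  # A A
```

Leo's relay of Basil's A is what changes Zoe's vote from a 1-1 tie to a 2-1 majority for attack.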

Page 12: Another Scenario

Basil, the traitor, sends all his first-round messages, and his second-round report to Leo, before crashing.

Second round: Leo receives Zoe's attack plan from Basil and Basil's attack plan from Zoe. Zoe receives no report from Basil and an attack report from Leo.

Majority voting: both decide to attack.

Page 13: Byzantine Failures with Three Generals (One-round Algorithm)

Basil, the traitor, sends a retreat message to Zoe and an attack message to Leo.

The one-round algorithm fails: there is no consensus, just as in the crash failure case.
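A short Python sketch of this failure follows. The slide does not restate the loyal generals' initial plans, so the sketch assumes they are the same as in the crash example (Zoe attacks, Leo retreats); that assumption is marked in the code.

```python
from collections import Counter

def majority(plans):
    """Majority vote; a tie is resolved in favour of retreat."""
    votes = Counter(plans)
    return "A" if votes["A"] > votes["R"] else "R"

# Assumption: initial plans carried over from the crash example.
own = {"Zoe": "A", "Leo": "R"}
# Basil tells each loyal general something different.
from_basil = {"Zoe": "R", "Leo": "A"}

decisions = {}
for general, other in (("Zoe", "Leo"), ("Leo", "Zoe")):
    decisions[general] = majority([own[general], own[other], from_basil[general]])

print(decisions)  # {'Zoe': 'R', 'Leo': 'A'}
```

Each loyal general sees a clean 2-1 majority, but for opposite plans, because Basil's two messages disagree.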

Page 14: Byzantine Failures with Three Generals (Two-round Algorithm)

In the first round, Basil sends an A message to both Zoe and Leo.

In the second round, he correctly reports to Zoe that Leo's plan is R, but erroneously reports to Leo that Zoe's plan is R.

Leo decides to retreat (ties are broken in favour of retreat) and Zoe decides to attack: no consensus again.

The algorithm is not correct for three generals of whom one is a traitor.
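The failure can be checked by writing out what Zoe and Leo hold after both rounds. In this Python sketch the table encoding is hypothetical, and Zoe's and Leo's own plans (A and R) are taken from the reports described above.

```python
from collections import Counter

def majority(plans):
    """Majority vote; a tie is resolved in favour of retreat."""
    votes = Counter(plans)
    return "A" if votes["A"] > votes["R"] else "R"

# Each row: general -> (plan received in round 1, report received in round 2).
zoe_table = {
    "Basil": ("A", "A"),  # Basil's A, confirmed by Leo's relay
    "Leo": ("R", "R"),    # Leo's R, correctly reported by Basil
}
leo_table = {
    "Basil": ("A", "A"),  # Basil's A, confirmed by Zoe's relay
    "Zoe": ("A", "R"),    # Zoe's A, but Basil falsely reports R
}

def final_plan(own_plan, table):
    """Vote per row first, then over the row results plus the general's own plan."""
    per_general = [majority(row) for row in table.values()]
    return majority([own_plan, *per_general])

print(final_plan("A", zoe_table), final_plan("R", leo_table))  # A R
```

Basil's single false report turns Leo's row for Zoe into a 1-1 tie, which resolves to R and tips Leo's final vote to retreat.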

Page 15: Byzantine Failures with Four Generals

John, Basil, and Leo are loyal generals; Zoe is the traitor.

Zoe sends first-round messages of R to Basil and Leo and of A to John. These messages are relayed correctly by the loyal generals, and Basil has the table shown on the left.

For Zoe's plan, the vote is 2-1 in favor of R.

So, if the loyal generals choose the same plan initially, the final decision will be this plan, regardless of the actions of the traitor.
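The 2-1 vote can be reproduced for every loyal general, not only Basil. This Python sketch uses only Zoe's first-round messages as stated above and the fact that the loyal generals relay them unchanged; the helper name `zoe_row` is hypothetical.

```python
from collections import Counter

def majority(plans):
    """Majority vote; a tie is resolved in favour of retreat."""
    votes = Counter(plans)
    return "A" if votes["A"] > votes["R"] else "R"

# Zoe's first-round messages to the three loyal generals.
from_zoe = {"John": "A", "Basil": "R", "Leo": "R"}

def zoe_row(general):
    """What `general` holds for Zoe: her direct message plus the loyal relays."""
    relays = [plan for other, plan in from_zoe.items() if other != general]
    return [from_zoe[general], *relays]

for g in from_zoe:
    print(g, zoe_row(g), "->", majority(zoe_row(g)))
# John ['A', 'R', 'R'] -> R
# Basil ['R', 'A', 'R'] -> R
# Leo ['R', 'A', 'R'] -> R
```

All three loyal generals hold the same multiset of values for Zoe, so they settle on the same plan for her even though she sent them different messages.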

Page 16: Consensus

Crash failures: consensus is reached in t+1 rounds, where t is the number of traitors.

Byzantine failures: If more than two-thirds of the generals are loyal, there is a solution regardless of the messages issued by the traitorous generals. If one-third or more of the generals are traitors, there is no solution. In the case of one traitor, there is a solution for four generals and none for three. That is, the total number of generals must be at least 3t+1, where t is the number of traitors.
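The two bounds are easy to state as code. The function names in this Python sketch are hypothetical; the formulas are the ones given above.

```python
def crash_rounds(t):
    """Rounds needed for consensus with t crash failures."""
    return t + 1

def min_generals(t):
    """Smallest total number of generals that tolerates t Byzantine traitors."""
    return 3 * t + 1

def byzantine_solvable(n, t):
    """True if n generals with t traitors admit a solution.

    n >= 3t + 1 is the same condition as "more than two-thirds are loyal":
    n - t > 2n/3 is equivalent to n > 3t.
    """
    return n >= min_generals(t)

print(byzantine_solvable(4, 1), byzantine_solvable(3, 1))  # True False
print(crash_rounds(1), min_generals(2))                    # 2 7
```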