byzantine fault tolerance

31
Byzantine fault tolerance Jinyang Li Slides adapted from Liskov

Upload: buck

Post on 23-Jan-2016

56 views

Category:

Documents


0 download

DESCRIPTION

Byzantine fault tolerance. Jinyang Li Slides adapted from Liskov. What we ’ ve learnt so far: tolerate fail-stop failures. Traditional RSM tolerates benign failures Node crashes Network partitions A RSM w/ 2f+1 replicas can tolerate f simultaneous crashes. Byzantine faults. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Byzantine fault tolerance

Byzantine fault tolerance

Jinyang Li

Slides adapted from Liskov

Page 2: Byzantine fault tolerance

What we’ve learnt so far:tolerate fail-stop failures

• Traditional RSM tolerates benign failures– Node crashes– Network partitions

• A RSM w/ 2f+1 replicas can tolerate f simultaneous crashes

Page 3: Byzantine fault tolerance

Byzantine faults

• Nodes fail arbitrarily– Failed node performs incorrect computation– Failed nodes collude

• Causes: attacks, software/hardware errors• Examples:

– Client asks bank to deposit $100, a Byzantine bank server substracts $100 instead.

– Client asks file system to store f1=“aaa”. A Byzantine server returns f1=“bbb” to clients.

Page 4: Byzantine fault tolerance

Strawman defense• Clients sign inputs.• Clients verify computation based on signed inputs.• Example: C stores signed file f1=“aaa” with server.

C verifies that returned f1 is signed correctly.• Problems:

– Clients have to perform computation– Even for storage systems,

• Byzantine node can falsely report non-existence of a file• Byzantine node can return stale/correct computation

– E.g. Client stores signed f1=“aaa” and later stores signed f1=“bbb”, a Byzantine node can always return f1=“aaa”.

Page 5: Byzantine fault tolerance

PBFT ideas

• PBFT, “Practical Byzantine Fault Tolerance”, M. Castro and B. Liskov, SOSP 1999

• Replicate a service across many nodes– Assumption: only a small fraction of nodes are Byzantine– Rely on a super-majority of votes to decide on correct

computation.

• PBFT property: tolerates <=f failures using a RSM with 3f+1 replicas

Page 6: Byzantine fault tolerance

RSM Review: normal operationClient write W

View=1, seqno=25, WView=1, seqno=25, W

Wait for responses from all backups before applying write and replying to client

N1=Primary

N2=BackupN3=Backup

Page 7: Byzantine fault tolerance

RSM review: view change

N1=Primary

N2=Backup N3=Backup

Paxos propose viewno=2, <primary=N2,backup=N3>

Paxos propose viewno=2, <primary=N3,backup=N2>

Page 8: Byzantine fault tolerance

Why doesn’t traditional RSM work with Byzantine nodes?

• Malicious primary is bad– Send wrong result to clients– Send different ops to different replicas– Assign the same seqno to different requests!

• Cannot use Paxos for view change– Paxos uses a majority accept-quorum to tolerate f

benign faults out of 2f+1 nodes– With malicious interfering, Paxos does not ensure

agreement among honest nodes

Page 9: Byzantine fault tolerance

Paxos under Byzantine faults

Propose vid=1, myn=N0:1OK val=null

N0 N1

N2

nh=N0:1nh=N0:1

Propose vid=1, myn=N0:1OK val=null

Page 10: Byzantine fault tolerance

Paxos under Byzantine faults

accept vid=1, myn=N0:1, val=xyzOK

N0 N1

N2

nh=N0:1nh=N0:1

XN0 decides on

Vid1=xyz

Page 11: Byzantine fault tolerance

Paxos under Byzantine faults

prepare vid=1, myn=N1:1, val=abcOK val=null

N0 N1

N2

nh=N0:1nh=N0:1

XN0 decides on

Vid1=xyz

Page 12: Byzantine fault tolerance

Paxos under Byzantine faults

accept vid=1, myn=N1:1, val=abcOK

N0 N1

N2

nh=N1:1nh=N0:1

X

N1 decides onVid1=abc

N0 decides onVid1=xyz

Agreement conflict!

Page 13: Byzantine fault tolerance

PBFT main ideas

• Static configuration (same 3f+1 nodes across views)

• To deal with malicious primary– Use a 3-phase protocol to agree on

sequence number

• To deal with loss of agreement– Use a bigger quorum (2f+1 out of 3f+1

nodes)

• Need to authenticate communications

Page 14: Byzantine fault tolerance

1. State: …A2. State: …A

3. State: …A4. State: …

BFT requires a 2f+1 quorum out of 3f+1 nodes

Servers

Clients

write A

write A

X

wri

te Aw

rite A

For liveness, the quorum size must be at most N - f

Page 15: Byzantine fault tolerance

…A …A B …B …B

BFT Quorums

write B

write

B

X

wri

te B

write B

Servers

Clients

1. State: 2. State: 3. State: 4. State:

For correctness, any two quorums must intersect at leastone honest node: (N-f) + (N-f) - N >= f+1 N >= 3f+1

Page 16: Byzantine fault tolerance

PBFT Strategy

• Primary runs the protocol in the normal case

• Replicas watch the primary and do a view change if it fails

Page 17: Byzantine fault tolerance

Replica state• A replica id i (between 0 and N-1)

– Replica 0, replica 1, …

• A view number v#, initially 0• Primary is the replica with id

i = v# mod N

• A log of <op, seq#, status> entries– Status = pre-prepared or prepared or

committed

Page 18: Byzantine fault tolerance

Normal Case

• Client sends signed request to primary– or to all

Page 19: Byzantine fault tolerance

Normal Case• Primary sends pre-prepare message to all• Pre-prepare contains <v#,seq#,op>

– Records operation in log as pre-prepared

– Keep in mind that primary might be malicious• Send different seq# for the same op to different replicas• Use a duplicate seq# for op

Page 20: Byzantine fault tolerance

Normal Case• Replicas check the pre-prepare and if it is ok:

– Record operation in log as pre-prepared– Send prepare messages to all– Prepare contains <i,v#,seq#,op>

• All to all communication

Page 21: Byzantine fault tolerance

Normal Case:• Replicas wait for 2f+1 matching prepares

– Record operation in log as prepared– Send commit message to all– Commit contains <i,v#,seq#,op>

• What does this stage achieve:– All honest nodes that are prepared prepare the

same value

Page 22: Byzantine fault tolerance

Normal Case:

• Replicas wait for 2f+1 matching commits– Record operation in log as committed– Execute the operation– Send result to the client

Page 23: Byzantine fault tolerance

Normal Case

• Client waits for f+1 matching replies

Page 24: Byzantine fault tolerance

BFT

Client

Primary

Replica 2

Replica 3

Replica 4

Request Pre-Prepare Prepare Commit Reply

Wait for 2f+1 matching responses

Wait for f+1 matching responses

Page 25: Byzantine fault tolerance

BFT

Client

Primary

Replica 2

Replica 3

Replica 4

Request Pre-Prepare Prepare Commit Reply

• Commit point: when 2f+1 replicas have prepared

Page 26: Byzantine fault tolerance

View Change

• If primary is honest, can faulty replicas prevent progress?

• If primary is faulty, can honest replicas detect problems?

• Replicas watch the primary

• Request a view change w/ progress is stalled

Page 27: Byzantine fault tolerance

View Change Challenges

• Which primary is next? – New primary = v# mod N– Why not pick arbitary primary?

• Can any one replica initialize a view change?

• How to ensure requests executed by good nodes in old view are reflected in new view?

Page 28: Byzantine fault tolerance

View Change

• Each replica sends new primary VIEW-CHANGE request with 2f+1 prepares for recent ops

• New primary waits for 2f+1 VIEW-CHANGE

• New primary sends NEW-VIEW to all with– Complete set of VIEW-CHANGE messages– List of every op for which some VIEW-

CHANGE contained 2f+1 prepares

Page 29: Byzantine fault tolerance

Additional Optimizations

• State transfer

• Checkpoints (garbage collection of the log)

• Timing of view changes

Page 30: Byzantine fault tolerance

Possible improvements

• Lower latency for writes (4 messages)– Replicas respond at prepare– Client waits for 2f+1 matching responses

• Fast reads (one round trip)– Client sends to all; they respond

immediately– Client waits for 2f+1 matching responses

Page 31: Byzantine fault tolerance

Practical limitations of BFTs

• Expensive

• Protection is achieved only when <= f nodes fail– Assume independent machine failures

• Does not prevent many classes attacks:– Turn a machine into a botnet node– Steal SSNs from servers