Byzantine Techniques II
Presenter: Georgios Piliouras
Partly based on slides by
Justin W. Hart & Rodrigo Rodrigues
Papers
• Practical Byzantine Fault Tolerance. Miguel Castro et al. (OSDI 1999)
• BAR Fault Tolerance for Cooperative Services. Amitanand S. Aiyer et al. (SOSP 2005)
Motivation
• Computer systems provide crucial services
[diagram: client and server]
Problem
• Computer systems provide crucial services
• Computer systems fail:
  – natural disasters
  – hardware failures
  – software errors
  – malicious attacks
Need highly-available services
[diagram: client and server]
Replication
[diagram: unreplicated service (client and server) vs. replicated service (client and server replicas)]
Replication algorithm:
• masks a fraction of faulty replicas
• high availability if replicas fail “independently”
Assumptions are a Problem
• Replication algorithms make assumptions:
  – behavior of faulty processes
  – synchrony
  – bound on number of faults
• Service fails if assumptions are invalid
  – attacker will work to invalidate assumptions
Most replication algorithms assume too much
Contributions
• Practical replication algorithm:
  – weak assumptions: tolerates attacks
  – good performance
• Implementation:
  – BFT: a generic replication toolkit
  – BFS: a replicated file system
• Performance evaluation
BFS is only 3% slower than a standard file system
Talk Overview
• Problem
• Assumptions
• Algorithm
• Implementation
• Performance
• Conclusions
Bad Assumption: Benign Faults
• Traditional replication assumes:
  – replicas fail by stopping or omitting steps
• Invalid with malicious attacks:
  – compromised replica may behave arbitrarily
  – single fault may compromise service
  – decreased resiliency to malicious attacks
[diagram: attacker replaces a replica’s code among the server replicas]
BFT Tolerates Byzantine Faults
• Byzantine fault tolerance:
  – no assumptions about faulty behavior
• Tolerates successful attacks:
  – service available when hacker controls replicas
[diagram: attacker replaces a replica’s code among the server replicas]
Byzantine-Faulty Clients
• Bad assumption: client faults are benign
  – clients easier to compromise than replicas
• BFT tolerates Byzantine-faulty clients:
  – access control
  – narrow interfaces
  – enforce invariants
[diagram: attacker replaces the client’s code]
Support for complex service operations is important
Bad Assumption: Synchrony
• Synchrony: known bounds on
  – delays between steps
  – message delays
• Invalid with denial-of-service attacks:
  – bad replies due to increased delays
• Assumed by most Byzantine fault tolerance work
Asynchrony
• No bounds on delays
• Problem: replication is impossible
Solution in BFT:
• provide safety without synchrony
  – guarantees no bad replies
• assume eventual time bounds for liveness
  – may not reply during an active denial-of-service attack
  – will reply when the denial-of-service attack ends
Talk Overview
• Problem
• Assumptions
• Algorithm
• Implementation
• Performance
• Conclusions
Algorithm Properties
• Arbitrary replicated service
  – complex operations
  – mutable shared state
• Properties (safety and liveness):
  – system behaves as a correct centralized service
  – clients eventually receive replies to requests
• Assumptions:
  – 3f+1 replicas to tolerate f Byzantine faults (optimal)
  – strong cryptography
  – only for liveness: eventual time bounds
Algorithm
[diagram: clients and replicas]
State machine replication:
– deterministic replicas start in the same state
– replicas execute the same requests in the same order
– correct replicas produce identical replies
Hard: ensure requests execute in the same order
Ordering Requests
Primary-Backup:
• View designates the primary replica
• Primary picks the ordering
• Backups ensure the primary behaves correctly
  – certify correct ordering
  – trigger view changes to replace a faulty primary
[diagram: view — client, primary, backups]
Rough Overview of Algorithm
• A client sends a request for a service to the primary
• The primary multicasts the request to the backups
• Replicas execute the request and send a reply to the client
• The client waits for f+1 replies from different replicas with the same result; this is the result of the operation
[diagram: view — client, primary, backups; f+1 matching replies]
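The client’s vote-counting step above can be sketched as follows (a minimal illustration, not the paper’s code): a result is accepted once f+1 distinct replicas report it, since at most f replicas are faulty, so at least one supporter is correct.

```python
# Sketch: client accepts a result once f+1 distinct replicas agree on it.
from collections import defaultdict

def accept_result(f, replies):
    """replies: list of (replica_id, result). Returns the accepted result,
    or None if no result yet has f+1 distinct supporters."""
    support = defaultdict(set)
    for replica, result in replies:
        support[result].add(replica)          # duplicates from one replica count once
    for result, voters in support.items():
        if len(voters) >= f + 1:
            return result
    return None

assert accept_result(1, [(0, "ok"), (1, "bad")]) is None
assert accept_result(1, [(0, "ok"), (1, "bad"), (2, "ok")]) == "ok"
```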
Quorums and Certificates
• 3f+1 replicas
• quorums have at least 2f+1 replicas
[diagram: quorum A and quorum B]
quorums intersect in at least one correct replica
• Certificate: a set of messages from a quorum
• Algorithm steps are justified by certificates
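The quorum-intersection claim above is simple arithmetic, sketched here for small f: with n = 3f+1 replicas and quorums of size 2f+1, any two quorums overlap in at least 2(2f+1) − n = f+1 replicas, and since at most f are faulty, at least one replica in the intersection is correct.

```python
# Sketch: verify the PBFT quorum-intersection arithmetic for small f.
def min_quorum_overlap(f: int) -> int:
    n = 3 * f + 1          # total replicas
    q = 2 * f + 1          # quorum size
    return 2 * q - n       # worst-case intersection (inclusion-exclusion)

for f in range(1, 6):
    overlap = min_quorum_overlap(f)
    assert overlap == f + 1   # at least f+1 replicas in common,
                              # hence at least one correct replica
```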
Algorithm Components
• Normal case operation
• View changes
• Garbage collection
• Recovery
All have to be designed to work together
Normal Case Operation
• Three-phase algorithm:
  – pre-prepare picks the order of requests
  – prepare ensures order within views
  – commit ensures order across views
• Replicas remember messages in a log
• Messages are authenticated; ⟨m⟩_k denotes a message m signed by replica k
Pre-prepare Phase
[diagram: the client sends request m to primary = replica 0, which multicasts ⟨PRE-PREPARE, v, n, m⟩_0 to replicas 1–3; replica 3 is faulty]
• primary assigns sequence number n to request m in view v
• backups accept the pre-prepare if:
  – they are in view v
  – they never accepted a pre-prepare for (v, n) with a different request
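The backup’s acceptance check can be sketched as a small predicate (names here are illustrative, not from the BFT implementation):

```python
# Sketch: a backup accepts a PRE-PREPARE only in its current view, and only
# if it has not already bound a different request to the same (view, seq).
def accept_pre_prepare(current_view, accepted, v, n, digest):
    """accepted maps (view, seq) -> request digest already bound there."""
    if v != current_view:
        return False
    if (v, n) in accepted and accepted[(v, n)] != digest:
        return False
    accepted[(v, n)] = digest
    return True

log = {}
assert accept_pre_prepare(0, log, 0, 1, "d1")        # first binding: accepted
assert accept_pre_prepare(0, log, 0, 1, "d1")        # same binding: accepted
assert not accept_pre_prepare(0, log, 0, 1, "d2")    # conflicting request: rejected
assert not accept_pre_prepare(0, log, 1, 2, "d3")    # wrong view: rejected
```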
Prepare Phase
[diagram: having accepted ⟨PRE-PREPARE, v, n, m⟩_0, replica 1 multicasts ⟨PREPARE, v, n, D(m), 1⟩_1, where D(m) is the digest of m; replica 3 is faulty]
• all replicas collect the pre-prepare and 2f matching prepares
P-certificate(m,v,n)
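Forming the P-certificate can be sketched as a counting check (an illustrative sketch; the real protocol also verifies signatures): a replica is prepared once it holds the PRE-PREPARE for (v, n, m) plus 2f PREPAREs from distinct replicas that match it.

```python
# Sketch: P-certificate(m, v, n) = matching PRE-PREPARE + 2f matching
# PREPAREs from distinct replicas (same view, sequence number, digest).
def has_p_certificate(f, pre_prepare, prepares):
    """pre_prepare: (v, n, digest); prepares: list of (replica, v, n, digest)."""
    v, n, d = pre_prepare
    matching = {r for (r, pv, pn, pd) in prepares if (pv, pn, pd) == (v, n, d)}
    return len(matching) >= 2 * f

pp = (0, 1, "d1")
dup = [(1, 0, 1, "d1"), (1, 0, 1, "d1")]           # same sender twice
assert not has_p_certificate(1, pp, dup)            # duplicates count once
assert has_p_certificate(1, pp, dup + [(2, 0, 1, "d1")])
```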
Order Within View
Claim: no P-certificates with the same view and sequence number and different requests.
If it were false:
[diagram: the quorum for P-certificate(m,v,n) and the quorum for P-certificate(m’,v,n) overlap]
one correct replica in common, so m = m’
Commit Phase
[diagram: a replica with P-certificate(m,v,n) multicasts ⟨COMMIT, v, n, D(m), 2⟩_2; replica 3 is faulty]
• all replicas collect 2f+1 matching commits
C-certificate(m,v,n)
• Request m is executed after:
  – having C-certificate(m,v,n)
  – executing all requests with sequence number less than n
[diagram: replicas send replies to the client]
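The execution rule above, that a committed request runs only after every lower sequence number has run, can be sketched as (illustrative names, not the paper’s code):

```python
# Sketch: a request with sequence number n executes only after the replica
# holds its C-certificate and has executed every lower sequence number.
def next_executable(last_executed, committed):
    """committed: set of sequence numbers that have a C-certificate.
    Returns the sequence numbers that may be executed now, in order."""
    out = []
    n = last_executed + 1
    while n in committed:
        out.append(n)
        n += 1
    return out

assert next_executable(0, {1, 2, 4}) == [1, 2]   # 4 must wait for 3
assert next_executable(2, {1, 2, 4}) == []       # gap at 3 blocks execution
```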
View Changes
• Provide liveness when the primary fails:
  – timeouts trigger view changes
  – select new primary (primary = view number mod 3f+1)
• But also need to:
  – preserve safety
  – ensure replicas are in the same view long enough
  – prevent denial-of-service attacks
View Change Protocol
[diagram: replica 0 = primary of view v (faulty); replica 1 = primary of view v+1]
• replicas send their P-certificates: ⟨VIEW-CHANGE, v+1, P, 2⟩_2
• the new primary collects 2f view-change messages in X: ⟨NEW-VIEW, v+1, X, O⟩_1
• O contains pre-prepare messages for view v+1 with the same sequence numbers
• backups multicast prepare messages for the pre-prepares in O
View Change Safety
Goal: no C-certificates with the same sequence number and different requests.
• Intuition: if a replica has C-certificate(m,v,n), then
[diagram: the quorum for C-certificate(m,v,n) intersects any quorum Q]
a correct replica in Q has P-certificate(m,v,n)
Garbage Collection
Truncate the log with a certificate:
• periodically checkpoint the state (every K requests)
• multicast ⟨CHECKPOINT, n, D(checkpoint), i⟩_i
• all replicas collect 2f+1 checkpoint messages
• send checkpoints in view-changes
[diagram: log with low-water mark h and high-water mark H = h + 2K; messages and checkpoints at or below h are discarded; messages with sequence numbers above H are rejected]
S-certificate(h,checkpoint)
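The water-mark rule can be sketched as a one-line window check (the concrete h and K values below are illustrative):

```python
# Sketch: h is the sequence number of the last stable checkpoint and
# H = h + 2K is the high-water mark; replicas only act on messages
# with sequence numbers in (h, H].
def in_window(n, h, K):
    H = h + 2 * K
    return h < n <= H

h, K = 100, 50                     # illustrative: checkpoint every K requests
assert not in_window(100, h, K)    # at the stable checkpoint: discarded
assert in_window(101, h, K)
assert in_window(200, h, K)        # H = 200
assert not in_window(201, h, K)    # beyond the window: rejected
```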
Formal Correctness Proofs
• Complete safety proof with I/O automata
  – invariants
  – simulation relations
• Partial liveness proof with timed I/O automata
  – invariants
Communication Optimizations
• Digest replies: send only one reply to the client with the full result
• Optimistic execution: execute prepared requests
Read-write operations execute in two round-trips (client waits for 2f+1 replies)
• Read-only operations: executed in the current state
Read-only operations execute in one round-trip (client waits for 2f+1 replies)
Talk Overview
• Problem
• Assumptions
• Algorithm
• Implementation
• Performance
• Conclusions
BFS: A Byzantine-Fault-Tolerant NFS
• No synchronous writes: stability through replication
[diagram: the Andrew benchmark runs over the kernel NFS client, which talks to a relay linked with the replication library; each replica (0 … n) runs snfsd over the replication library and the kernel VM]
Talk Overview
• Problem
• Assumptions
• Algorithm
• Implementation
• Performance
• Conclusions
Andrew Benchmark
[bar chart: elapsed time in seconds (0–70) for BFS vs. BFS-nr]
• BFS-nr is exactly like BFS but without replication
• 30 times worse with digital signatures
Configuration:
• 1 client, 4 replicas
• Alpha 21064, 133 MHz
• Ethernet 10 Mbit/s
BFS is Practical
[bar chart: elapsed time in seconds (0–70) for BFS vs. NFS]
• NFS is the Digital Unix NFS V2 implementation
Configuration:
• 1 client, 4 replicas
• Alpha 21064, 133 MHz
• Ethernet 10 Mbit/s
• Andrew benchmark
BFS is Practical 7 Years Later
[bar chart: elapsed time in seconds (0–500) for BFS vs. BFS-nr vs. NFS]
• NFS is the Linux 2.2.12 NFS V2 implementation
Configuration:
• 1 client, 4 replicas
• Pentium III, 600 MHz
• Ethernet 100 Mbit/s
• 100x Andrew benchmark
Conclusions
Byzantine fault tolerance is practical:
– good performance
– weak assumptions: improved resiliency
What happens if we go MAD?
• Several useful cooperative services span Multiple Administrative Domains:
  – Internet routing
  – File distribution
  – Cooperative backup, etc.
• Dealing only with Byzantine behaviors is not enough.
Why?
• Nodes are under the control of multiple administrators
• Broken: Byzantine behaviors
  – Misconfigured, or configured with malicious intent
• Selfish: Rational behaviors
  – Alter the protocol to increase local utility
Talk Overview
• Problem
• Model
• 3 Level Architecture
• Performance
• Conclusions
It is time to raise the BAR
• Byzantine
  – Behaving arbitrarily or maliciously
• Altruistic
  – Execute the proposed program, whether it benefits them or not
• Rational
  – Deviate from the proposed program for purposes of local benefit
Protocols
• Incentive-Compatible Byzantine Fault Tolerant (IC-BFT)
  – It is in the best interest of rational nodes to follow the protocol exactly
• Byzantine Altruistic Rational Tolerant (BART)
  – Guarantees a set of safety and liveness properties despite the presence of rational nodes
• IC-BFT ⊆ BART
General idea
• Extend/modify the Practical Byzantine Fault Tolerance model in a way that combats the negative effects of rational (greedy) behavior.
• We will achieve that by using game-theoretic tools.
A taste of Nash Equilibrium
The game of chicken (row player’s payoff listed first):

                Swerve      Go Straight
Swerve          0, 0        -1, +1
Go Straight     +1, -1      -100, -100
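The equilibria of the chicken game above can be found by brute force (a small illustrative sketch): a profile is a pure Nash equilibrium when neither player gains by unilaterally switching actions, which picks out exactly the two asymmetric outcomes.

```python
# Sketch: brute-force the pure Nash equilibria of the chicken game.
# payoff[(row, col)] = (row player's payoff, column player's payoff).
ACTIONS = ["Swerve", "Go Straight"]
payoff = {
    ("Swerve", "Swerve"): (0, 0),
    ("Swerve", "Go Straight"): (-1, 1),
    ("Go Straight", "Swerve"): (1, -1),
    ("Go Straight", "Go Straight"): (-100, -100),
}

def is_nash(row, col):
    u_row, u_col = payoff[(row, col)]
    # Neither player can gain by unilaterally deviating.
    row_ok = all(payoff[(r, col)][0] <= u_row for r in ACTIONS)
    col_ok = all(payoff[(row, c)][1] <= u_col for c in ACTIONS)
    return row_ok and col_ok

equilibria = [(r, c) for r in ACTIONS for c in ACTIONS if is_nash(r, c)]
assert equilibria == [("Swerve", "Go Straight"), ("Go Straight", "Swerve")]
```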
Naughty nodes are punished
• Nodes require access to a state machine in order to complete their objectives
• Protocol contains methods for punishing rational nodes, including denying them access to the state machine
Talk Overview
• Problem
• BAR Model
• 3 Level Architecture
• Performance
• Conclusions
Three-Level Architecture
• Layered design
  – simplifies analysis/construction of systems
  – isolates classes of misbehavior at appropriate levels of abstraction
Level 1 – Basic Primitives
• Goals:
  – Provide IC-BFT versions of key abstractions
  – Ensure long-term benefit to participants
  – Limit non-determinism
  – Mitigate the effects of residual non-determinism
  – Enforce predictable communication patterns
Level 1 – Basic Primitives
• BART-RSM based on PBFT
• Differences:
  – use TRB (Terminating Reliable Broadcast) instead of consensus
  – 3f+2 nodes required to tolerate f faulty nodes
Level 1 – Basic Primitives
• The RSM rotates the leadership role to participants.
• Participants want to stay in the system in order to control the RSM and complete their protocols
• Ultimately, incentives stem from the higher level service
Level 1 – Basic Primitives
• Self-interested nodes could hide behind non-determinism to shirk work.
  – Tit-for-Tat policy
  – Communicate proofs of misconduct, leading to global punishment
• Use Terminating Reliable Broadcast, rather than consensus.
  – In TRB, only the sender can propose a value
  – Other nodes can only adopt this value, or choose a default value
Level 1 – Basic Primitives
• Balance costs
  – No incentive to make the wrong choice
• Encourage timeliness
  – By allowing nodes to judge unilaterally whether other nodes’ messages are late, and to inflict sanctions on them (penance)
Level 1 – Basic Primitives
• Nodes have to have participated at every step in order to have the opportunity to issue a command
• Message queues
[diagram, animation frames: nodes x and y exchange queued messages; each node announces “I am waiting for a message from x” (or y), and progress resumes once the expected message arrives]
Theorems
• Theorem 1: The TRB protocol satisfies Termination, Agreement, Integrity and Non-Triviality
• Theorem 2: No node has a unilateral incentive to deviate from the protocol
Level 2
• State machine replication is sufficient to support a backup service, but the overhead is unacceptable
  – 100 participants… 100 MB backed up… 10 GB of drive space
• Assign work to individual nodes, using arithmetic codes to provide low-overhead fault-tolerant storage
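The overhead arithmetic behind the slide can be made explicit (a trivial sketch): under full state machine replication every node stores every participant’s data, so per-node storage grows linearly with group size, which is the quadratic total cost BAR-B avoids by assigning chunks to individual nodes.

```python
# Sketch: per-node storage cost if every node replicates everyone's backup.
def full_replication_per_node_mb(participants, data_mb_each):
    # every node stores every participant's data
    return participants * data_mb_each

# The slide's numbers: 100 participants x 100 MB each = 10 GB per node.
assert full_replication_per_node_mb(100, 100) == 10_000
```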
Guaranteed Response
• Direct communication is insufficient when nodes can behave rationally
• We introduce a “witness” that overhears the conversation
• This eliminates ambiguity
• Messages are routed through this intermediary
Guaranteed Response
• Node A sends a request to Node B through the witness
• The witness stores the request, and enters the RequestReceived state
• Node B sends a response to Node A through the witness
• The witness stores the response, and enters the ResponseReceived state
Guaranteed Response
• Deviation from this protocol will cause the witness either to notice a timeout from Node B or to detect lying on the part of Node A
Optimization through Credible Threats
• Returns to game theory
• Protocol is optimized so nodes can communicate directly: add a fast path
• If the recipient does not respond, nodes proceed to the unoptimized case
• Analogous to a game of chicken
Periodic Work Protocol
• The witness checks that periodic tasks, such as system maintenance, are performed
• It is expected that, with a certain frequency, each node in the system will perform such a task
• Failure to perform one will generate a POM (proof of misbehavior) from the witness
Authoritative Time Service
• Maintains authoritative time
• Binds messages sent to that time
• Guaranteed response protocol relies on this for generating NoResponses
Authoritative Time Service
• Each submission to the state machine contains the timestamp of the proposer
• Timestamp is taken to be the maximum of the median of timestamps of the previous f+1 decisions
• If “no decision” is decided, then the timestamp is the previous authoritative time
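One reading of the timestamp rule above can be sketched as follows (an illustrative sketch under that assumed reading, not the paper’s code): the authoritative time never moves backwards, and advances to the median of the timestamps attached to the previous f+1 decisions.

```python
# Sketch: advance the authoritative time to the median of the timestamps
# from the previous f+1 decisions, never letting it move backwards.
from statistics import median

def authoritative_time(current, recent_timestamps, f):
    """recent_timestamps: timestamps from the previous f+1 decisions."""
    window = recent_timestamps[-(f + 1):]
    return max(current, median(window))

# f = 2: the median is taken over the last 3 decision timestamps.
assert authoritative_time(100, [90, 105, 110, 120], 2) == 110
assert authoritative_time(130, [90, 105, 110, 120], 2) == 130  # never goes back
```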
Level 3 – BAR-B
• BAR-B is a cooperative backup system
• Three operations:
  – Store
  – Retrieve
  – Audit
Storage
• Nodes break files up into chunks
• Chunks are encrypted
• Chunks are stored on remote nodes
• Remote nodes send signed receipts and store StoreInfos
Retrieval
• A node storing a chunk can respond to a request for a chunk with:
  – The chunk
  – A demonstration that the chunk’s lease has expired
  – A more recent StoreInfo
Auditing
• Receipts constitute audit records
• Nodes will exchange receipts in order to verify compliance with storage quotas
Talk Overview
• Problem
• BAR Model
• 3 Level Architecture
• Performance
• Conclusions
Evaluation
• Performance is inferior to protocols that do not make these guarantees, but acceptable (?)
Impact of additional nodes
Impact of rotating leadership
Impact of fast path optimization
Talk Overview
• Problem
• BAR Model
• 3 Level Architecture
• Performance
• Conclusions
Conclusions
• More useful as a proof of concept, but it certainly explores a very interesting common ground between systems and game theory as a way of studying the performance of real-life systems.
• CS 614 is over