Byzantine Techniques II
Presenter: Georgios Piliouras
Partly based on slides by
Justin W. Hart & Rodrigo Rodrigues
Papers
• Practical Byzantine Fault Tolerance. Miguel Castro et al. (OSDI 1999)
• BAR Fault Tolerance for Cooperative Services. Amitanand S. Aiyer et al. (SOSP 2005)
Motivation
• Computer systems provide crucial services
[diagram: client and server]
Problem
• Computer systems provide crucial services
• Computer systems fail:
  – natural disasters
  – hardware failures
  – software errors
  – malicious attacks
Need highly-available services
[diagram: client and server]
Replication
[diagram: unreplicated service (client and server) vs. replicated service (client and server replicas)]
Replication algorithm:
• masks a fraction of faulty replicas
• high availability if replicas fail “independently”
Assumptions are a Problem
• Replication algorithms make assumptions:
  – behavior of faulty processes
  – synchrony
  – bound on number of faults
• Service fails if assumptions are invalid
  – attacker will work to invalidate assumptions
Most replication algorithms assume too much
Contributions
• Practical replication algorithm:
  – weak assumptions: tolerates attacks
  – good performance
• Implementation:
  – BFT: a generic replication toolkit
  – BFS: a replicated file system
• Performance evaluation
BFS is only 3% slower than a standard file system
Talk Overview
• Problem
• Assumptions
• Algorithm
• Implementation
• Performance
• Conclusions
Bad Assumption: Benign Faults
• Traditional replication assumes:
  – replicas fail by stopping or omitting steps
• Invalid with malicious attacks:
  – compromised replica may behave arbitrarily
  – single fault may compromise service
  – decreased resiliency to malicious attacks
[diagram: attacker replaces a replica’s code among the server replicas]
BFT Tolerates Byzantine Faults
• Byzantine fault tolerance:
  – no assumptions about faulty behavior
• Tolerates successful attacks:
  – service available when hacker controls replicas
[diagram: attacker replaces a replica’s code among the server replicas]
Byzantine-Faulty Clients
• Bad assumption: client faults are benign
  – clients easier to compromise than replicas
• BFT tolerates Byzantine-faulty clients:
  – access control
  – narrow interfaces
  – enforce invariants
[diagram: attacker replaces the client’s code]
Support for complex service operations is important
Bad Assumption: Synchrony
• Synchrony: known bounds on
  – delays between steps
  – message delays
• Invalid with denial-of-service attacks:
  – bad replies due to increased delays
• Assumed by most Byzantine fault tolerance work
Asynchrony
• No bounds on delays
• Problem: replication is impossible
Solution in BFT:
• provide safety without synchrony
  – guarantees no bad replies
• assume eventual time bounds for liveness
  – may not reply during an active denial-of-service attack
  – will reply when the denial-of-service attack ends
Talk Overview
• Problem
• Assumptions
• Algorithm
• Implementation
• Performance
• Conclusions
Algorithm Properties
• Arbitrary replicated service
  – complex operations
  – mutable shared state
• Properties (safety and liveness):
  – system behaves as a correct centralized service
  – clients eventually receive replies to requests
• Assumptions:
  – 3f+1 replicas to tolerate f Byzantine faults (optimal)
  – strong cryptography
  – only for liveness: eventual time bounds
Algorithm
[diagram: clients and replicas]
State machine replication:
– deterministic replicas start in the same state
– replicas execute the same requests in the same order
– correct replicas produce identical replies
Hard: ensure requests execute in the same order
Ordering Requests
Primary-Backup:
• View designates the primary replica
• Primary picks the ordering
• Backups ensure the primary behaves correctly
  – certify correct ordering
  – trigger view changes to replace a faulty primary
[diagram: view — client, primary, backups]
Rough Overview of Algorithm
• A client sends a request for a service to the primary
• The primary multicasts the request to the backups
• Replicas execute the request and send a reply to the client
• The client waits for f+1 replies from different replicas with the same result; this is the result of the operation
[diagram: view — client, primary, backups; f+1 matching replies]
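The client’s vote-counting step above can be sketched as follows (a minimal illustration, not the paper’s code): a result is accepted once f+1 distinct replicas report it, since at most f replicas are faulty, so at least one supporter is correct.

```python
# Sketch: client accepts a result once f+1 distinct replicas agree on it.
from collections import defaultdict

def accept_result(f, replies):
    """replies: list of (replica_id, result). Returns the accepted result,
    or None if no result yet has f+1 distinct supporters."""
    support = defaultdict(set)
    for replica, result in replies:
        support[result].add(replica)          # duplicates from one replica count once
    for result, voters in support.items():
        if len(voters) >= f + 1:
            return result
    return None

assert accept_result(1, [(0, "ok"), (1, "bad")]) is None
assert accept_result(1, [(0, "ok"), (1, "bad"), (2, "ok")]) == "ok"
```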
Quorums and Certificates
• 3f+1 replicas
• quorums have at least 2f+1 replicas
[diagram: quorum A and quorum B]
quorums intersect in at least one correct replica
• Certificate: a set of messages from a quorum
• Algorithm steps are justified by certificates
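The quorum-intersection claim above is simple arithmetic, sketched here for small f: with n = 3f+1 replicas and quorums of size 2f+1, any two quorums overlap in at least 2(2f+1) − n = f+1 replicas, and since at most f are faulty, at least one replica in the intersection is correct.

```python
# Sketch: verify the PBFT quorum-intersection arithmetic for small f.
def min_quorum_overlap(f: int) -> int:
    n = 3 * f + 1          # total replicas
    q = 2 * f + 1          # quorum size
    return 2 * q - n       # worst-case intersection (inclusion-exclusion)

for f in range(1, 6):
    overlap = min_quorum_overlap(f)
    assert overlap == f + 1   # at least f+1 replicas in common,
                              # hence at least one correct replica
```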
Algorithm Components
• Normal case operation
• View changes
• Garbage collection
• Recovery
All have to be designed to work together
Normal Case Operation
• Three-phase algorithm:
  – pre-prepare picks the order of requests
  – prepare ensures order within views
  – commit ensures order across views
• Replicas remember messages in a log
• Messages are authenticated; ⟨m⟩_k denotes a message m signed by replica k
Pre-prepare Phase
[diagram: the client sends request m to primary = replica 0, which multicasts ⟨PRE-PREPARE, v, n, m⟩_0 to replicas 1–3; replica 3 is faulty]
• primary assigns sequence number n to request m in view v
• backups accept the pre-prepare if:
  – they are in view v
  – they never accepted a pre-prepare for (v, n) with a different request
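The backup’s acceptance check can be sketched as a small predicate (names here are illustrative, not from the BFT implementation):

```python
# Sketch: a backup accepts a PRE-PREPARE only in its current view, and only
# if it has not already bound a different request to the same (view, seq).
def accept_pre_prepare(current_view, accepted, v, n, digest):
    """accepted maps (view, seq) -> request digest already bound there."""
    if v != current_view:
        return False
    if (v, n) in accepted and accepted[(v, n)] != digest:
        return False
    accepted[(v, n)] = digest
    return True

log = {}
assert accept_pre_prepare(0, log, 0, 1, "d1")        # first binding: accepted
assert accept_pre_prepare(0, log, 0, 1, "d1")        # same binding: accepted
assert not accept_pre_prepare(0, log, 0, 1, "d2")    # conflicting request: rejected
assert not accept_pre_prepare(0, log, 1, 2, "d3")    # wrong view: rejected
```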
Prepare Phase
[diagram: having accepted ⟨PRE-PREPARE, v, n, m⟩_0, replica 1 multicasts ⟨PREPARE, v, n, D(m), 1⟩_1, where D(m) is the digest of m; replica 3 is faulty]
• all replicas collect the pre-prepare and 2f matching prepares
P-certificate(m,v,n)
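Forming the P-certificate can be sketched as a counting check (an illustrative sketch; the real protocol also verifies signatures): a replica is prepared once it holds the PRE-PREPARE for (v, n, m) plus 2f PREPAREs from distinct replicas that match it.

```python
# Sketch: P-certificate(m, v, n) = matching PRE-PREPARE + 2f matching
# PREPAREs from distinct replicas (same view, sequence number, digest).
def has_p_certificate(f, pre_prepare, prepares):
    """pre_prepare: (v, n, digest); prepares: list of (replica, v, n, digest)."""
    v, n, d = pre_prepare
    matching = {r for (r, pv, pn, pd) in prepares if (pv, pn, pd) == (v, n, d)}
    return len(matching) >= 2 * f

pp = (0, 1, "d1")
dup = [(1, 0, 1, "d1"), (1, 0, 1, "d1")]           # same sender twice
assert not has_p_certificate(1, pp, dup)            # duplicates count once
assert has_p_certificate(1, pp, dup + [(2, 0, 1, "d1")])
```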
Order Within View
Claim: no P-certificates with the same view and sequence number and different requests.
If it were false:
[diagram: the quorum for P-certificate(m,v,n) and the quorum for P-certificate(m’,v,n) overlap]
one correct replica in common, so m = m’
Commit Phase
[diagram: a replica with P-certificate(m,v,n) multicasts ⟨COMMIT, v, n, D(m), 2⟩_2; replica 3 is faulty]
• all replicas collect 2f+1 matching commits
C-certificate(m,v,n)
• Request m is executed after:
  – having C-certificate(m,v,n)
  – executing all requests with sequence number less than n
[diagram: replicas send replies to the client]
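The execution rule above, that a committed request runs only after every lower sequence number has run, can be sketched as (illustrative names, not the paper’s code):

```python
# Sketch: a request with sequence number n executes only after the replica
# holds its C-certificate and has executed every lower sequence number.
def next_executable(last_executed, committed):
    """committed: set of sequence numbers that have a C-certificate.
    Returns the sequence numbers that may be executed now, in order."""
    out = []
    n = last_executed + 1
    while n in committed:
        out.append(n)
        n += 1
    return out

assert next_executable(0, {1, 2, 4}) == [1, 2]   # 4 must wait for 3
assert next_executable(2, {1, 2, 4}) == []       # gap at 3 blocks execution
```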
View Changes
• Provide liveness when the primary fails:
  – timeouts trigger view changes
  – select new primary (primary = view number mod 3f+1)
• But also need to:
  – preserve safety
  – ensure replicas are in the same view long enough
  – prevent denial-of-service attacks
View Change Protocol
[diagram: replica 0 = primary of view v (faulty); replica 1 = primary of view v+1]
• replicas send their P-certificates: ⟨VIEW-CHANGE, v+1, P, 2⟩_2
• the new primary collects 2f view-change messages in X: ⟨NEW-VIEW, v+1, X, O⟩_1
• O contains pre-prepare messages for view v+1 with the same sequence numbers
• backups multicast prepare messages for the pre-prepares in O
View Change Safety
Goal: no C-certificates with the same sequence number and different requests.
• Intuition: if a replica has C-certificate(m,v,n), then
[diagram: the quorum for C-certificate(m,v,n) intersects any quorum Q]
a correct replica in Q has P-certificate(m,v,n)
Garbage Collection
Truncate the log with a certificate:
• periodically checkpoint the state (every K requests)
• multicast ⟨CHECKPOINT, n, D(checkpoint), i⟩_i
• all replicas collect 2f+1 checkpoint messages
• send checkpoints in view-changes
[diagram: log with low-water mark h and high-water mark H = h + 2K; messages and checkpoints at or below h are discarded; messages with sequence numbers above H are rejected]
S-certificate(h,checkpoint)
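The water-mark rule can be sketched as a one-line window check (the concrete h and K values below are illustrative):

```python
# Sketch: h is the sequence number of the last stable checkpoint and
# H = h + 2K is the high-water mark; replicas only act on messages
# with sequence numbers in (h, H].
def in_window(n, h, K):
    H = h + 2 * K
    return h < n <= H

h, K = 100, 50                     # illustrative: checkpoint every K requests
assert not in_window(100, h, K)    # at the stable checkpoint: discarded
assert in_window(101, h, K)
assert in_window(200, h, K)        # H = 200
assert not in_window(201, h, K)    # beyond the window: rejected
```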
Formal Correctness Proofs
• Complete safety proof with I/O automata
  – invariants
  – simulation relations
• Partial liveness proof with timed I/O automata
  – invariants
Communication Optimizations
• Digest replies: send only one reply to the client with the full result
• Optimistic execution: execute prepared requests
Read-write operations execute in two round-trips (client waits for 2f+1 replies)
• Read-only operations: executed in the current state
Read-only operations execute in one round-trip (client waits for 2f+1 replies)
Talk Overview
• Problem
• Assumptions
• Algorithm
• Implementation
• Performance
• Conclusions
BFS: A Byzantine-Fault-Tolerant NFS
• No synchronous writes: stability through replication
[diagram: the Andrew benchmark runs over the kernel NFS client, which talks to a relay linked with the replication library; each replica (0 … n) runs snfsd over the replication library and the kernel VM]
Talk Overview
• Problem
• Assumptions
• Algorithm
• Implementation
• Performance
• Conclusions
Andrew Benchmark
[bar chart: elapsed time in seconds (0–70) for BFS vs. BFS-nr]
• BFS-nr is exactly like BFS but without replication
• 30 times worse with digital signatures
Configuration:
• 1 client, 4 replicas
• Alpha 21064, 133 MHz
• Ethernet 10 Mbit/s
BFS is Practical
[bar chart: elapsed time in seconds (0–70) for BFS vs. NFS]
• NFS is the Digital Unix NFS V2 implementation
Configuration:
• 1 client, 4 replicas
• Alpha 21064, 133 MHz
• Ethernet 10 Mbit/s
• Andrew benchmark
BFS is Practical 7 Years Later
[bar chart: elapsed time in seconds (0–500) for BFS vs. BFS-nr vs. NFS]
• NFS is the Linux 2.2.12 NFS V2 implementation
Configuration:
• 1 client, 4 replicas
• Pentium III, 600 MHz
• Ethernet 100 Mbit/s
• 100x Andrew benchmark
Conclusions
Byzantine fault tolerance is practical:
– good performance
– weak assumptions: improved resiliency
What happens if we go MAD?
• Several useful cooperative services span Multiple Administrative Domains:
  – Internet routing
  – File distribution
  – Cooperative backup, etc.
• Dealing only with Byzantine behaviors is not enough.
Why?
• Nodes are under the control of multiple administrators
• Broken: Byzantine behaviors
  – Misconfigured, or configured with malicious intent
• Selfish: Rational behaviors
  – Alter the protocol to increase local utility
Talk Overview
• Problem
• Model
• 3 Level Architecture
• Performance
• Conclusions
It is time to raise the BAR
• Byzantine
  – Behaving arbitrarily or maliciously
• Altruistic
  – Execute the proposed program, whether it benefits them or not
• Rational
  – Deviate from the proposed program for purposes of local benefit
Protocols
• Incentive-Compatible Byzantine Fault Tolerant (IC-BFT)
  – It is in the best interest of rational nodes to follow the protocol exactly
• Byzantine Altruistic Rational Tolerant (BART)
  – Guarantees a set of safety and liveness properties despite the presence of rational nodes
• IC-BFT ⊆ BART
General idea
• Extend/modify the Practical Byzantine Fault Tolerance model in a way that combats the negative effects of rational (greedy) behavior.
• We will achieve that by using game-theoretic tools.
A taste of Nash Equilibrium
The game of chicken (row player’s payoff listed first):

                Swerve      Go Straight
Swerve          0, 0        -1, +1
Go Straight     +1, -1      -100, -100
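The equilibria of the chicken game above can be found by brute force (a small illustrative sketch): a profile is a pure Nash equilibrium when neither player gains by unilaterally switching actions, which picks out exactly the two asymmetric outcomes.

```python
# Sketch: brute-force the pure Nash equilibria of the chicken game.
# payoff[(row, col)] = (row player's payoff, column player's payoff).
ACTIONS = ["Swerve", "Go Straight"]
payoff = {
    ("Swerve", "Swerve"): (0, 0),
    ("Swerve", "Go Straight"): (-1, 1),
    ("Go Straight", "Swerve"): (1, -1),
    ("Go Straight", "Go Straight"): (-100, -100),
}

def is_nash(row, col):
    u_row, u_col = payoff[(row, col)]
    # Neither player can gain by unilaterally deviating.
    row_ok = all(payoff[(r, col)][0] <= u_row for r in ACTIONS)
    col_ok = all(payoff[(row, c)][1] <= u_col for c in ACTIONS)
    return row_ok and col_ok

equilibria = [(r, c) for r in ACTIONS for c in ACTIONS if is_nash(r, c)]
assert equilibria == [("Swerve", "Go Straight"), ("Go Straight", "Swerve")]
```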
Naughty nodes are punished
• Nodes require access to a state machine in order to complete their objectives
• Protocol contains methods for punishing rational nodes, including denying them access to the state machine
Talk Overview
• Problem
• BAR Model
• 3 Level Architecture
• Performance
• Conclusions
Three-Level Architecture
• Layered design
  – simplifies analysis/construction of systems
  – isolates classes of misbehavior at appropriate levels of abstraction
Level 1 – Basic Primitives
• Goals:
  – Provide IC-BFT versions of key abstractions
  – Ensure long-term benefit to participants
  – Limit non-determinism
  – Mitigate the effects of residual non-determinism
  – Enforce predictable communication patterns
Level 1 – Basic Primitives
• BART-RSM based on PBFT
• Differences:
  – use TRB (Terminating Reliable Broadcast) instead of consensus
  – 3f+2 nodes required to tolerate f faulty nodes
Level 1 – Basic Primitives
• The RSM rotates the leadership role to participants.
• Participants want to stay in the system in order to control the RSM and complete their protocols
• Ultimately, incentives stem from the higher level service
Level 1 – Basic Primitives
• Self-interested nodes could hide behind non-determinism to shirk work.
  – Tit-for-Tat policy
  – Communicate proofs of misconduct, leading to global punishment
• Use Terminating Reliable Broadcast, rather than consensus.
  – In TRB, only the sender can propose a value
  – Other nodes can only adopt this value, or choose a default value
Level 1 – Basic Primitives
• Balance costs
  – No incentive to make the wrong choice
• Encourage timeliness
  – By allowing nodes to judge unilaterally whether other nodes’ messages are late, and to inflict sanctions on them (penance)
Level 1 – Basic Primitives
• Nodes have to have participated at every step in order to have the opportunity to issue a command
• Message queues
[diagram, animation frames: nodes x and y exchange queued messages; each node announces “I am waiting for a message from x” (or y), and progress resumes once the expected message arrives]
Theorems
• Theorem 1: The TRB protocol satisfies Termination, Agreement, Integrity and Non-Triviality
• Theorem 2: No node has a unilateral incentive to deviate from the protocol
Level 2
• State machine replication is sufficient to support a backup service, but the overhead is unacceptable
  – 100 participants… 100 MB backed up… 10 GB of drive space
• Assign work to individual nodes, using arithmetic codes to provide low-overhead fault-tolerant storage
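The overhead arithmetic behind the slide can be made explicit (a trivial sketch): under full state machine replication every node stores every participant’s data, so per-node storage grows linearly with group size, which is the quadratic total cost BAR-B avoids by assigning chunks to individual nodes.

```python
# Sketch: per-node storage cost if every node replicates everyone's backup.
def full_replication_per_node_mb(participants, data_mb_each):
    # every node stores every participant's data
    return participants * data_mb_each

# The slide's numbers: 100 participants x 100 MB each = 10 GB per node.
assert full_replication_per_node_mb(100, 100) == 10_000
```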
Guaranteed Response
• Direct communication is insufficient when nodes can behave rationally
• We introduce a “witness” that overhears the conversation
• This eliminates ambiguity
• Messages are routed through this intermediary
Guaranteed Response
• Node A sends a request to Node B through the witness
• The witness stores the request, and enters the RequestReceived state
• Node B sends a response to Node A through the witness
• The witness stores the response, and enters the ResponseReceived state
Guaranteed Response
• Deviation from this protocol will cause the witness either to notice a timeout from Node B or to detect lying on the part of Node A
Optimization through Credible Threats
• Returns to game theory
• Protocol is optimized so nodes can communicate directly: add a fast path
• If the recipient does not respond, nodes proceed to the unoptimized case
• Analogous to a game of chicken
Periodic Work Protocol
• The witness checks that periodic tasks, such as system maintenance, are performed
• It is expected that, with a certain frequency, each node in the system will perform such a task
• Failure to perform one will generate a POM (proof of misbehavior) from the witness
Authoritative Time Service
• Maintains authoritative time
• Binds messages sent to that time
• Guaranteed response protocol relies on this for generating NoResponses
Authoritative Time Service
• Each submission to the state machine contains the timestamp of the proposer
• Timestamp is taken to be the maximum of the median of timestamps of the previous f+1 decisions
• If “no decision” is decided, then the timestamp is the previous authoritative time
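One reading of the timestamp rule above can be sketched as follows (an illustrative sketch under that assumed reading, not the paper’s code): the authoritative time never moves backwards, and advances to the median of the timestamps attached to the previous f+1 decisions.

```python
# Sketch: advance the authoritative time to the median of the timestamps
# from the previous f+1 decisions, never letting it move backwards.
from statistics import median

def authoritative_time(current, recent_timestamps, f):
    """recent_timestamps: timestamps from the previous f+1 decisions."""
    window = recent_timestamps[-(f + 1):]
    return max(current, median(window))

# f = 2: the median is taken over the last 3 decision timestamps.
assert authoritative_time(100, [90, 105, 110, 120], 2) == 110
assert authoritative_time(130, [90, 105, 110, 120], 2) == 130  # never goes back
```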
Level 3 – BAR-B
• BAR-B is a cooperative backup system
• Three operations:
  – Store
  – Retrieve
  – Audit
Storage
• Nodes break files up into chunks
• Chunks are encrypted
• Chunks are stored on remote nodes
• Remote nodes send signed receipts and store StoreInfos
Retrieval
• A node storing a chunk can respond to a request for a chunk with:
  – The chunk
  – A demonstration that the chunk’s lease has expired
  – A more recent StoreInfo
Auditing
• Receipts constitute audit records
• Nodes will exchange receipts in order to verify compliance with storage quotas
Talk Overview
• Problem
• BAR Model
• 3 Level Architecture
• Performance
• Conclusions
Evaluation
• Performance is inferior to protocols that do not make these guarantees, but acceptable (?)
Impact of additional nodes
Impact of rotating leadership
Impact of fast path optimization
Talk Overview
• Problem
• BAR Model
• 3 Level Architecture
• Performance
• Conclusions
Conclusions
• More useful as a proof of concept, but it certainly explores a very interesting common ground between systems and game theory as a way of studying the performance of real-life systems.
• CS 614 is over