1 a modular approach to fault-tolerant broadcasts and related problems author: vassos hadzilacos and...
A Modular Approach to Fault-Tolerant Broadcasts
and Related Problems
Author: Vassos Hadzilacos and Sam Toueg
Distributed Systems: 526 U1580
Professor: Ching-Chi Hsu
ftp://ftp.db.toronto.edu/pub/vassos/fault.tolerant.broadcasts.dvi.Z
Overview
An earlier version appears in “Fault-Tolerant Broadcasts and Related Problems”, in chapter 5 of “Distributed Systems”, edited by Sape Mullender, Addison-Wesley Publishing Co., 1993
Introduction and Preliminaries
Broadcast Specifications
Broadcast Algorithms
Consensus
Terminating Reliable Broadcast
Multicast Specifications
Introduction
The communication primitives available are too weak, e.g., no reliable broadcast primitive
Fault-tolerant broadcasts are communication primitives that facilitate the development of fault-tolerant applications
Another paradigm: Consensus
The literature is not coherent
Primary goal of this paper: develop the material on fault-tolerant broadcasts and consensus in a coherent way
Preliminaries
Focus on message-passing models only
The chief characteristics of a message-passing model: the type of communication network, the model of process and communication failures, and the synchrony of the system
Types of communication networks: point-to-point and broadcast channel
Many of the results in this paper are independent of the type of communication network
When needed, only point-to-point networks are considered
Point-to-point networks: the communication primitives are send and receive
Preliminaries
Outgoing message buffer and incoming message buffer
Every process executes an infinite sequence of steps
Failure types
Process failures: crash failure, send-omission failure, receive-omission failure, arbitrary (Byzantine or malicious) failure
Link failures: omission failure
Preliminaries
Synchronous and Asynchronous Networks
A point-to-point network is synchronous if:
There is a known upper bound on the time required to execute a step
Local clocks have a known bounded rate of drift with respect to real time
There is a known upper bound on message delay (the time to send, transport and receive a message)
Asynchronous: no timing assumptions
Clock and Performance Failures in Synchronous Networks
Clock failure of a process: its clock drift rate exceeds the bound
Performance failure of a process: the completion time of a step exceeds the bound
Preliminaries
Performance failure of a link: the link takes longer than the bound to transport some message
Classification of Failures and Terminology
Omission failures: crash, send-omission and receive-omission failures of processes, and omission failures of links
Timing failures: omission, clock and performance failures
Benign failures: synonymous with omission failures in asynchronous networks and with timing failures in synchronous networks
Causal Precedence
Properties of clocks
Clock Monotonicity: the clock never decreases or skips values, and for any time c, the clock eventually reaches c.
Preliminaries
Logical clocks: for any processes p and q, and any steps e and f that occur at p and q respectively, if e → f (e causally precedes f) then Cp(e) < Cq(f)
Synchronized Clocks: clock values at any real time t differ by at most a known constant
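The logical clock condition can be illustrated with a Lamport-style counter. This is a minimal sketch of one well-known way to obtain such clocks, not an algorithm taken from the slides; the class name `LamportClock` and its methods are hypothetical.

```python
class LamportClock:
    """Logical clock of one process (hypothetical helper for illustration)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # A local step or a send step: advance the clock and return the new value.
        self.time += 1
        return self.time

    def receive(self, sender_time):
        # A receive step: move past the timestamp carried by the message.
        self.time = max(self.time, sender_time) + 1
        return self.time


p, q = LamportClock(), LamportClock()
e = p.tick()        # step e at p: p sends a message stamped with its clock value
f = q.receive(e)    # step f at q: q receives that message, so e causally precedes f
print(e, f)
```

Because every receive moves the clock past the sender's timestamp, a step that causally follows another always gets a larger clock value.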
Broadcast Specification
Assume benign failures
Reliable Broadcast: two primitives, broadcast and deliver
Assume each message carries its sender’s id and a sequence number
Specification of Reliable Broadcast:
Validity: if a correct process broadcasts a message m, then it eventually delivers m
Agreement: if a correct process delivers a message m, then all correct processes eventually deliver m
Integrity: For any message m, every correct process delivers m at most once, and only if m was previously broadcast by sender(m)
If the sender of a message m is faulty, the specification allows two possible outcomes: either m is delivered by all correct processes or by none.
FIFO Broadcast
FIFO Order: If a process broadcasts a message m before it broadcasts a message m’, then no correct process delivers m’ unless it has previously delivered m
Causal Broadcast
Causal Order: If the broadcast of a message m causally precedes the broadcast of a message m’, then no correct process delivers m’ unless it has previously delivered m
Broadcast Specifications
Faulty specifications (from the literature):
If the broadcast of m causally precedes the broadcast of m’, then every correct process that delivers both messages must deliver m before m’
Messages that are causally related are delivered in the causal order
Local Order: If a process broadcasts a message m and a process delivers m before broadcasting m’, then no correct process delivers m’ unless it has previously delivered m
Theorem: Causal Order is equivalent to FIFO Order and Local Order
Broadcast Specifications
Atomic Broadcast
Total Order: If correct processes p and q both deliver messages m and m’, then p delivers m before m’ if and only if q delivers m before m’
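The Total Order property can be read as a check on recorded delivery sequences. The sketch below assumes a hypothetical trace format (a dict mapping each correct process to the list of messages it delivered, in order) and assumes each message appears at most once per process, as Integrity requires.

```python
from itertools import combinations


def satisfies_total_order(deliveries):
    """Return True if every pair of processes agrees on the relative order
    of the messages that both of them delivered."""
    for p, q in combinations(deliveries, 2):
        pos_q = {m: i for i, m in enumerate(deliveries[q])}
        # Messages delivered by both p and q, listed in p's delivery order.
        common = [m for m in deliveries[p] if m in pos_q]
        for m1, m2 in combinations(common, 2):
            # p delivered m1 before m2, so q must have done the same.
            if pos_q[m1] > pos_q[m2]:
                return False
    return True


ok = satisfies_total_order({"p": ["a", "b", "c"], "q": ["a", "c"]})
bad = satisfies_total_order({"p": ["a", "b"], "q": ["b", "a"]})
print(ok, bad)
```

Note that the property only constrains messages delivered by both processes: in the first trace q never delivers b, which Total Order alone does not forbid.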
FIFO Atomic Broadcast
Causal Atomic Broadcast
Timed Broadcasts
Elapsed time can be interpreted in two different ways: real time or local time
Broadcast Specifications
Real-Time Δ-Timeliness: There is a known constant Δ such that if a message m is broadcast at real time t, then no correct process delivers m after real time t+Δ
Assume each message m contains a timestamp ts(m) denoting the local time at which m was broadcast according to the sender’s clock
Local-Time Δ-Timeliness: There is a known constant Δ such that no correct process p delivers a message m after local time ts(m)+Δ on p’s clock
Broadcast Specifications
Place restrictions on the messages delivered by faulty processes
Uniform Agreement: If a process (whether correct or faulty) delivers a message m, then all correct processes eventually deliver m
Uniform Integrity: For any message m, every process (whether correct or faulty) delivers m at most once, and only if m was previously broadcast by sender(m)
Uniform Real-Time Δ-Timeliness: There is a known constant Δ such that if a message m is broadcast at real time t, then no process (whether correct or faulty) delivers m after real time t+Δ
Broadcast Specifications
Uniform Local-Time Δ-Timeliness: There is a known constant Δ such that no process p (whether correct or faulty) delivers a message m after local time ts(m)+Δ on p’s clock
Uniform FIFO Order, Uniform Local Order, Uniform Causal Order, Uniform Total Order
Broadcast Specifications for Arbitrary Failures
Relationship Among Broadcast Primitives
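Reliable Broadcast + Total Order = Atomic Broadcast
FIFO Broadcast + Total Order = FIFO Atomic Broadcast
Causal Broadcast + Total Order = Causal Atomic Broadcast
Reliable Broadcast + FIFO Order = FIFO Broadcast
FIFO Broadcast + Causal Order = Causal Broadcast
Atomic Broadcast + FIFO Order = FIFO Atomic Broadcast
FIFO Atomic Broadcast + Causal Order = Causal Atomic Broadcast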
Inconsistency and Contamination
The traditional specifications of most broadcasts, including Uniform broadcasts, allow the inconsistency of faulty processes, and the subsequent contamination of correct processes
Example: Atomic Broadcast
It is possible to prevent the inconsistency of faulty processes, or at least the contamination of correct ones
Amplification of Failures
Broadcast primitives are usually built on top of lower-level communication primitives
A broadcast algorithm is likely to amplify the severity of failures that occur at the low level
Even if processes are only subject to crash failures, we cannot assume that the message deliveries that a process makes before crashing are always correct.
Example: a coordinator-based atomic broadcast algorithm. Even if a faulty process behaves correctly until it crashes, it may still deliver messages out of order before it crashes!
Crash failures by themselves do not guarantee reasonable behavior at the broadcast/delivery level
Broadcast Algorithms I -- Methodology
Start with any given Reliable Broadcast algorithm, and show how to achieve each one of these 3 order properties by a corresponding algorithmic transformation
3 transformations: one adds FIFO Order, one adds Causal order and one adds Total Order
None of the transformations require assumptions on the type or synchrony of the underlying network, and all of them work for any type and number of benign failures.
All transformations preserve Uniform Agreement and, under certain assumptions, both versions of Δ-Timeliness
Broadcast Algorithms II --Transformations
Achieving total order
Achieving FIFO order
Achieving causal order
All transformations preserve Uniform Agreement and, under some conditions, both versions of Δ-Timeliness
All transformations work for any type and number of benign failures, regardless of the type or synchrony of the network
All broadcasts considered here satisfy Uniform Integrity
Achieving Total Order
A transformation that converts a Reliable, FIFO or Causal Broadcast that satisfies Local-Time Δ-Timeliness into its Atomic counterpart
This transformation preserves Validity, Agreement, Integrity, FIFO Order and Causal Order (and their uniform counterparts)
Preserving Total Order
Algorithm
To execute broadcast(BA, m):
  broadcast(B, m)
deliver(BA, m) occurs as follows:
  upon deliver(B, m) do
    schedule deliver(BA, m) at time ts(m)+Δ
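The scheduling rule can be sketched as a small priority queue keyed on ts(m)+Δ. This is an illustrative sketch only: the class `TimestampOrderer`, the explicit `advance_clock` call that stands in for the local clock, and the tie-break by sender id for equal timestamps are assumptions made for the example.

```python
import heapq


class TimestampOrderer:
    """Hypothetical layer that turns deliver(B, m) events into deliver(BA, m)."""

    def __init__(self, delta, deliver):
        self.delta = delta          # the known constant of Local-Time Timeliness
        self.deliver = deliver      # callback standing in for deliver(BA, m)
        self.scheduled = []         # heap of (ts(m) + delta, sender, payload)

    def on_deliver_b(self, ts, sender, payload):
        # upon deliver(B, m): schedule deliver(BA, m) at local time ts(m) + delta.
        heapq.heappush(self.scheduled, (ts + self.delta, sender, payload))

    def advance_clock(self, now):
        # Release every message whose scheduled time has been reached.
        while self.scheduled and self.scheduled[0][0] <= now:
            _, sender, payload = heapq.heappop(self.scheduled)
            self.deliver(sender, payload)


out_p, out_q = [], []
p = TimestampOrderer(5, lambda s, m: out_p.append((s, m)))
q = TimestampOrderer(5, lambda s, m: out_q.append((s, m)))
p.on_deliver_b(10, "a", "x"); p.on_deliver_b(12, "b", "y")   # p sees x, then y
q.on_deliver_b(12, "b", "y"); q.on_deliver_b(10, "a", "x")   # q sees y, then x
p.advance_clock(20); q.advance_clock(20)
print(out_p == out_q)
```

Both processes release x before y even though the underlying broadcast delivered them in different orders. The sketch relies on the underlying broadcast B delivering each message no later than local time ts(m)+Δ, which is exactly what Local-Time Δ-Timeliness provides.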
Achieving FIFO Order
An algorithm that transforms any Reliable Broadcast algorithm into a FIFO Broadcast that satisfies Uniform FIFO Order.
Preserves (Uniform) Total Order
Assume a sequence number is attached to every message
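A minimal sketch of how sequence numbers can yield FIFO Order on top of a Reliable Broadcast: each process buffers a message until every earlier message from the same sender has been delivered. The class `FifoLayer`, its callback, and the convention that numbering starts at 1 are assumptions for illustration, not details given in the slides.

```python
from collections import defaultdict


class FifoLayer:
    """Hypothetical FIFO layer placed on top of a Reliable Broadcast."""

    def __init__(self, deliver):
        self.deliver = deliver                    # callback standing in for deliver(F, m)
        self.next_seq = defaultdict(lambda: 1)    # assumption: sequence numbers start at 1
        self.pending = {}                         # (sender, seq) -> payload

    def on_reliable_deliver(self, sender, seq, payload):
        # Called on deliver(R, m); hold m back until its predecessors are delivered.
        self.pending[(sender, seq)] = payload
        while (sender, self.next_seq[sender]) in self.pending:
            key = (sender, self.next_seq[sender])
            self.deliver(sender, self.pending.pop(key))
            self.next_seq[sender] += 1


out = []
layer = FifoLayer(lambda sender, payload: out.append((sender, payload)))
layer.on_reliable_deliver("p", 2, "m2")   # arrives early and stays buffered
layer.on_reliable_deliver("p", 1, "m1")   # releases m1, then m2
print(out)
```

The sketch relies on the underlying Reliable Broadcast never delivering the same message twice, which Integrity guarantees.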
Achieving Causal Order
Two algorithms to transform from FIFO Broadcast to Causal Broadcast, one is blocking and the other not
Both require that the given FIFO Broadcast algorithms satisfy Uniform FIFO Order
Non-Blocking Transformation: preserves Total Order, but not Uniform Total Order
If the given FIFO Broadcast satisfies Uniform Agreement, the transformation preserves both versions of Δ-Timeliness
Achieving Causal Order
Blocking Transformation
Advantage: uses shorter messages
Uses vector timestamps
Preserves (Uniform) Total Order
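To show how vector timestamps can block a delivery until its causal predecessors have arrived, here is a sketch of the standard vector-timestamp delivery rule. It is not necessarily the exact transformation from the paper; the class `CausalLayer`, the integer process ids, and the message tuple layout are assumptions for the example.

```python
class CausalLayer:
    """Hypothetical causal layer for process `pid` in a system of n processes."""

    def __init__(self, pid, n, deliver):
        self.pid = pid
        self.delivered = [0] * n    # delivered[j] = messages from j delivered here
        self.sent = 0               # messages broadcast by this process so far
        self.pending = []           # messages waiting for their causal predecessors
        self.deliver = deliver

    def stamp(self, payload):
        # Build the message handed to the underlying broadcast.
        self.sent += 1
        vt = list(self.delivered)
        vt[self.pid] = self.sent
        return (self.pid, vt, payload)

    def _ready(self, msg):
        sender, vt, _ = msg
        # Next message from its sender, and nothing it depends on is missing.
        return vt[sender] == self.delivered[sender] + 1 and all(
            vt[k] <= self.delivered[k] for k in range(len(vt)) if k != sender
        )

    def on_underlying_deliver(self, msg):
        self.pending.append(msg)
        progress = True
        while progress:             # one delivery may unblock others
            progress = False
            for m in list(self.pending):
                if self._ready(m):
                    self.pending.remove(m)
                    self.delivered[m[0]] += 1
                    self.deliver(m[0], m[2])
                    progress = True


out = []
p0, p1 = CausalLayer(0, 3, lambda s, m: None), CausalLayer(1, 3, lambda s, m: None)
p2 = CausalLayer(2, 3, lambda s, m: out.append((s, m)))
a = p0.stamp("a")
p1.on_underlying_deliver(a)      # p1 delivers a ...
b = p1.stamp("b")                # ... then broadcasts b, so a causally precedes b
p2.on_underlying_deliver(b)      # b arrives first at p2 and is blocked
p2.on_underlying_deliver(a)      # a arrives, which releases a and then b
print(out)
```

The timestamp is one counter per process, which is what keeps messages short compared with attaching the causal history itself.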
Point-to-Point Networks
Model of Point-to-Point Networks
Primitives send and receive satisfy:
Validity: If p sends m to q, and both p and q and the link from p to q are correct, then q eventually receives m.
Uniform Integrity: For any message m, q receives m at most once from p, and only if p previously sent m to q
All Reliable Broadcast algorithms given here rely on two assumptions: Benign Failures and No Partitioning
Reliable Broadcast
Algorithm
Every process p executes the following:
To execute broadcast(R, m):
  send(m) to p
deliver(R, m) occurs as follows:
  upon receive(m) do
    if p has not previously executed deliver(R, m)
    then
      send(m) to all neighbors
      deliver(R, m)
The algorithm satisfies Validity, Agreement, and Uniform Integrity
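The relay rule above can be simulated on a small graph. This sketch makes simplifying assumptions: the function `diffuse` and the example topology are hypothetical, links are reliable, and a crashed process is modeled as one that has crashed before taking any step.

```python
from collections import deque


def diffuse(neighbors, sender, crashed=frozenset()):
    """Simulate the relay rule and return the set of processes that deliver m."""
    delivered = set()
    inbox = deque([sender])          # broadcast(R, m): the sender sends m to itself
    while inbox:
        p = inbox.popleft()          # p receives a copy of m
        if p in crashed or p in delivered:
            continue                 # crashed, or m was already delivered at p
        inbox.extend(neighbors[p])   # send(m) to all neighbors
        delivered.add(p)             # deliver(R, m)
    return delivered


# Hypothetical four-process network in which "d" is reachable through "b" or "c".
neighbors = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
everyone = diffuse(neighbors, "a")
survivors = diffuse(neighbors, "a", crashed={"b"})
print(sorted(everyone), sorted(survivors))
```

With b crashed, the message still reaches d through c, because the correct processes remain connected by a path of correct processes and links. This is the role of the No Partitioning assumption.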
Reliable Broadcast
Additional property of send and receive primitives:
Uniform FIFO Order: If p sends m to q before it sends m’ to q, then q does not receive m’ unless it has previously received m
Theorem: If send and receive primitives satisfy Uniform FIFO Order, the Reliable Broadcast algorithm satisfies Uniform Causal Order
Additional property of send and receive primitives:
Strong Validity: If a process p (whether correct or not) completes the sending of a message m to a correct process q, and the link from p to q is correct, then q eventually receives m
Reliable Broadcast
Theorem: Consider a network such that: (1) processes do not commit send-omission failures, and (2) every process p (whether correct or faulty) is connected to every correct process via a path consisting entirely of correct processes and links (with the possible exception of p itself). The Reliable Broadcast algorithm satisfies Uniform Agreement
Model of Synchronous Point-to-Point Networks
Consensus
Two primitives: propose and decide
The consensus problem requires that if each correct process proposes a value, then the following hold:
Termination: Every correct process eventually decides exactly one value
Agreement: If a correct process decides v, then all correct processes eventually decide v
Integrity: If a correct process decides v, then v was previously proposed by some process
Agreement and Integrity can be strengthened to Uniformity
Consensus
Relating Consensus and Atomic Broadcast
Transforming Atomic Broadcast into Consensus:
To execute propose(v):
  broadcast(A, v)
decide occurs as follows:
  upon deliver(A, u) do
    if p has not previously executed deliver(A, -)
    then decide(u)
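The reduction can be sketched by standing in for Atomic Broadcast with a shared list that every process reads in the same order. The class `ConsensusOverAtomicBroadcast` and the shared-log simulation are assumptions for illustration; a real Atomic Broadcast would replace the list.

```python
class ConsensusOverAtomicBroadcast:
    """Hypothetical process that decides the first value it A-delivers."""

    def __init__(self, a_broadcast):
        self.a_broadcast = a_broadcast   # stand-in for broadcast(A, v)
        self.decided = False
        self.decision = None

    def propose(self, v):
        self.a_broadcast(v)

    def on_a_deliver(self, u):
        # Decide only on the first delivery; later deliveries are ignored.
        if not self.decided:
            self.decided = True
            self.decision = u


log = []                                  # simulated Atomic Broadcast: one global order
procs = [ConsensusOverAtomicBroadcast(log.append) for _ in range(3)]
for i, proc in enumerate(procs):
    proc.propose(f"v{i}")
for u in log:                             # Total Order: same sequence at every process
    for proc in procs:
        proc.on_a_deliver(u)
print([proc.decision for proc in procs])
```

Total Order ensures that all processes see the same first message, which gives Agreement; the decided value was broadcast by some proposer, which gives Integrity.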
Consensus
Transforming Reliable Broadcast and Consensus to Atomic Broadcast
Terminating Reliable Broadcast
With Reliable Broadcast processes have no knowledge of the impending broadcasts
Allow the delivery of a special message Fs ∉ M (M denotes the set of messages that s may broadcast), so that every delivered message m satisfies m ∈ M ∪ {Fs}
With TRB for sender s, s can broadcast any message and the following hold:
Termination: Every correct process eventually delivers exactly one message
Validity: If s is correct and broadcasts a message m, then it eventually delivers m
Agreement: If a correct process delivers a message m, then all correct processes eventually deliver m
Integrity: If a correct process delivers a message m, then sender(m) = s. If m ≠ Fs, then m was previously broadcast by s
Terminating Reliable Broadcast
In some synchronous point-to-point networks, Consensus is equivalent to TRB
In asynchronous systems, the two problems are not equivalent