Post on 21-Dec-2015
Ordering and Consistent Cuts
Presented by Chi H. Ho
Time, Clocks, and the Ordering of Events in a Distributed System
Leslie Lamport
Introduction
• 2000 PODC Influential Paper Award
• Outline of the paper (not in presented order):
– Partial and Total Orderings
– Logical and Physical Clocks
– Clock and Strong Clock Conditions
– Synchronizing Physical Clocks
• Beyond…
“Happened Before”
• a → b if:
– a and b are events in the same process and a comes before b, or
– a is the send event of some message, and b is the receive event of the same message.
• Transitive: (a → b) & (b → c) ⇒ (a → c)
• Concurrent: ¬(a → b) & ¬(b → a).
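The definition above can be sketched as a small computation. This is only an illustration: the event names, the `process_events` dictionary, and the `messages` list are hypothetical inputs, not taken from the slides. The relation is built from the two base rules and then closed under transitivity.

```python
from itertools import product

def happened_before(process_events, messages):
    """Return the set of pairs (a, b) such that a -> b.

    process_events: dict mapping a process name to its events in local order.
    messages: iterable of (send_event, receive_event) pairs.
    """
    events = [e for evs in process_events.values() for e in evs]
    hb = set()
    # Rule 1: local order within each process (adjacent pairs are enough).
    for evs in process_events.values():
        hb.update(zip(evs, evs[1:]))
    # Rule 2: a send happens before the matching receive.
    hb.update(messages)
    # Transitive closure, Warshall style: the intermediate event k is outermost.
    for k, a, b in product(events, repeat=3):
        if (a, k) in hb and (k, b) in hb:
            hb.add((a, b))
    return hb

def concurrent(a, b, hb):
    """Two distinct events are concurrent when neither happened before the other."""
    return a != b and (a, b) not in hb and (b, a) not in hb
```

For example, with processes P = [p1, p2, p3] and Q = [q1, q2, q3] and a single message from p1 to q2, the closure contains p1 → q3, while p2 and q2 are concurrent.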
Partial Ordering
Examples
• q5 → p4
• q2 → q3
• p1 → r3
• q2 // p2
• q2 // p3
Partial Ordering
Logical Clock
• Clock Condition:
∀a,b: a → b ⇒ C(a) < C(b)
Partial Ordering Implementation
Logical Clock
• Implementation Rules:
– IR1:
• Each process Pi increments Ci between any two successive events.
– IR2:
• If event a is the sending of a message m by process Pi, then the message contains a timestamp Tm = Ci(a).
• Upon receiving a message m, process Pj sets Cj greater than or equal to its present value and greater than Tm.
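A minimal sketch of IR1 and IR2, assuming integer clocks that start at 0 and advance by 1 per event. The class and method names are illustrative only.

```python
class LamportClock:
    """Logical clock of one process, following IR1 and IR2."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """IR1: advance the clock for a local event and return its timestamp."""
        self.time += 1
        return self.time

    def send(self):
        """IR2, send side: sending is an event; its clock value is Tm."""
        return self.tick()

    def receive(self, tm):
        """IR2, receive side: move past both the present value and Tm."""
        self.time = max(self.time, tm) + 1
        return self.time
```

Taking `max(self.time, tm) + 1` is one simple way to satisfy "greater than or equal to its present value and greater than Tm" while also counting the receive itself as an event.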
Partial Ordering Implementation
Examples
Partial Ordering Implementation
[Figure, built up over nine slides: timelines of two processes, P0 and P1, with both logical clocks starting at 0. In the final frame the clock values reach 4, and two messages carry the timestamps [3] and [2].]
Extended “Happened Before”
• a => b iff:
– Ci(a) < Cj(b), or
– (Ci(a) = Cj(b)) & (Pi ≺ Pj)
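The relation => can be sketched as a sort key. The assumption here is that process identifiers are integers and that Pi ≺ Pj means i < j; any fixed total order on processes would work equally well.

```python
def total_order(events):
    """Sort events by the relation a => b.

    Each event is a (timestamp, process_id, label) tuple. Ties on the
    timestamp are broken by the process identifier.
    """
    return sorted(events, key=lambda e: (e[0], e[1]))
```

Because the key is a pair, two events from different processes never compare as equal, so the resulting order is total.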
Total Ordering
Example Application
• Shared resource granting
– Fixed number of processes
– Single shared resource
– Requirements:
I. Mutually exclusive
II. Fair
III. Exhaustive
Total Ordering
Example Application
• Solution: Distributed algorithm
• Model:
– Channels are FIFO
– Each process maintains a process queue
• Algorithm
– Request: broadcast Tm:Pi request resource
– Release: broadcast Tm:Pi release resource
– Receive request: enqueue
– Receive release: dequeue
– Resource granted (local decision) to Pi when:
• Tm:Pi request resource is in the queue with minimal Tm
• Pi has received from every process a msg timestamped later than Tm
• Note:
– Can be generalized to solve Replicated State Machine!
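The local grant decision can be sketched as follows. This is not a full implementation: message transport is left out, and the class, field, and message-kind names are hypothetical. The "ack" kind stands in for the acknowledgments used in Lamport's paper, which the slide does not list. "Later than" is evaluated in the total order on (timestamp, process id) pairs.

```python
from dataclasses import dataclass, field

@dataclass
class Process:
    pid: int
    peers: list                                    # ids of all other processes
    queue: set = field(default_factory=set)        # pending (Tm, pid) requests
    last_seen: dict = field(default_factory=dict)  # peer id -> latest timestamp received

    def request(self, tm):
        """Enqueue our own request (the broadcast itself is not modelled)."""
        self.queue.add((tm, self.pid))

    def on_message(self, kind, tm, sender):
        """Handle a 'request', 'release', or any other timestamped message."""
        self.last_seen[sender] = max(self.last_seen.get(sender, 0), tm)
        if kind == "request":
            self.queue.add((tm, sender))
        elif kind == "release":
            self.queue = {r for r in self.queue if r[1] != sender}

    def may_enter(self):
        """Local decision: our request is minimal and every peer has sent something later."""
        mine = [r for r in self.queue if r[1] == self.pid]
        if not mine or min(mine) != min(self.queue):
            return False
        my_request = min(mine)
        return all((self.last_seen.get(p, 0), p) > my_request for p in self.peers)
```

The second condition is what relies on FIFO channels: once a later-stamped message has arrived from a peer, no earlier request from that peer can still be in transit.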
Total Ordering
Anomaly
[Figure, built up over five slides: Amazon.com with a message timestamped [19], a second message timestamped [7], and, in the final frame, an external event.]
Strong Clock Condition
• S = {events in the system}
• S’ = S ⋃ {relevant external events}
• ⇝ is “happened before” for S’
• Strong Clock Condition: ∀a,b ∈ S: a ⇝ b ⇒ C(a) < C(b)
Avoid Anomaly
Physical Clocks
• PC1: (drift rate bound)
∃κ ≪ 1 such that ∀i: |dCi(t)/dt – 1| < κ
• PC2: (drift bound)
∀i,j: |Ci(t) – Cj(t)| < ε
Avoid Anomaly
• μ < shortest msg transmission time
• ∀i,j,t: Ci(t+μ) – Cj(t) > 0
• Sufficient condition: μ ≥ ε/(1–κ)
[Figure: the Amazon.com example with processes i and j, annotated Cj(t) > Ci(t+μ).]
Physical Clocks
Implementation Rules
• IR1’:
– For each i, if Pi does not receive a message at physical time t, then Ci is differentiable at t and dCi(t)/dt > 0.
• IR2’:
– (a) If Pi sends a message m at physical time t, then m contains a timestamp Tm = Ci(t).
– (b) Upon receiving a message m at time t’, process Pj sets Cj(t’) equal to maximum (Cj(t’-0), Tm + μm)
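Rule IR2’(b) reduces to a single expression. In this sketch `mu_m` stands for μm, the minimum delay of message m, assumed to be known to the receiver. The function name is illustrative.

```python
def on_receive(cj_before, tm, mu_m):
    """IR2'(b): new clock value Cj(t') = max(Cj(t' - 0), Tm + mu_m).

    cj_before: the receiver's clock just before the message arrives.
    tm:        the timestamp carried by the message.
    mu_m:      minimum transmission delay of this message (assumed known).
    """
    return max(cj_before, tm + mu_m)
```

The clock is only ever moved forward: if the receiver is already ahead of `tm + mu_m`, its value is left unchanged.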
Physical Clocks
Synchronize Physical Clocks
Physical Clocks
• Problem statement:
– IR1’ and IR2’ are followed,
– Message delay is bounded,
– Clocks satisfy PC1,
– Goal: PC2
• Algorithm:
– Every τ seconds, a message is sent over every arc.
• Guarantees:
– Clocks are synchronized after t0 + τd, with ε ≈ d(2κτ + ξ)
(d: diameter of the process graph; ξ: bound on the unpredictable part of the message delay)
Beyond…
• Shortcomings:
– No gap-detection property
– C(a) < C(b) ???
– Bounds are not practical (So is PC!)
Gap Detection Property
• Problem statement:
– Given: a, b, C(a), C(b), C(a) < C(b),
– Determine if c exists, where C(a) < C(c) < C(b)?
Beyond…
Another Strong Clock Condition
a → b ⇔ C(a) < C(b)
Beyond…
What clock, then?
• Causal histories:
Beyond…
• Vector Clocks:
More on Vector Clocks
• Strong Clock Condition
• Concurrent
• Pair-wise Inconsistent
• Consistent Cut
• Counting
• Gap Detection, but…
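Some of the properties listed above can be sketched with vector clocks stored as plain lists, one entry per process. The function names are illustrative, and the cut test follows the usual vector-clock formulation in which `frontier[i]` is the clock of the last event of process i included in the cut.

```python
def local_event(vc, i):
    """Clock of the next event of process i when no message is received."""
    new = list(vc)
    new[i] += 1
    return new

def receive_event(vc, i, msg_vc):
    """Clock of a receive event at process i for a message stamped msg_vc."""
    merged = [max(x, y) for x, y in zip(vc, msg_vc)]
    merged[i] += 1
    return merged

def happened_before(u, v):
    """Strong clock condition: a -> b iff VC(a) < VC(b)."""
    return u != v and all(x <= y for x, y in zip(u, v))

def concurrent(u, v):
    """Neither clock is smaller than the other."""
    return u != v and not happened_before(u, v) and not happened_before(v, u)

def consistent_cut(frontier):
    """A cut is consistent iff no process has seen more of process i than i itself."""
    n = len(frontier)
    return all(frontier[i][i] >= frontier[j][i]
               for i in range(n) for j in range(n))
```

For two processes, a cut whose frontier is `[[0, 0], [1, 1]]` is inconsistent: process 1 has received a message that process 0 has not yet sent within the cut.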
Beyond…
X Weak Gap-Detection
Given a, b, can detect the existence of c such that
¬(c → a) & (c → b)
Reference
• O. Babaoglu and K. Marzullo. Consistent global states of distributed systems: Fundamental concepts and mechanisms. In Sape Mullender, editor, Distributed Systems, ch. 4, pages 55--96. Addison Wesley, 2nd ed., 1993. http://citeseer.ist.psu.edu/babaoglu93consistent.html
• Note: some materials in this paper are used to clarify a few concepts in the next paper.
Beyond…
Distributed Snapshots: Determining Global States of Distributed Systems
K. Mani Chandy
Leslie Lamport
Introduction
• Outline of the paper:
– Motivation
– Model
– Algorithm
– Correctness
– Other issues
• Beyond…
Motivation
• Capture the global state of a system.
• Really? True global state: Impossible!!!
[Figure: two copies of a space-time diagram with processes p1 (events e1^1 to e1^6) and p2 (events e2^1 to e2^5).]
Motivation
• Capture the global state of a system.
• Really? These are what can be done. Are they useful?
[Figure: space-time diagram with processes p1 (events e1^1 to e1^6) and p2 (events e2^1 to e2^5).]
Motivation
• Capture the global state of a system.
• Useful?
Equivalent!
[Figure: two copies of the space-time diagram with processes p1 (events e1^1 to e1^6) and p2 (events e2^1 to e2^5).]
Motivation
• Capture the global state of a system.
• Useful?
Consistent, but does not happen in reality.
[Figure: two copies of the space-time diagram with processes p1 (events e1^1 to e1^6) and p2 (events e2^1 to e2^5).]
Motivation
• Capture the global state of a system.
• Useful?
Not even consistent!
[Figure: space-time diagram with processes p1 (events e1^1 to e1^6) and p2 (events e2^1 to e2^5).]
Motivation
• Capture the global state of a system.
• Useful? Yes:
– To detect stable properties of a system: y(S) ⇒ y(S’) for all S’ reachable from S.
– E.g.:
• “computation has terminated,”
• “the system is deadlocked,”
• “all tokens in a token ring have disappeared.”
Model
A distributed system
• A distributed system (on the right).
• A global state = set of processes’ and channels’ states.
• Event:
– atomic
– e = <p, S, S’, m, c>
• Computation:
– seq = (ei: 0 ≤ i ≤ n)
– Si+1 = next(Si, ei)
• Channels’ assumptions:
– Singly directed
– FIFO
– Asynchronous
– Error free
– Infinite buffer
Algorithm
• Invoker: behaves as if receiving a marker from a virtual node.
• Receiving rule for process q receiving a marker along channel c:
if q has not recorded its state then
begin q records its state; q records the state of c as the empty sequence end
else q records the state of c as the sequence of messages received along c after q’s state was recorded and before q received the marker along c.
• Sending rule for a process p: for each outgoing channel c:
p sends one marker along c after p records its state and before p sends further messages along c.
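The marker rules can be sketched as a small single-threaded simulation. Several things here are assumptions for illustration: the local state is an integer balance, application messages are integer transfers, each FIFO channel is a `deque`, and the names `Node`, `connect`, and `MARKER` are hypothetical. Message delivery is driven by hand through `deliver`.

```python
from collections import deque

MARKER = object()  # sentinel that cannot be confused with an application message

class Node:
    def __init__(self, name, state):
        self.name = name
        self.state = state         # application state (an integer balance here)
        self.out = {}              # neighbour name -> FIFO channel to that neighbour
        self.inc = {}              # neighbour name -> FIFO channel from that neighbour
        self.has_recorded = False
        self.recorded_state = None
        self.channel_state = {}    # incoming channel -> messages recorded for it
        self.recording = set()     # incoming channels still being recorded

    def record(self):
        """Record the local state, then send one marker on every outgoing channel."""
        self.has_recorded = True
        self.recorded_state = self.state
        self.channel_state = {src: [] for src in self.inc}
        self.recording = set(self.inc)
        for chan in self.out.values():
            chan.append(MARKER)

    def send(self, dst, amount):
        """Application send: transfer part of the balance to a neighbour."""
        self.state -= amount
        self.out[dst].append(amount)

    def deliver(self, src):
        """Receive the next message on the channel from src."""
        msg = self.inc[src].popleft()
        if msg is MARKER:
            if not self.has_recorded:
                self.record()           # channel from src ends up recorded as empty
            self.recording.discard(src)  # recording of this channel is complete
        else:
            self.state += msg
            if src in self.recording:    # arrived after recording, before the marker
                self.channel_state[src].append(msg)

def connect(a, b):
    """Create a one-way FIFO channel from a to b."""
    chan = deque()
    a.out[b.name] = chan
    b.inc[a.name] = chan
```

The invoker simply calls `record()` on itself, which plays the role of the marker from a virtual node. In a money-transfer run, the recorded balances plus the recorded channel contents add up to the initial total, which is one way to see that the recorded state is consistent.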
Illustration
• Figure sequence of 14 slides, courtesy of Professor Birman.
Chandy/Lamport
[Figures: a network of processes p, q, r, s, t, u, v, w, x, y, z. Captions, in slide order:]
– A network
– “I want to start a snapshot”
– p records local state
– p starts monitoring incoming channels
– “contents of channel p-y”
– p floods message on outgoing channels…
– q is done
– More processes are labelled in each later frame: q; then z, s; then v, x, u; then w, y, r.
– A snapshot of a network
– Done!
Correctness
• Consistency
• Termination
Consistency
• m is recorded iff send(m) is recorded:
– the sender’s state recording and marker sending are done atomically.
• m is not recorded more than once:
– if the channel is recorded before the receiver, it will be empty.
– if the channel is recorded after the receiver, none of the in-channel messages will be recorded as part of the receiver’s state.
Correctness:
Termination
• Assumptions:
– L1: no marker remains forever in a channel.
– L2: processes’ states are recorded in finite time.
• Every process either spontaneously records its state, or there is a path from such a process.
• Every channel is flushed by a marker after the sender records its state.
Correctness:
Remaining Issues
• Property of recorded state: Si --> S* --> Sf
• Stable detection:
– Stable property:
• y(Si) ⇒ definite
• definite ⇒ y(Sf)
– Algorithm:
begin
record a global state S*;
definite := y(S*)
end.
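The detection algorithm is short enough to sketch directly. The snapshot layout used here (a dict with "processes" and "channels" entries) and the `terminated` predicate are hypothetical examples of a recorded state S* and a stable property y.

```python
def detect_stable(record_global_state, y):
    """Record a global state S* and evaluate the stable property y on it."""
    s_star = record_global_state()
    definite = y(s_star)
    return definite

def terminated(snapshot):
    """Example stable property: every process is idle and every channel is empty."""
    idle = all(status == "idle" for status in snapshot["processes"].values())
    empty = all(len(msgs) == 0 for msgs in snapshot["channels"].values())
    return idle and empty
```

Because y is stable and S* is reachable from Si while Sf is reachable from S*, a true result still holds when the algorithm finishes, and a property that already held at the start is always detected.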
Beyond…
• Channels’ assumptions:
– Singly directed
– FIFO
– Asynchronous
– Error free
– Infinite buffer
Non-FIFO
• What is FIFO for?
– Separate messages between before-snapshot and after-snapshot.
• A snapshot counter piggybacked on messages would do just fine!
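One way the piggybacked counter could work is sketched below. This is an assumption-laden illustration, not the method of either paper: every message carries the sender's snapshot number, the local state is an integer, and the `Node` class and its fields are hypothetical. Deciding when a channel's recording is complete is not addressed here.

```python
class Node:
    def __init__(self, state):
        self.state = state
        self.snap = 0          # number of snapshots this node has taken part in
        self.recorded = {}     # snapshot number -> recorded local state
        self.in_flight = {}    # snapshot number -> messages recorded as channel contents

    def take_snapshot(self):
        """Record the local state under the next snapshot number."""
        self.snap += 1
        self.recorded[self.snap] = self.state
        self.in_flight[self.snap] = []

    def send(self, payload):
        """Piggyback the current snapshot number on every message."""
        return (self.snap, payload)

    def receive(self, msg):
        tag, payload = msg
        # Sent after the sender recorded: record our own state before applying it.
        while tag > self.snap:
            self.take_snapshot()
        # Sent before the sender recorded but received after we did: channel content.
        for k in range(tag + 1, self.snap + 1):
            self.in_flight[k].append(payload)
        self.state += payload
```

The counter takes over the job of the marker's position in a FIFO channel: it tells the receiver on which side of the snapshot each message was sent, even if messages arrive out of order.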
Beyond:
Beyond…
• Channels’ assumptions:
– Singly directed
– FIFO
– Asynchronous
– Error free
– Infinite buffer
Messages can be corrupted/duplicated
Messages can be dropped
Unreliable channels
• How to deal with corruption?
– Checksum/ECC; reduced to drop.
• How to deal with duplication?
– Message ID
• How to deal with dropping?
– Channel states are not needed anymore.
– Markers indicate completion.
Beyond:
Even More Aggressive…
• Don’t want to piggyback!
• Step 1: no piggybacking:
– Block all messages sent after recording local state and before receiving marker from all neighbors.
• Step 2: no blocking, minimal piggybacking:
– Blocked messages are sent with piggybacked snapshot info.
Beyond:
Conclusion
• Two influential papers.
• Much work built upon these results.
• Can be improved significantly when being adapted to particular systems.
• Additional comments/suggestions?