consistent cuts

Consistent Cuts Ken Birman


Post on 31-Dec-2015



TRANSCRIPT

Page 1: Consistent Cuts

Consistent Cuts

Ken Birman

Page 2: Consistent Cuts

Idea

We would like to take a snapshot of the state of a distributed computation

We’ll do this by asking participants to jot down their states

Under what conditions can the resulting “puzzle pieces” be assembled into a consistent whole?

Page 3: Consistent Cuts

An instant in real-time

Imagine that we could photograph the system in real-time at some instant

Process state: A set of variables and values

Channel state: Messages in transit through the network

In principle, the system is fully defined by the set of such states

Page 4: Consistent Cuts

Problems?

Real systems don’t have real-time snapshot facilities

In fact, real systems may not have channels in this sense, either

How can we approximate the real-time concept of a cut using purely “logical time”?

Page 5: Consistent Cuts

Deadlock detection

Need to detect cycles

[Figure: graph over processes A, B, C, D]
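
As a rough illustration of what "detecting cycles" involves, here is a minimal sketch that searches a wait-for graph depth-first. The graph encoding (a dict mapping each process to the processes it waits for) and the name `find_cycle` are assumptions made for this example, not something taken from the slides.

```python
from typing import Dict, List, Optional

def find_cycle(wait_for: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle in the wait-for graph as a list of nodes, or None."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    colour = {p: WHITE for p in wait_for}
    for targets in wait_for.values():     # include nodes that only appear as targets
        for t in targets:
            colour.setdefault(t, WHITE)
    path: List[str] = []

    def visit(p: str) -> Optional[List[str]]:
        colour[p] = GREY
        path.append(p)
        for q in wait_for.get(p, []):
            if colour[q] == GREY:         # back edge: q is on the current path
                return path[path.index(q):] + [q]
            if colour[q] == WHITE:
                found = visit(q)
                if found:
                    return found
        path.pop()
        colour[p] = BLACK
        return None

    for p in list(colour):
        if colour[p] == WHITE:
            found = visit(p)
            if found:
                return found
    return None

# Hypothetical wait-for graphs over the four processes in the figure.
deadlocked = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["A"]}
healthy = {"A": ["B"], "B": ["C"], "C": [], "D": ["C"]}
print(find_cycle(deadlocked))   # ['A', 'B', 'C', 'D', 'A']
print(find_cycle(healthy))      # None
```

The search itself is the easy part. The difficulty the following slides address is obtaining a graph whose edges all describe the same consistent moment.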

Page 6: Consistent Cuts

Deadlock is a “stable” property

Once a deadlock occurs, it holds in all future states of the system

Easy to prove that if a snapshot is computed correctly, a stable condition detected in the snapshot will continue to hold

Insight is that adding events can’t “undo” the condition

Page 7: Consistent Cuts

Leads us to define consistent cut and snapshot

Think of the execution of a process as a history of events, Lamport-style

Events can be local, msg-send, msg-rcv

A consistent snapshot is a set of history prefixes and messages closed under causality

A consistent cut is the frontier of a consistent snapshot – the “process states”
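
The closure condition can be checked mechanically. The sketch below assumes a cut is given as a prefix length per process and each message as a pair of positions in the sender's and receiver's histories; the names `Message`, `is_consistent` and `channel_state` are illustrative only.

```python
from typing import Dict, List, NamedTuple

class Message(NamedTuple):
    sender: str
    send_pos: int    # 1-based position of the send event in the sender's history
    receiver: str
    recv_pos: int    # 1-based position of the receive event in the receiver's history

def is_consistent(cut: Dict[str, int], messages: List[Message]) -> bool:
    """A cut maps each process to the length of its included history prefix.
    It is consistent if every included receive has its send included too."""
    for m in messages:
        received = m.recv_pos <= cut[m.receiver]
        sent = m.send_pos <= cut[m.sender]
        if received and not sent:
            return False
    return True

def channel_state(cut: Dict[str, int], messages: List[Message]) -> List[Message]:
    """Messages sent inside the cut but not yet received inside it."""
    return [m for m in messages
            if m.send_pos <= cut[m.sender] and m.recv_pos > cut[m.receiver]]

# Hypothetical two-process history: A sends to B, later B sends to A.
msgs = [Message("A", 1, "B", 2), Message("B", 3, "A", 2)]
good_cut = {"A": 1, "B": 1}   # A's send is in; B has not received it yet
bad_cut = {"A": 0, "B": 2}    # B's receive is in, but A's send is not

print(is_consistent(good_cut, msgs))   # True
print(channel_state(good_cut, msgs))   # the A-to-B message is in the channel
print(is_consistent(bad_cut, msgs))    # False
```

In `bad_cut` the snapshot contains an effect without its cause, which is the situation the ghost deadlock slides describe.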

Page 11: Consistent Cuts

Deadlock detection

A “ghost” or “false” cycle!

[Figure: graph over processes A, B, C, D]

Page 12: Consistent Cuts

A ghost deadlock

Occurs when we accidentally snapshot process states so as to include some events while omitting prior events

Can’t occur if the cut is computed consistently, since that would violate the causal closure requirement

Page 13: Consistent Cuts

A ghost deadlock

[Figure: processes A, B, C, D]

Page 16: Consistent Cuts

Algorithms for computing consistent cuts

Paper focuses on a flooding algorithm

We’ll consider several other methods too:

Logical timestamps

Flooding algorithm without blocking

Two-phase commit with blocking

Each pattern arises commonly in distributed systems we’ll look at in coming weeks

Page 17: Consistent Cuts

Cuts using logical clocks

Suppose that we have Lamport’s basic logical clocks

But we add a new operation called “snap”

Write down your process state

Create empty “channel state” structure

Set your logical clock to some big value. Think of clock as (epoch-number, counter)?

Record channel state until rcv message with big incoming clock value

Page 18: Consistent Cuts

How does this work?

Recall that with Lamport’s clocks, if e is causally prior to e’ then LT(e) < LT(e’)

Our scheme creates a snapshot for each process at the instant it reaches logical time t

Easy to see that these events are concurrent: a possible instant in real-time

Depends upon FIFO channels, can’t easily tell when cut is complete – a sort of lazy version of the flooding algorithm
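
A minimal single-threaded sketch of the "snap" operation on top of Lamport clocks follows. It assumes the cut time `snap_time` is larger than any clock value reached beforehand, that channels are FIFO, and that the application state is a toy counter; the class and method names are invented for the example.

```python
class Process:
    def __init__(self, name, snap_time):
        self.name = name
        self.clock = 0
        self.state = 0              # toy application state: sum of received payloads
        self.snap_time = snap_time  # the "big value" t that defines the cut
        self.snapped = False
        self.snapshot = None
        self.channel_state = []     # payloads that were in flight across the cut

    def snap(self):
        """Write down the process state and jump the clock to the cut time."""
        if not self.snapped:
            self.snapped = True
            self.snapshot = self.state
            self.clock = max(self.clock, self.snap_time)

    def send(self, payload):
        self.clock += 1
        return (self.clock, payload)   # message carries the sender's timestamp

    def receive(self, msg):
        ts, payload = msg
        if ts > self.snap_time:
            self.snap()                # sender is past the cut: snap before processing
        elif self.snapped:
            self.channel_state.append(payload)   # sent before the cut, received after
        self.clock = max(self.clock, ts) + 1
        self.state += payload

T = 1000                         # assumed "big" logical time
p, q = Process("p", T), Process("q", T)
m_early = q.send(5)              # sent by q before anyone snaps
p.snap()                         # p initiates the cut
p.receive(m_early)               # crosses the cut: recorded as channel state
m_late = p.send(7)               # timestamp is above T
q.receive(m_late)                # forces q to snap first

print(p.snapshot, q.snapshot, p.channel_state)   # 0 0 [5]
```

Nothing in this sketch tells `p` when it has seen the last pre-cut message on a channel, which matches the slide's remark that it is hard to tell when the cut is complete.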

Page 19: Consistent Cuts

Flooding algorithm

To make a cut, observer sends out messages “snap”

On receiving “snap” the first time, A:

Writes down its state, creates empty channel state record for all incoming channels

Sends “snap” to all neighbor processes

Waits for “snap” on all incoming channels

A’s piece of the snapshot is its state and the channel contents once it receives “snap” from all neighbors

Note: also assumes FIFO channels
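
The steps above can be sketched as a small single-threaded simulation. Everything concrete here is an assumption for illustration: the `Node` class, the use of integer "token" transfers as application messages, and FIFO channels modelled as one deque per incoming link.

```python
from collections import deque

SNAP = "snap"

class Node:
    def __init__(self, name, tokens):
        self.name = name
        self.tokens = tokens
        self.peers = {}          # neighbor name -> Node
        self.inbox = {}          # sender name -> FIFO deque (incoming channel)
        self.saved_state = None  # process state written down at snapshot time
        self.recording = {}      # sender name -> messages recorded for that channel
        self.done = set()        # incoming channels on which "snap" has arrived

    def connect(self, other):
        self.peers[other.name] = other
        other.inbox[self.name] = deque()

    def send(self, to, msg):
        self.peers[to].inbox[self.name].append(msg)

    def start_snapshot(self):
        self.saved_state = self.tokens
        self.recording = {src: [] for src in self.inbox}
        for name in self.peers:
            self.send(name, SNAP)

    def deliver(self, src):
        msg = self.inbox[src].popleft()
        if msg == SNAP:
            if self.saved_state is None:      # first "snap" seen
                self.start_snapshot()
            self.done.add(src)                # this channel's contents are final
        else:
            if self.saved_state is not None and src not in self.done:
                self.recording[src].append(msg)
            self.tokens += msg

    def snapshot_complete(self):
        return self.saved_state is not None and self.done == set(self.inbox)

a, b = Node("a", 3), Node("b", 2)
a.connect(b)
b.connect(a)

a.tokens -= 1
a.send("b", 1)          # one token in flight from a to b
b.start_snapshot()      # b acts as the observer
b.deliver("a")          # the in-flight token arrives after b's snapshot
a.deliver("b")          # a sees "snap" for the first time
b.deliver("a")          # b sees a's "snap"; its piece is complete

total = a.saved_state + b.saved_state + sum(b.recording["a"]) + sum(a.recording["b"])
print(total)            # 5
```

The recorded states plus the recorded channel contents add up to the five tokens the system started with, even though no single instant was observed directly.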

Page 20: Consistent Cuts

With 2-phase commit

In this, the initiator sends to all neighbors: “Please halt”

A halts computation, sends “please halt” to all downstream neighbors

Waits for “halted” from all of them

Replies “halted” to upstream caller

Now initiator sends “snap”

A forwards “snap” downstream

Waits for replies

Collects them into its own state

Sends own state to upstream caller and resumes

Page 21: Consistent Cuts

Why does this work?

Forces the system into an idle state

In this situation, nothing is changing…

Usually, sender in this scheme records unacknowledged outgoing channel state

Alternative: upstream process tells receiver how many incoming messages to await, receiver does so and includes them in its state.

So a snapshot can be safely computed and there is nothing unaccounted for in the channels
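
A very reduced sketch of the halt-then-snap pattern is shown below. It replaces message passing with direct recursive calls over a tree of processes, which is a simplification made for this example; the `Proc` class, the `unacked` list and the message name `m7` are hypothetical.

```python
from typing import Dict, List, Optional

class Proc:
    def __init__(self, name: str, state: int,
                 children: Optional[List["Proc"]] = None):
        self.name = name
        self.state = state
        self.children = children or []
        self.halted = False
        self.unacked: List[str] = []   # sent but unacknowledged outgoing messages

    def work(self, delta: int) -> None:
        if self.halted:
            raise RuntimeError(f"{self.name} is halted")
        self.state += delta

    def halt(self) -> str:
        """Phase 1: stop computing, halt everything downstream, report back."""
        self.halted = True
        for child in self.children:
            assert child.halt() == "halted"
        return "halted"

    def snap(self) -> Dict[str, dict]:
        """Phase 2: collect own state plus downstream states, then resume."""
        collected = {self.name: {"state": self.state,
                                 "unacked": list(self.unacked)}}
        for child in self.children:
            collected.update(child.snap())
        self.halted = False
        return collected

leaf_b, leaf_c = Proc("B", 20), Proc("C", 30)
root = Proc("A", 10, [leaf_b, leaf_c])
leaf_b.unacked.append("m7")    # hypothetical message still in flight from B

root.halt()
try:
    leaf_c.work(1)             # no state changes while halted
    blocked = False
except RuntimeError:
    blocked = True

snapshot = root.snap()
leaf_c.work(1)                 # computation resumes after the snapshot
print(blocked, snapshot["B"])  # True {'state': 20, 'unacked': ['m7']}
```

Because every process is idle between the two phases, the only channel contents that can exist are the messages senders already hold as unacknowledged.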

Page 22: Consistent Cuts

Observation

Suppose we use a two-phase property detection algorithm

In first phase, asks (for example), “what is your current state”

You reply “waiting for a reply from B” and give a “wait counter”

If a second round of the same algorithm detects the same condition with the same wait-counter values, the condition is stable…

Page 23: Consistent Cuts

A ghost deadlock

[Figure: processes A, B, C, D]

Page 24: Consistent Cuts

Look twice and it goes away…

But we could see “new” wait edges mimicking the old ones

This is why we need some form of counter to distinguish same-old condition from new edges on the same channels…

Easily extended to other conditions
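
The "look twice" idea can be sketched as follows, assuming each process reports at most one wait as a pair (process waited for, wait counter). The report format and the function names are assumptions for this example.

```python
from typing import Dict, Tuple

Report = Dict[str, Tuple[str, int]]   # process -> (waiting for, wait counter)

def stable_waits(round1: Report, round2: Report) -> Report:
    """Keep only waits reported identically, counter included, in both rounds."""
    return {p: r for p, r in round1.items() if round2.get(p) == r}

def has_cycle(edges: Report) -> bool:
    """Follow wait edges from each process; revisiting a process means a cycle."""
    for start in edges:
        seen = set()
        p = start
        while p in edges and p not in seen:
            seen.add(p)
            p = edges[p][0]
        if p in seen:
            return True
    return False

# Hypothetical reports. In the second round C waits for D again, but its
# counter has advanced, so this is a new wait on the same channel.
first = {"A": ("B", 3), "B": ("C", 5), "C": ("D", 2), "D": ("A", 7)}
second = {"A": ("B", 3), "B": ("C", 5), "C": ("D", 3), "D": ("A", 7)}

print(has_cycle(first))                          # True: one look suggests deadlock
print(has_cycle(stable_waits(first, second)))    # False: the cycle was a ghost
print(has_cycle(stable_waits(first, first)))     # True: unchanged waits, real deadlock
```

Without the counter, the two reports for C would look identical and the ghost cycle would survive the second look.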

Page 25: Consistent Cuts

Consistent cuts

Offer the illusion that you took a picture of the system at an instant in real-time

A powerful notion widely used in real systems

Especially valuable after a failure

Allows us to reconstruct the state so that we can repair it, e.g. recreate missing tokens

But has awkward hidden assumptions

Page 26: Consistent Cuts

Hidden assumptions

Use of FIFO channels is a problem

Many systems use some form of datagram

Many systems have multiple concurrent senders on same paths

These algorithms assume knowledge of system membership

Hard to make them fault-tolerant

Recall that a slow process can seem “faulty”

Page 27: Consistent Cuts

High costs

With flooding algorithm, n² messages

With 2-phase commit algorithm, system pauses for a long time

We’ll see some tricky ways to hide these costs, either by continuing to run but somehow delaying delivery of messages to the application, or by treating the cut algorithm as a background task

Could have concurrent activities that view same messages in different ways…

Page 28: Consistent Cuts

Fault-tolerance

Many issues here

Who should run the algorithm?

If we decide that a process is faulty, what happens if a message from it then turns up?

What if failures leave a “hole” in the system state: missing messages or missing process state?

Problems are overcome in virtual synchrony implementations of group communication tools

Page 29: Consistent Cuts

Systems issues

Suppose that I want to add notions such as real-time, logical time, consistent cuts, etc to a complex real-world operating system (list goes on)

How should these abstractions be integrated with the usual O/S interfaces, like the file system, the process subsystem, etc?

Only virtual synchrony has really tackled these kinds of questions, but one could imagine much better solutions. A possible research topic, for a PhD in software engineering

Page 30: Consistent Cuts

Theory issues

Lamport’s ideas are fundamentally rooted in static notions of system membership

Later with his work on Paxos he adds the idea of dynamically changing subsets of a static maximum set

Does true dynamism, of the sort used when we look at virtual synchrony, have fundamental implications?

Page 31: Consistent Cuts

Example of a theory question

Suppose that I want to add a “location” type to a language like Java:

Object o is at process p at computer x

Objects {a,b,c} are replicas of

Now notions of system membership and location are very fundamental to the type system

Need a logic of locations. How should it look?

Extend to a logic of replication and self-defined membership?

But FLP lurks in the shadows…

Page 32: Consistent Cuts

FLP

Page 33: Consistent Cuts

Other questions

Checkpoint/rollback

Processes make checkpoints, probably when convenient

Some systems try to tell a process “when” to make them, using some form of signal or interrupt

But this tends to result in awkward, large checkpoints

Later if a fault occurs we can restart from the most recent checkpoint

Page 34: Consistent Cuts

So, where’s the question?

The issue arises when systems use message passing and want to checkpoint/restart

Few applications are deterministic

Clocks, signals, threads & scheduling, interrupts, multiple I/O channels, order in which messages arrived, user input…

When rolling forward from a checkpoint actions might not be identical

Hence anyone who “saw” my actions may be in a state that won’t be recreated!

Page 35: Consistent Cuts

Technical question

Suppose we make checkpoints in an uncoordinated manner

Now process p fails

Which other processes should roll back?

And how far might this rollback cascade?
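
To make the cascade concrete, the sketch below computes how far each process must roll back so that no retained receive depends on an undone send. The encoding (checkpoints as history prefix lengths, messages as position tuples) and the name `recovery_line` are assumptions for this example, and every process is assumed to have an initial checkpoint at position 0.

```python
from typing import Dict, List, Tuple

Msg = Tuple[str, int, str, int]   # (sender, send_pos, receiver, recv_pos)

def recovery_line(checkpoints: Dict[str, List[int]], messages: List[Msg],
                  current: Dict[str, int], failed: str) -> Dict[str, int]:
    """Return, per process, the history prefix that survives the rollback."""
    line = dict(current)
    # The failed process restarts from its most recent checkpoint.
    line[failed] = max(c for c in checkpoints[failed] if c <= current[failed])
    changed = True
    while changed:
        changed = False
        for sender, send_pos, receiver, recv_pos in messages:
            # Orphan: the receive is kept but the send has been undone.
            if send_pos > line[sender] and recv_pos <= line[receiver]:
                line[receiver] = max(c for c in checkpoints[receiver]
                                     if c < recv_pos)
                changed = True
    return line

# Hypothetical history: p and q exchange three messages in alternation.
msgs = [("p", 1, "q", 1), ("q", 3, "p", 3), ("p", 5, "q", 5)]
now = {"p": 6, "q": 7}

uncoordinated = {"p": [0, 4], "q": [0, 2, 6]}
print(recovery_line(uncoordinated, msgs, now, failed="p"))   # {'p': 0, 'q': 0}

aligned = {"p": [0, 4], "q": [0, 4]}   # checkpoints that form a consistent cut
print(recovery_line(aligned, msgs, now, failed="p"))         # {'p': 4, 'q': 4}
```

With the uncoordinated checkpoints each rollback orphans another message, and both processes end up back at their initial states. With checkpoints aligned on a consistent cut the rollback stops after one step.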

Page 36: Consistent Cuts

Rollback scenario

Page 37: Consistent Cuts

Avoiding cascaded rollback?

Both making checkpoints, and rolling back, should happen along consistent cuts

In the mid-1980s several papers developed this into simple 2-phase protocols

Today we would recognize them as algorithms that simply run on consistent cuts

For those who are interested: sender-based logging is the best algorithm in this area. (Alvisi’s work)