virtual synchrony

Virtual Synchrony Justin W. Hart CS 614 11/17/2005


Page 1: Virtual Synchrony

Virtual Synchrony

Justin W. Hart, CS 614

11/17/2005

Page 2: Virtual Synchrony

Papers

The Process Group Approach to Reliable Distributed Computing. Birman. CACM, Dec 1993, 36(12):37-53.

Understanding the Limitations of Causally and Totally Ordered Communication. Cheriton and Skeen. 14th SOSP, 1993.

Page 3: Virtual Synchrony

Background

Chandy-Lamport
Logical Clocks
Consistent Cuts
Distributed Snapshots
Publish/Subscribe
Fail-Stop

Page 4: Virtual Synchrony

Fail-Stop

Group Membership Service
Processes appear to fail by halting
How does this affect the FLP result?

Page 5: Virtual Synchrony

Motivation

Information Backplane
Customization
Hierarchical Structure
Fault-Tolerance
Reliability

Page 6: Virtual Synchrony

Process Groups

Types of groups
  Anonymous groups
  Explicit groups
Implementation Requirements
  Group communication
  Group membership as input
  Synchronization

Page 7: Virtual Synchrony

Anonymous Groups

Group addressing
Messages are delivered exactly once, to all recipients or to none
Ordering
Logging

Page 8: Virtual Synchrony

Explicit Groups

Group members cooperate directly
May execute algorithms based on membership knowledge
Communication is sensitive to membership changes

Page 9: Virtual Synchrony

Building groups over conventional technology

Conventional message passing technologies
Group addressing
Logical time & causal dependency
Message delivery ordering
State transfer
Fault tolerance

Page 10: Virtual Synchrony

Close Synchrony

100% lock-step execution model

Page 11: Virtual Synchrony

A synchronous execution

[Figure: time-lines of processes p, q, r, s, t, u]

With true synchrony, executions run in genuine lock-step.

Page 12: Virtual Synchrony

So… what’s wrong with that?

Under close synchrony, execution is limited by the slowest process in the group!

Page 13: Virtual Synchrony

Virtual Synchrony

Relax synchronization requirements where possible
Benefit by allowing for asynchronous interactions
Do this where the result is identical to close synchrony

Page 14: Virtual Synchrony

A few protocols…

fbcast
cbcast
abcast
gbcast

Page 15: Virtual Synchrony

Four protocols!?!?

…but Justin. The paper only discussed 2 protocols… you’re getting off-topic!

Page 16: Virtual Synchrony

A few protocols…

fbcast
  Simple protocol upon which we’ll build the others.
  Delivery is FIFO ordered, with respect to the original sender
  Accomplished easily with a logical timestamp
cbcast
abcast
gbcast
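To make the FIFO rule concrete, here is a minimal sketch of a receiver that holds back out-of-order messages until the gap from the same sender is filled. It assumes each sender stamps its messages with a per-sender sequence number starting at 1; the class name `FifoReceiver` and its methods are hypothetical and are not taken from ISIS or from either paper.

```python
from collections import defaultdict


class FifoReceiver:
    """Hypothetical receiver that delivers each sender's messages in send order."""

    def __init__(self):
        # Next sequence number expected from each sender (assumed to start at 1).
        self.next_seq = defaultdict(lambda: 1)
        # Messages that arrived early, keyed by sender and then sequence number.
        self.pending = defaultdict(dict)
        # (sender, payload) pairs in the order they were delivered.
        self.delivered = []

    def receive(self, sender, seq, payload):
        if seq < self.next_seq[sender]:
            return  # duplicate of an already delivered message; ignore it
        self.pending[sender][seq] = payload
        # Deliver for as long as the next expected message is available.
        while self.next_seq[sender] in self.pending[sender]:
            n = self.next_seq[sender]
            self.delivered.append((sender, self.pending[sender].pop(n)))
            self.next_seq[sender] = n + 1


# Messages from p arrive out of order over the network.
r = FifoReceiver()
for seq in (2, 1, 4, 3):
    r.receive("p", seq, f"update-{seq}")
```

Only messages from the same sender are ordered relative to each other here; nothing is promised about messages from different senders, which is the gap cbcast addresses.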

Page 17: Virtual Synchrony

Single updater

If p is the only update source, the need is a bit like the TCP “fifo” ordering
fbcast is a good choice for this case

[Figure: time-lines of processes p, r, s, t, with p's updates numbered 1 to 4]

Page 18: Virtual Synchrony

A few protocols…

fbcast
cbcast
  Delivery is causally ordered
  Protocol in paper uses token passing
  Another simple protocol uses vector timestamps
abcast
gbcast

Page 19: Virtual Synchrony

Causally ordered updates

Simple protocol based on token passing

Page 20: Virtual Synchrony

Causally ordered updates

Example: messages from p and s arrive out of order at t

[Figure: time-lines of processes p, r, s, t]

VT(a) = [0,0,0,1]
VT(b) = [1,0,0,1]
VT(c) = [1,0,1,1]

c is early: VT(c) = [1,0,1,1] but VT(t) = [0,0,0,1]: clearly we are missing one message from s. When b arrives, we can deliver both it and message c, in order.
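The delivery decision in this example can be sketched with the usual vector-timestamp rule: a message from the sender at index j is deliverable once its entry j is exactly one more than the receiver's entry j, and no other entry is ahead of the receiver. The sketch below is an illustration under assumptions, not the protocol from the paper (which uses token passing). The class `CausalReceiver` is hypothetical, and senders are identified by vector index only, inferred from the entry each message increments, because the slide does not say which process name maps to which entry.

```python
class CausalReceiver:
    """Hypothetical receiver that applies the vector-timestamp delivery rule."""

    def __init__(self, vt):
        self.vt = list(vt)      # receiver's current vector timestamp
        self.buffer = []        # messages that arrived too early
        self.delivered = []     # message names in delivery order

    def _deliverable(self, sender, msg_vt):
        for k, (theirs, mine) in enumerate(zip(msg_vt, self.vt)):
            if k == sender:
                if theirs != mine + 1:   # must be the next message from the sender
                    return False
            elif theirs > mine:          # depends on a message we have not seen
                return False
        return True

    def receive(self, name, sender, msg_vt):
        self.buffer.append((name, sender, list(msg_vt)))
        progress = True
        while progress:                  # keep draining until nothing else unblocks
            progress = False
            for entry in list(self.buffer):
                entry_name, entry_sender, entry_vt = entry
                if self._deliverable(entry_sender, entry_vt):
                    self.buffer.remove(entry)
                    self.vt[entry_sender] = entry_vt[entry_sender]
                    self.delivered.append(entry_name)
                    progress = True


# t has already seen a, so VT(t) = [0,0,0,1].
t = CausalReceiver([0, 0, 0, 1])
t.receive("c", 2, [1, 0, 1, 1])   # c increments entry 2; it arrives early
held_after_c = list(t.delivered)  # nothing can be delivered yet
t.receive("b", 0, [1, 0, 0, 1])   # b increments entry 0; it unblocks c
```

After b arrives, both messages are delivered in causal order (b, then c) and the receiver's vector becomes [1,0,1,1].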

Page 21: Virtual Synchrony

Causally ordered updates

Each thread corresponds to a different lock
In effect: red “events” never conflict with green ones!

[Figure: time-lines of processes p, r, s, t, with one thread of events numbered 1 to 5 and another numbered 1 to 2]

Page 22: Virtual Synchrony

Hey… that sped things up! Now I get it!

Processes only have to wait for processes that they depend on. Not the slowest in the group!

Page 23: Virtual Synchrony

A few protocols…

fbcast
cbcast
abcast
  Atomic delivery ordering, with respect to other abcasts
  More costly than cbcast, but with a stronger ordering property
  ISIS builds abcast over cbcast
gbcast
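One simple way to see what a total order buys is a fixed-sequencer sketch: a single process assigns a global slot number to every multicast, and every receiver delivers strictly in slot order. This is a swapped-in technique chosen for brevity, not the way ISIS builds abcast over cbcast. The names `Sequencer` and `TotalOrderReceiver` are hypothetical.

```python
import itertools


class Sequencer:
    """Hypothetical central process that assigns one global order to all messages."""

    def __init__(self):
        self._slots = itertools.count(1)

    def stamp(self, payload):
        return next(self._slots), payload


class TotalOrderReceiver:
    """Delivers messages strictly by global slot number."""

    def __init__(self):
        self.next_slot = 1
        self.pending = {}     # slot -> payload, for messages that arrived early
        self.delivered = []

    def receive(self, slot, payload):
        self.pending[slot] = payload
        while self.next_slot in self.pending:
            self.delivered.append(self.pending.pop(self.next_slot))
            self.next_slot += 1


seq = Sequencer()
stamped = [seq.stamp(m) for m in ("x=1", "y=2", "x=3")]

q, r = TotalOrderReceiver(), TotalOrderReceiver()
for slot, m in stamped:            # q receives in sequencer order
    q.receive(slot, m)
for slot, m in reversed(stamped):  # r receives in the opposite network order
    r.receive(slot, m)
```

Both receivers end up with the same delivery sequence regardless of arrival order. The extra hop through the sequencer hints at why a total order costs more than causal order.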

Page 24: Virtual Synchrony

A few protocols…

fbcast
cbcast
abcast
gbcast
  Atomic delivery ordering, with respect to everything

Page 25: Virtual Synchrony

Three Round Multicast

Page 26: Virtual Synchrony

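As a time-line picture

[Figure: time-lines of processes p, q, r, s, t, with a 2PC initiator. Phase 1: the initiator asks “Vote?” and all vote “commit”. Phase 2: the initiator sends “Commit!”]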
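The two phases in the picture can be sketched as a single function. This is a minimal illustration that assumes no failures or timeouts and models each participant as a callable returning its vote; the function name `two_phase_commit` and the participant names are hypothetical.

```python
def two_phase_commit(participants):
    """participants maps a name to a callable returning True (commit) or False (abort)."""
    # Phase 1: the initiator asks every participant "Vote?" and collects the answers.
    ballots = {name: bool(ask()) for name, ask in participants.items()}
    # The initiator chooses commit only if every participant voted to commit.
    decision = "commit" if all(ballots.values()) else "abort"
    # Phase 2: the initiator announces the same decision to every participant.
    announcements = {name: decision for name in participants}
    return decision, announcements


def vote_yes():
    return True


def vote_no():
    return False


group = {name: vote_yes for name in ("q", "r", "s", "t")}
decision, announcements = two_phase_commit(group)

# A single refusal is enough to abort the whole round.
group_with_refusal = dict(group, s=vote_no)
decision_with_refusal, _ = two_phase_commit(group_with_refusal)
```

A real protocol also has to handle a participant or the initiator failing between the phases, which this sketch deliberately leaves out.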

Page 27: Virtual Synchrony

Just one more…

Page 28: Virtual Synchrony

Flush protocol

We say that a message is unstable if some receiver has it but (perhaps) others don’t
For example, q’s message is unstable at process r
If q fails we want to “flush” unstable messages out of the system
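The idea of stability and flushing can be sketched with plain sets. The sketch assumes each process tracks the identifiers of the messages it has received, and that flushing means the survivors forward every unstable message to each other so that all of them hold the same set before moving on. The function names and message identifiers are hypothetical, and a real flush protocol involves more rounds and acknowledgements than shown here.

```python
def unstable(received):
    """Return the messages held by at least one process but not by all of them."""
    everything = set().union(*received.values())
    stable = set.intersection(*received.values())
    return everything - stable


def flush(received, failed):
    """Survivors exchange unstable messages so that all end with the same set."""
    survivors = {p: set(msgs) for p, msgs in received.items() if p != failed}
    to_forward = unstable(survivors)
    for msgs in survivors.values():
        msgs |= to_forward
    return survivors


# q multicast m2, but only r received it before q failed.
received = {
    "p": {"m1"},
    "q": {"m1", "m2"},
    "r": {"m1", "m2"},
    "s": {"m1"},
}
pending = unstable({p: m for p, m in received.items() if p != "q"})
after = flush(received, failed="q")
```

Here m2 is unstable among the survivors, and after the flush p, r and s all hold both m1 and m2.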

Page 29: Virtual Synchrony

Styles of groups

Peer Groups
  Processes cooperate closely
Client-Server Groups
  Group acts as a server
  Client multicasts repeatedly to the group
Diffusion Groups
  Group serves information
  Clients connect to receive data from group
Hierarchical Groups
  Offer scalability through a hierarchy of connected groups

Page 30: Virtual Synchrony

Historical Aside

Two major classes of real systems
Virtual synchrony
  Weaker properties – not quite “FLP consensus”
  Much higher performance (orders of magnitude)
  Requires that majority of system remain connected. Partitioning failures force protocols to wait for repair
Quorum-based state machine protocols are
  Closer to FLP definition of consensus
  Slower (by orders of magnitude)
  Sometimes can make progress in partitioning situations where virtual synchrony can’t

Page 31: Virtual Synchrony

Names of some famous systems

Isis was first practical virtual synchrony system
Later followed by Transis, Totem, Horus
Today: Best options are JGroups, Spread, Ensemble
Technology is now used in IBM WebSphere and Microsoft Windows Clusters products!
Paxos was first major state machine system
BASE and other Byzantine Quorum systems now getting attention from the security community

(End of Historical aside)

Page 32: Virtual Synchrony

Sounds good… what’s wrong with it?

Tries to solve state problems at communication level
This violates the end-to-end argument!
Consistency requirements are typically stated with respect to application state

Page 33: Virtual Synchrony

Stable vs Durable

Stable – messages are buffered until received by all group members
Durable – message will be delivered, even if the sender dies

Page 34: Virtual Synchrony

Ordering semantics

Incidental Ordering
Semantic Ordering
Prescriptive Ordering

Page 35: Virtual Synchrony

The problem with CATOCS

It can’t say “for sure”
It can’t say the “whole story”
It can’t say “together”
It can’t say it efficiently

Page 36: Virtual Synchrony

It can’t say “for sure”

Processes communicating over a “hidden” channel
  Common database
  Shared memory
Two threads reacting to external event

Page 37: Virtual Synchrony

It can’t say “together”

Standard solution – locking
Transaction models allow for abort and rollback
Higher level conditions… what happens if a message arrives, but is not successfully processed?

Page 38: Virtual Synchrony

Stock trading example

Page 39: Virtual Synchrony

Can’t say the “whole story”

Not everything can be expressed through the “happens-before” relationship
Semantic ordering constraints
  Causal memory, the weakest of these, cannot be expressed in causal multicast
  Total ordering helps some of these, but is far too expensive
  Inexpensive, state-level protocols with logical clocks can solve these
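A state-level alternative can be sketched as a store that carries a logical version number with each value and lets the application state, rather than the delivery order, decide which update wins. This is a hypothetical illustration of the general idea, not a protocol from the Cheriton and Skeen paper; the class `VersionedStore`, the key and the values are all invented for the example.

```python
class VersionedStore:
    """Hypothetical replica that orders updates by a version kept in application state."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def apply(self, key, version, value):
        """Apply an update unless an equal or newer version is already stored."""
        current = self.data.get(key)
        if current is not None and current[0] >= version:
            return False  # stale or duplicate update; the state already reflects it
        self.data[key] = (version, value)
        return True

    def read(self, key):
        return self.data[key][1]


# Two replicas see the same updates in different network orders.
updates = [("price", 1, 100), ("price", 2, 105), ("price", 3, 103)]

first = VersionedStore()
for key, version, value in updates:
    first.apply(key, version, value)

second = VersionedStore()
for key, version, value in reversed(updates):
    second.apply(key, version, value)
```

Both replicas converge on version 3 without any ordering guarantee from the communication layer, at the cost of the application having to carry and compare versions itself.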

Page 40: Virtual Synchrony

It can’t say it efficiently

False causality
  Potential causality != Actual causality
Memory requirements for buffering “unstable” messages
Ordering information during transmission and reception

Page 41: Virtual Synchrony

And… what of the end-to-end argument?

All of this considers our communication channels… isn’t the application-level check far more important?

Page 42: Virtual Synchrony

Classes of distributed applications

Data dissemination
  Netnews
  Trading application example
Global predicate evaluation
Transactional applications
Replicated data
Replication in the large
Distributed real-time applications

Page 43: Virtual Synchrony

Implementing only part of the messaging?

Can you cut down on overhead by implementing only part of the messaging using CATOCS?

Page 44: Virtual Synchrony

Semantics

Are the semantics of state-based approaches superior to those of virtual synchrony?

Page 45: Virtual Synchrony

Scalability

N processes
Time T to propagate a message across the system
  Grows roughly in proportion to the square root of the number of processes
Arcs in the active causal graph grow quadratically

[Figure: Quadratic causal graph]

Page 46: Virtual Synchrony

Buffering grows

Quadratic arcs
Linear communication of causal dependencies
Linear growth in required buffering
Changing topologies doesn’t help
CATOCS would require separate process groups for read and write to accomplish optimization of updates vs queries

Page 47: Virtual Synchrony

Group membership protocols

Must enforce atomic delivery semantics
Run our most expensive protocol… gbcast
Failures increase with the size of the system, increasing load on the GMS

Page 48: Virtual Synchrony

Who uses ISIS?

Brokerage
Database replication and triggers

Page 49: Virtual Synchrony

ISIS-based utilities

NEWS
  A pub/sub application that will replay histories
NMGR
  Manages batch-style jobs and performs load sharing
Parallel make

Page 50: Virtual Synchrony

ISIS-based utilities

DECEIT
  NFS compatible file system
META/LOMITA
  Sensors & actuators
  Abstract sensors
  Specify control actions in high-level terms
SPOOLER/LONG-HAUL FACILITY

Page 51: Virtual Synchrony

Now… somewhat supported

ISIS/Horus/Ensemble/QuickSilver
JGroups
Spread
Totem
Transis
WebSphere & Windows Cluster (internally)

Page 52: Virtual Synchrony

…and people actually use it.

NYSE
French ATC System
AEGIS

Page 53: Virtual Synchrony

An ongoing debate

The work continues here at Cornell with the QuickSilver effort
You’ve been presented the options… what are your conclusions?

Page 54: Virtual Synchrony

References

Some slides borrowed from Ken Birman’s CS 614 slide sets on Virtual Synchrony: http://www.cs.cornell.edu/courses/cs514/2005sp/Slide%20Sets.htm

Images have been borrowed from The Process Group Approach to Reliable Distributed Computing. Birman. CACM, Dec 1993, 36(12):37-53.

Images have been borrowed from Understanding the Limitations of Causally and Totally Ordered Communication. Cheriton and Skeen. 14th SOSP, 1993.

Statements and ideas have been borrowed verbatim from both papers, including section headings, and statements in notes. This has been mostly for coherence between the slides and papers.

Also sourced data from http://www.cs.cornell.edu/ken/