The Horus and Ensemble Projects: Accomplishments and Limitations
Ken Birman, Robert Constable, Mark Hayden, Jason Hickey, Christoph Kreitz, Robbert van Renesse, Ohad Rodeh, Werner Vogels
Department of Computer Science, Cornell University
January, 2000 Cornell Presentation at DISCEX
Reliable Distributed Computing: Increasingly urgent, still unsolved
Distributed computing has swept the world
• Impact has become revolutionary
• Vast wave of applications migrating to networks
• Already as critical a national infrastructure as water, electricity, or telephones
Yet distributed systems remain
• Unreliable, prone to inexplicable outages
• Insecure, easily attacked
• Difficult (and costly) to program, bug-prone
A National Imperative
Potential for catastrophe cited by DARPA ISAT study commissioned by Anita Jones
(1985, I briefed the findings, became basis for refocusing of much of ITO under Howard Frank)
PCCIP report, PTAC NAS study of trust in cyberspace
Need a quantum improvement in technologies, packaged in easily used, practical forms
Quick Timeline
Cornell has developed 3 generations of reliable group communication technology
• Isis Toolkit: 1987-1990
• Horus System: 1990-1994
• Ensemble System: 1994-1999
Today engaged in a new effort reflecting a major shift in emphasis
• Spinglass Project: 1999-
Questions to consider
Have these projects been successful?
What did we do?
How can impact be quantified?
What limitations did we encounter?
How is industry responding?
What next?
Timeline
Isis Horus Ensemble
• Introduced reliability into group computing
• Virtual synchrony execution model
• Elaborate, monolithic, but adequate speed
• Many transition successes
  • New York, Swiss Stock Exchanges
  • French Air Traffic Control console system
  • Southwestern Bell Telephone network mgt.
  • Hiper-D (next generation AEGIS)
Virtual Synchrony Model
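[Figure: timelines for processes p, q, r, s, and t as the group moves through the views G0={p,q}, G1={p,q,r,s}, G2={q,r,s}, G3={q,r,s,t}. r and s request to join and are added with a state transfer (G1); p fails in a crash (G2); t requests to join and is added with a state transfer (G3).]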
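The view sequence in the figure can be replayed in a few lines. This is a minimal sketch for illustration only: the `Group` class and its methods are hypothetical names, not the Isis, Horus, or Ensemble API, and it models only the bookkeeping of views, not the protocols that make every member agree on them.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Toy model of a process group that moves through numbered views."""
    members: frozenset
    view_id: int = 0
    history: list = field(default_factory=list)

    def __post_init__(self):
        # Record the initial view G0.
        self.history.append((self.view_id, self.members))

    def install_view(self, joined=(), failed=()):
        """Install the next view after joins and/or failures."""
        self.members = (self.members | frozenset(joined)) - frozenset(failed)
        self.view_id += 1
        self.history.append((self.view_id, self.members))
        return self.members

    def multicast(self, sender, msg):
        """Deliver msg to every member of the current view, tagged with that view."""
        if sender not in self.members:
            raise ValueError(f"{sender} is not in view G{self.view_id}")
        return {m: (self.view_id, msg) for m in self.members}


group = Group(frozenset("pq"))      # G0 = {p, q}
group.install_view(joined="rs")     # G1 = {p, q, r, s}
group.install_view(failed="p")      # G2 = {q, r, s}
group.install_view(joined="t")      # G3 = {q, r, s, t}
deliveries = group.multicast("q", "update")
```

Every delivery in `deliveries` carries the same view number, which is the property an application relies on when it reasons about who saw a message.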
Why a “model”?
Models can be reduced to theory – we can prove the properties of the model, and can decide if a protocol achieves it
Enables rigorous application-level reasoning
Otherwise, the application must guess at possible misbehaviors and somehow overcome them
Virtual Synchrony
Became widely accepted – basis of literally dozens of research systems and products worldwide
Seems to be the only way to solve problems based on replication
Very fast in small systems but faces scaling limitations in large ones
How Do We Use The Model?
Makes it easy to reason about the state of a distributed computation
Allows us to replicate data or computation for fault-tolerance (or because multiple users share same data)
Can also replicate security keys, do load-balancing, synchronization…
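As an illustration of replicated data under this model, the sketch below keeps a copy of a small table at every member, applies each update at all current members, and gives a joiner a state transfer. It is an assumption-laden toy: `Replica`, `ReplicatedGroup`, and the flight-plan key are hypothetical, and a plain loop stands in for the ordered multicast that a real system would provide.

```python
class Replica:
    """One group member holding a full copy of the replicated data."""

    def __init__(self, name, state=None):
        self.name = name
        self.state = dict(state or {})  # private copy, never shared

    def apply(self, key, value):
        self.state[key] = value


class ReplicatedGroup:
    """Toy group: every update reaches every current member in the same order."""

    def __init__(self, names):
        self.replicas = {n: Replica(n) for n in names}

    def update(self, key, value):
        # Stand-in for an ordered multicast to the current view.
        for replica in self.replicas.values():
            replica.apply(key, value)

    def join(self, name):
        # State transfer: the joiner starts from an existing member's copy.
        donor = next(iter(self.replicas.values()))
        self.replicas[name] = Replica(name, donor.state)

    def fail(self, name):
        del self.replicas[name]

    def consistent(self):
        states = [r.state for r in self.replicas.values()]
        return all(s == states[0] for s in states)


group = ReplicatedGroup(["p", "q"])
group.update("flight-1", "plan A")
group.join("r")                      # r receives the current state
group.fail("p")                      # the survivors still hold the data
group.update("flight-1", "plan B")
```

After the failure of p, the surviving members q and r hold identical copies, which is the fault-tolerance argument the slide makes.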
French ATC system (simplified)
Controllers
Air Traffic Database (flight plans, etc)
X.500 Directory
Radar
Onboard
A center contains...
Perhaps 50 “teams” of 3-5 controllers each
Each team supported by workstation cluster
Cluster-style database server has flight plan information
Radar server distributes real-time updates
Connections to other control centers (40 or so in all of Europe, for example)
Process groups arise here:
Cluster of servers running critical database server programs
Cluster of controller workstations supports ATC by teams of controllers
Radar must send updates to the relevant group of control consoles
Flight plan updates must be distributed to the “downstream” control centers
Role For Virtual Synchrony?
French government knows requirements for safety in ATC application
With our model, we can reduce their need to a formal set of statements
This lets us establish that our solution will really be safe in their setting
Contrast with usual ad-hoc methodologies...
More Isis Users
New York Stock Exchange
Swiss Stock Exchange
Many VLSI Fabrication Facilities
Many telephony control applications
Hiper-D – an AEGIS rebuild prototype
Various NSA and military applications
Architecture contributed to SC-21/DD-21
Timeline
Isis Horus Ensemble
• Simpler, faster group communication system
• Uses a modular layered architecture. Layers are “compiled,” headers compressed for speed
• Supports dynamic adaptation and real-time apps
• Partitionable version of virtual synchrony
• Transitioned primarily through Stratus Computer (Phoenix product, for telecommunications)
Layered Microprotocols in Horus
Interface to Horus is extremely flexible
Horus manages group abstraction
Group semantics (membership, actions, events) defined by stack of modules
[Figure: protocol stack assembled from modules such as encrypt, filter, sign, ftol, and vsync]
Ensemble stacks plug-and-play modules to give design flexibility to developer
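The plug-and-play idea can be sketched as a list of layers, each of which transforms a message on the way down and undoes that transformation on the way up. Everything here is a hypothetical illustration rather than the Horus or Ensemble interface: the `sign` layer is just a CRC32 check and the `encrypt` layer is a one-byte XOR, both toy stand-ins chosen to keep the example self-contained.

```python
import zlib


class Layer:
    """A microprotocol: transforms messages going down and coming up."""

    def down(self, payload: bytes) -> bytes:
        return payload

    def up(self, payload: bytes) -> bytes:
        return payload


class Sign(Layer):
    """Toy integrity layer: appends a CRC32 trailer (not a real signature)."""

    def down(self, payload):
        return payload + zlib.crc32(payload).to_bytes(4, "big")

    def up(self, payload):
        body, tag = payload[:-4], payload[-4:]
        if zlib.crc32(body).to_bytes(4, "big") != tag:
            raise ValueError("integrity check failed")
        return body


class Encrypt(Layer):
    """Toy confidentiality layer: XOR with one byte (not real encryption)."""

    def __init__(self, key: int):
        self.key = key

    def down(self, payload):
        return bytes(b ^ self.key for b in payload)

    def up(self, payload):
        return bytes(b ^ self.key for b in payload)


class Stack:
    """Layers listed top first; send walks down the list, receive walks back up."""

    def __init__(self, layers):
        self.layers = list(layers)

    def send(self, payload):
        for layer in self.layers:
            payload = layer.down(payload)
        return payload

    def receive(self, payload):
        for layer in reversed(self.layers):
            payload = layer.up(payload)
        return payload


stack = Stack([Sign(), Encrypt(0x5A)])
wire = stack.send(b"hello")
received = stack.receive(wire)
```

Changing the properties of the group then means changing the list passed to `Stack`, with no change to the layers themselves.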
Group Members Use Identical Multicast Protocol Stacks
[Figure: three group members, each running an identical encrypt / vsync / ftol protocol stack]
With Multiple Stacks, Multiple Properties
[Figure: multiple sets of encrypt / vsync / ftol protocol stacks]
Timeline
Isis Horus Ensemble
• Horus-like stacking architecture, equally fast
• Includes a group-key mechanism for secure group multicast and key management
• Uses high level language, can be formally proved, an unexpected and major success
• Many early transition successes
  • DD-21, Quorum via collaboration with BBN
  • Nortel, STC: commercial users
  • Discussions with MS (COM+): could be basis of standards
Proving Ensemble Correct
Unlike Isis and Horus, Ensemble is coded in a language with strong semantics (ML)
So we took a spec. of virtual synchrony from MIT’s IOA group (Nancy Lynch)
And are actually able to prove that our code implements the spec. and that the spec captures the virtual synchrony property!
Why is this important?
If we use Ensemble to secure keys, our proof is a proof of security of the group keying protocol…
And the proof extends not just to the algorithm but also to the actual code implementing it
These are the largest formal proofs ever undertaken!
Why is this feasible?
Power of the NuPRL system: a fifth generation theorem proving technology
Simplifications gained through modularity: compositional code inspires a style of compositional proof
Ensemble itself is unusually elegant, protocols are spare and clear
Other Accomplishments
An automated optimization technology
Often, a simple modular protocol becomes complex when optimized for high performance
Our approach automates optimization: the basic protocol is only coded once and we work with a single, simple, clear version
Optimizer works almost as well as hand-optimization and can be invoked at runtime
Optimization
[Figure: encrypt / vsync / ftol protocol stack, original and optimized]
Original code is simple but inefficient
Optimized is provably the same yet inefficiencies are eliminated
Other Accomplishments
Real-Time Fault-Tolerant Clusters
• Problem originated in AEGIS tracking server
• Need a scalable, fault-tolerant parallel server with rapid real-time guarantees
AEGIS Problem
Emulate this… [Figure: a single tracking server]
With this… [Figure: a cluster of tracking servers, 100ms deadline]
Other Accomplishments
Real-Time Fault-Tolerant Clusters
• Problem originated in AEGIS tracking server
• Need a scalable, fault-tolerant parallel server with rapid real-time guarantees
• With Horus, we achieved 100ms response time, even when nodes crash, scalability to 64 nodes or more, load balancing and linear speedups
• Our approach emerged as one of the major themes in SC-21, which became DD-21
Other Accomplishments
A flexible, object-oriented toolkit
• Standardizes the sorts of things programmers do most often
• Programmers are able to work with high level abstractions rather than being forced to reimplement common tools, like replicated data, each time they are needed
• Embedding into NT COM architecture
Security Architecture
Group key management
Fault-tolerant, partitionable
Currently exploring a very large scale configuration that would permit rapid key refresh and revocation even with millions of users
All provably correct
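Key refresh and revocation can be pictured as installing a fresh group key whenever membership changes, so that a revoked member never receives the new key. The sketch below is only an illustration of that idea under simplifying assumptions: `KeyedGroup` and its methods are hypothetical, a dictionary stands in for secure delivery of the key to each member, and nothing here reflects Ensemble's actual keying protocol or its scaling techniques.

```python
import secrets


class KeyedGroup:
    """Toy group that installs a fresh shared key on every membership change."""

    def __init__(self, members):
        self.members = set(members)
        self.epoch = 0
        self.key = secrets.token_bytes(32)

    def _rekey(self):
        # Generate a new key and "deliver" it to current members only.
        self.epoch += 1
        self.key = secrets.token_bytes(32)
        return {m: self.key for m in self.members}

    def add(self, member):
        self.members.add(member)
        return self._rekey()

    def revoke(self, member):
        self.members.discard(member)
        return self._rekey()


group = KeyedGroup({"q", "r", "s"})
old_key = group.key
delivered = group.revoke("s")   # s is excluded from the new key
```

Because the key changes on each epoch, traffic encrypted after the revocation is unreadable to the revoked member even though it still holds the old key.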
Transition Paths?
Through collaboration with BBN, delivered to DARPA QUOIN effort
Part of DD-21 architecture
Strong interest in industry, good prospects for “major vendor” offerings within a year or two
A good success for Cornell and DARPA
What Next?
Continue some work with Ensemble
• Research focuses on proof of replication stack
• Otherwise, keep it alive, support and extend it
• Play an active role in transition
• Assist standards efforts
But shift in focus to a completely new effort
• Emphasize adaptive behavior, extreme scalability, robustness against local disruption
• Fits “Intrinsically Survivable Systems” initiative
Throughput Stability: Achilles Heel of Group Multicast
When scaled to even modest environments, overheads of virtual synchrony become a problem
• One serious challenge involves management of group membership information
• But multicast throughput also becomes unstable with high data rates, large system size, too
A problem in every protocol we’ve studied, including other “scalable, reliable” protocols
Throughput Scenario
Most members are healthy….
… but one is slow
[Figure: Throughput as one member of a multicast group is "perturbed" by forcing it to sleep for varying amounts of time. Virtually synchronous Ensemble multicast protocols. X axis: degree of slowdown (0 to 0.9). Y axis: group throughput (healthy members), 0 to 250. Curves for 32, 64, and 96 group members.]
Bimodal Multicast in Spinglass
A new family of protocols with stable throughput, extremely scalable, fixed and low overhead per process and per message
Gives tunable probabilistic guarantees
Includes a membership protocol and a multicast protocol
Ideal match for small, nomadic devices
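The probabilistic flavor of such protocols can be seen in a small simulation, loosely modeled on the general pattern of a best-effort multicast followed by rounds of gossip that repair gaps. It is a sketch under stated assumptions, not the Spinglass implementation: `simulate` and its parameters are hypothetical, gossip messages are never lost, and only a single multicast message is tracked.

```python
import random


def simulate(n, loss, rounds, seed=0):
    """Return how many of n processes hold the message after each phase.

    Phase 0 is a best-effort multicast from process 0 in which each other
    process misses the message with probability `loss`. Each later round,
    every process gossips with one random peer, and a peer that lacks the
    message obtains it from a gossiper that has it.
    """
    if n < 2:
        raise ValueError("need at least two processes")
    rng = random.Random(seed)
    has = [False] * n
    has[0] = True                        # the sender always has its own message
    for i in range(1, n):
        if rng.random() >= loss:
            has[i] = True
    counts = [sum(has)]
    for _ in range(rounds):
        snapshot = list(has)             # state at the start of the round
        for i in range(n):
            peer = rng.randrange(n - 1)  # pick a peer other than i
            if peer >= i:
                peer += 1
            if snapshot[i]:
                has[peer] = True         # peer fills its gap from i
        counts.append(sum(has))
    return counts


counts = simulate(n=64, loss=0.5, rounds=8)
```

The number of holders can only grow from round to round, and the work per process per round is one gossip exchange regardless of group size, which is the fixed-overhead property the slide emphasizes.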
[Figure: Throughput with 25% Slow processes. Two plots of average throughput (0 to 200) against slowdown (0.1 to 0.9).]
Spinglass: Summary of objectives
Radically different approach yields stable, scalable protocols with steady throughput
Small footprint, tunable to match conditions
Completely asynchronous, hence demands new style of application development
But opens the door to a new lightweight reliability technology supporting large autonomous environments that adapt
Conclusions
Cornell: leader in reliable distributed computing
High impact on important DoD problems, such as AEGIS (DD-21), QUOIN, NSA intelligence gathering, many other applications
Demonstrated modular plug-and-play protocols that perform well and can be proved correct
Transition into standard, off the shelf O/S
Spinglass – the next major step forward