The virtue of dependent failures in multi-site systems
Flavio Junqueira and Keith Marzullo
University of California, San Diego
Workshop on Hot Topics in System Dependability (HotDep), Yokohama, Japan, June 2005
2HotDep’05

Multi-site systems
Collection of sites across a WAN
Multiple processors per site: storage nodes, computing nodes
Share resources, e.g. BIRN, Geon, TeraGrid
Failures: processors unavailable; services do not mask failures
Improve availability under failures: replication; minimize overhead

Introduction
Failures in multi-site systems: processor failures (software and hardware faults); site failures (processors of the site become unavailable); a new failure model
Site failure causes: misconfigured software; shared resources (1. storage, 2. power circuits, 3. cooling pipes, 4. air conditioning, 5. network)
Availability through replication: replica placement; operations on replicas (quorums); quorum constructions
Operations on replicas: replicated data (quorum update); replicated functionality (state machine using Paxos)
Failure model in practice: implement the model; site availability in BIRN; model for processor failures within a site

A dependent failure model
Threshold model: limit on the number of processor failures; simple; models well homogeneous processors that fail independently
Multi-site: sites are unavailable frequently enough; processor failures are not IID; all processors of a site become unavailable
The multi-site threshold model has two components: a threshold on the number of site failures (fs); one threshold per site on processor failures (t)
Assumptions: sites are homogeneous; processors within a site are homogeneous; processor failure = crash
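One way to read the two thresholds is as a test on a failure pattern. The sketch below is illustrative only: the function name `within_model` and the rule that a site with more than t crashed processors counts as a failed site are assumptions, not definitions taken from the talk.

```python
def within_model(failed_per_site, fs, t):
    """Check a failure pattern against the multi-site threshold model.

    failed_per_site: number of crashed processors in each site.
    Assumption: a site with more than t crashes counts as a failed site.
    """
    failed_sites = sum(1 for failed in failed_per_site if failed > t)
    return failed_sites <= fs

# One site lost entirely, one crash in another site: within fs = 1, t = 1.
print(within_model([3, 1, 0], fs=1, t=1))
# Two sites exceed the per-site threshold: outside the model.
print(within_model([3, 2, 0], fs=1, t=1))
```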

Quorum systems
Quorum system Q: a set of quorums; a quorum is a set of processors
Intersection property: every pair of quorums in Q intersects
Algorithms: access a quorum
Example: Majority system: n processors; every subset of size $\lceil (n+1)/2 \rceil$ is a quorum; optimal availability for IID processor failures
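The intersection property can be checked by brute force for a small majority system. This is a minimal sketch; the names `majority_quorums` and `is_quorum_system` are illustrative and not from the talk, and the quorum size is taken as the smallest strict majority of n.

```python
from itertools import combinations

def majority_quorums(n):
    """All subsets of {0..n-1} whose size is the smallest strict majority."""
    size = n // 2 + 1  # equals (n+1)/2 when n is odd
    return [frozenset(c) for c in combinations(range(n), size)]

def is_quorum_system(quorums):
    """Intersection property: every pair of quorums shares a processor."""
    return all(a & b for a, b in combinations(quorums, 2))

quorums = majority_quorums(5)
print(len(quorums), is_quorum_system(quorums))
```

Any two majorities of the same n processors overlap, so the check succeeds for every n; two disjoint sets, by contrast, fail it.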

A quorum construction: QSite
QSite: select at least (2 fs + 1) sites: S; select at least (2t + 1) processors from each site in S
Quorum: majority of sites in S; majority of processors in each of those sites
An example (fs = 1, t = 1)
[Figure: Site 1, Site 2, Site 3 and the resulting quorums]
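The construction can be enumerated directly for small parameters. The sketch below assumes exactly 2 fs + 1 sites and 2t + 1 processors per site (the slide allows "at least"), and labels processors as hypothetical (site, index) pairs; `qsite_quorums` is an illustrative name.

```python
from itertools import combinations, product

def qsite_quorums(fs, t):
    """Enumerate QSite quorums over 2*fs+1 sites with 2*t+1 processors each."""
    n_sites, n_procs = 2 * fs + 1, 2 * t + 1
    quorums = []
    # A quorum picks a majority of sites (fs + 1 of them) ...
    for sites in combinations(range(n_sites), fs + 1):
        # ... and a majority of processors (t + 1) inside each picked site.
        per_site = [
            [frozenset((s, p) for p in procs)
             for procs in combinations(range(n_procs), t + 1)]
            for s in sites
        ]
        for choice in product(*per_site):
            quorums.append(frozenset().union(*choice))
    return quorums

quorums = qsite_quorums(fs=1, t=1)
sizes = {len(q) for q in quorums}
intersecting = all(a & b for a, b in combinations(quorums, 2))
print(len(quorums), sizes, intersecting)
```

For fs = 1 and t = 1 this gives three sites of three processors each and quorums of four processors. Two quorums always share a site, because two majorities of sites overlap, and within that site their two processor majorities overlap as well.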

QSite vs. Majority
Quorum size by fs and t
fs   t = 1           t = 2
     Maj.   QSite    Maj.   QSite
1    5      4        8      6
2    8      6        13     9
3    11     8        18     12
4    14     10       23     15
Properties of multi-site threshold model hold
Same replicas for QSite and Majority
Availability: fs unavailable sites; in each of the remaining fs + 1 sites, t unavailable processors
Majority: no quorum available. Requires: $2 f_s t + f_s + t + 1$. Available: $f_s t + f_s + t + 1$
QSite: one quorum available
QSite has better availability; Majority is not optimal
Quorum sizes: QSite produces smaller quorums; reduces load; increases capacity
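The two expressions follow from counting replicas, which a few lines of Python can confirm. This sketch assumes exactly (2 fs + 1) sites with (2t + 1) processors each, as in the comparison above; the function names are illustrative.

```python
def majority_quorum_size(fs, t):
    """Majority over the same (2*fs+1) * (2*t+1) replicas that QSite uses."""
    n = (2 * fs + 1) * (2 * t + 1)
    return n // 2 + 1               # = 2*fs*t + fs + t + 1, since n is odd

def qsite_quorum_size(fs, t):
    """fs + 1 sites, t + 1 processors in each."""
    return (fs + 1) * (t + 1)       # = fs*t + fs + t + 1

def worst_case_available(fs, t):
    """Processors left when fs sites are down and t fail in each other site."""
    return (fs + 1) * ((2 * t + 1) - t)

for t in (1, 2):
    for fs in (1, 2, 3, 4):
        print(fs, t, majority_quorum_size(fs, t), qsite_quorum_size(fs, t))
```

In the worst case the surviving processors number exactly the QSite quorum size, so one QSite quorum remains, while Majority needs fs·t more processors than are available.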

Reducing quorum sizes and sites
QSite, fs = 2, t = 1: 5 sites; 3 processors per site; 6 processors per quorum
Compromise availability
[Figure: Site 1, Site 2, Site 3, Site 4 and the resulting quorums]

Site availability
Goals: show that sites are unavailable frequently enough; threshold on the number of site failures
BIRN - Biomedical Informatics Research Network: test bed projects centered around brain imaging; currently 19 universities, 26 research groups
Availability: monthly basis; pings (BIRN-CC); storage broker logs
Site availability, Jan/04-Aug/04: availability under 100% on average in 5 out of the 8 months
$\text{Availability} = \frac{\text{Total hours} - \text{Unplanned outages}}{\text{Total hours}} \times 100$
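As a small worked instance of the availability formula, the sketch below uses a hypothetical 30-day month and a hypothetical outage figure; neither number comes from the BIRN data.

```python
def availability_percent(total_hours, unplanned_outage_hours):
    """Monthly availability in percent, following the formula above."""
    return (total_hours - unplanned_outage_hours) / total_hours * 100

# Hypothetical month: 30 days, 7.2 hours of unplanned outages.
print(availability_percent(30 * 24, 7.2))
```

With 720 hours in the month, 7.2 hours of outage is 1% of the total, which matches the rule of thumb that each 1% of unavailability is roughly 7 hours.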

BIRN site availability
10 sites experience at least one outage
One site under 97%

Threshold on unavailable sites
Worst-case scenario
Assumption: independent site failures
n most unavailable sites in each month
Probability that all n sites are unavailable
Each 1% of unavailability is approximately 7 hours
Number of sites (n)   Unavailability in minutes
1                     3288 (979)
2                     87 (33)
3                     1.9 (1.0)
4                     0.017 (0.009)
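Under the independence assumption, the probability that n sites are down at once is the product of their individual unavailabilities. The sketch below illustrates that calculation only; the per-site fractions are hypothetical, not BIRN measurements, and it does not reproduce how the talk aggregated months or derived the values in parentheses.

```python
import math

def joint_unavailability_minutes(unavailabilities, minutes_in_month):
    """Expected minutes per month in which all given sites are down together,
    assuming independent site failures."""
    probability_all_down = math.prod(unavailabilities)
    return probability_all_down * minutes_in_month

# Hypothetical fractions of a 30-day month that three sites were unavailable.
worst_sites = [0.05, 0.02, 0.01]
month = 30 * 24 * 60
for n in range(1, len(worst_sites) + 1):
    print(n, joint_unavailability_minutes(worst_sites[:n], month))
```

Each extra site multiplies the joint unavailability by a small fraction, which is why the figures in the table drop by orders of magnitude as n grows.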

Modeling failures in a site
Homogeneous set of processors: independent processor failures; identical probability of failure
Processors are repaired: repair probabilities change with number of failures
Markov chain
From the model: threshold on the number of failures (t); desired degree of availability; stationary probabilities

An example
Three processors per site
Probabilities: failure probability much smaller than repair probabilities; repair probabilities increase with failures
$\lim_{n \to \infty} P_{i0}^{n} = 0.96695$
$\lim_{n \to \infty} P_{i1}^{n} = 0.03223$
$\lim_{n \to \infty} P_{i2}^{n} = 0.00080$
$\lim_{n \to \infty} P_{i3}^{n} = 0.00002$
Availability 0.001: t = 1
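The talk does not spell out the transition matrix, so the following is only a sketch under an assumed chain: the state is the number of failed processors, one more processor fails with probability p per step, and one is repaired with probability r_{k-1} when k have failed. The values p = 0.01, r0 = 0.3, r1 = 0.4, r2 = 0.5 are the ones listed on the deck's equations slide; the function names and the reading of 0.001 as a bound on the probability of more than t failures are assumptions.

```python
def stationary(p, repair, steps=2000):
    """Stationary distribution of an assumed birth-death chain.

    State k = number of failed processors (0..len(repair)).
    From state k: fail with probability p, repair with repair[k-1].
    """
    k_max = len(repair)
    P = [[0.0] * (k_max + 1) for _ in range(k_max + 1)]
    for k in range(k_max + 1):
        up = p if k < k_max else 0.0
        down = repair[k - 1] if k > 0 else 0.0
        if k < k_max:
            P[k][k + 1] = up
        if k > 0:
            P[k][k - 1] = down
        P[k][k] = 1.0 - up - down
    # Power iteration: repeatedly apply the transition matrix.
    dist = [1.0] + [0.0] * k_max
    for _ in range(steps):
        dist = [sum(dist[i] * P[i][j] for i in range(k_max + 1))
                for j in range(k_max + 1)]
    return dist

def smallest_threshold(dist, target):
    """Smallest t such that P(more than t failures) <= target."""
    for t in range(len(dist)):
        if sum(dist[t + 1:]) <= target:
            return t

pi = stationary(0.01, [0.3, 0.4, 0.5])
print([round(x, 5) for x in pi], smallest_threshold(pi, 0.001))
```

Under this assumed chain the stationary probabilities come out close to the values on the slide, and the smallest threshold that keeps the probability of exceeding it below 0.001 is t = 1.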

Discussion & Future work
Multi-site systems: important class of distributed systems; share resources; collaboration among distant groups
Improve availability through replication: a useful abstraction is quorum systems; algorithms built on top of quorum systems
Dependent failures: site failures; enable smaller, more highly available quorums
Lessons to learn: considering dependent failures may improve results; models are not necessarily complex
Future work: validate model, evaluate constructions in practice, more constructions, etc.
END

Equations
$\lim_{n \to \infty} P_{i0}^{n} = 0.96695$
$\lim_{n \to \infty} P_{i1}^{n} = 0.03223$
$\lim_{n \to \infty} P_{i2}^{n} = 0.00080$
$\lim_{n \to \infty} P_{i3}^{n} = 0.00002$
$p = 0.01$, $r_0 = 0.3$, $r_1 = 0.4$, $r_2 = 0.5$
$\text{Availability} = \frac{\text{Total hours} - \text{Unplanned outages}}{\text{Total hours}} \times 100$
$2 f_s t + f_s + t + 1$
$f_s t + f_s + t + 1$

Introduction
Failures in multi-site systems: processor failures (software and hardware faults); site failures (processors of the site become unavailable); a new failure model
Site failure causes: 1. software incompatibility, misconfiguration; 2. shared resources (e.g. storage); 3. power failures; 4. broken pipes; 5. loss of air conditioning; 6. network problems
Availability through replication: replica placement; operations on replicas (quorums); replicated data (quorum update); replicated functionality (state machine using Paxos); quorum constructions
Failure model in practice: implementability of the model; real system for site availability (BIRN); model for processor failures within a site

Introduction
Failures in multi-site systems: processor failures (e.g. HW failures); site failures
Site failure causes: 1. software incompatibility, misconfiguration; 2. shared resources (e.g. storage); 3. power failures; 4. broken pipes; 5. loss of air conditioning; 6. network problems
Strategies for replica placement: large number of sites and nodes
Updates: naïve approach (every non-faulty replica up to date); quorum update (contact a quorum of processors)
Distributed shared register (replicated data): multiple copies of a data set (quorum update), e.g. brain images (BIRN), geological data (Geon)
Consensus (replicated functionality): state-machine approach (Paxos algorithm), e.g. parallel computation (TeraGrid)

Why sites fail
1. Software incompatibility, misconfiguration
2. Shared resources (e.g. storage)
3. Power failures
4. Broken pipes
5. Loss of air conditioning
6. Network problems

Quorums in a multi-site system
Data replication: multiple copies of data sets
Functionality replication: state-machine approach; Paxos (coteries for Classic Paxos)
Question: how do we choose nodes to replicate on? Flat organization; organization into sites

Quorum systems
Quorum system Q: a set of quorums; a quorum is a set of processors
Intersection property: every pair of quorums in Q intersects
Algorithms: access a quorum when executing some operation
Examples
Majority system: n processors; every subset of size $\lceil (n+1)/2 \rceil$ is a quorum; optimal availability for IID processor failures
Multi-colored: colors as sites
[Figure: processors and quorums]

Quorum systems (cont.)
In multi-site systems
Replicated data: multiple copies of a data set (quorum update), e.g. brain images (BIRN), geological data (Geon)
Replicated functionality: state-machine approach (Paxos algorithm), e.g. parallel computation (TeraGrid)
Quorums for multi-site systems: replicating on every node is excessive
Quorum construction: set of processors to replicate on; quorums

Examples of quorum systems
Majority system: n processors; every subset of size $\lceil (n+1)/2 \rceil$ is a quorum
Multi-colored: colors as sites
Majority has optimal availability for independent and identically distributed (IID) processor failures
[Figure: universe and quorum patterns]

BIRN site availability
10 sites have at least one outage
One site under 97%

Discussion & Future work
Multi-site systems: important class of distributed systems; share resources; collaboration among distant groups
Improve availability through replication: a useful abstraction is quorum systems; algorithms built on top of quorum systems
Dependent failures: site failures; enable smaller, more highly available quorums
Future work: validate multi-site threshold model; evaluate proposed constructions in practice; more constructions; more issues with dependent failures