Blazes: coordination analysis for distributed programs
Peter Alvaro, Neil Conway, Joseph M. Hellerstein (UC Berkeley)
David Maier (Portland State University)
Distributed systems are hard
Asynchrony Partial Failure
Asynchrony isn't that hard
Amelioration: logical timestamps, deterministic interleaving
Partial failure isn't that hard
Amelioration: replication, replay
Asynchrony × partial failure is hard²
Logical timestamps, deterministic interleaving
Replication, replay
Today:
Consistency criteria for fault-tolerant distributed systems
Blazes: analysis and enforcement
This talk is all setup. Frame of mind:
1. Dataflow: a model of distributed computation
2. Anomalies: what can go wrong?
3. Remediation strategies
Framework:
1. Component properties
2. Delivery mechanisms
Blazes – coordination analysis and synthesis
Little boxes: the dataflow model
Generalization of distributed services
Components interact via asynchronous calls (streams)
Components: input interfaces, output interface
Streams: nondeterministic order
Example: a join operator
Inputs: streams R and S. Output: stream T.
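To make the join operator concrete, here is a minimal sketch of a streaming symmetric hash join in Python — an illustration, not code from the talk (the class and method names are invented). Each arriving tuple is buffered and matched against the tuples already seen on the other input:

```python
from collections import defaultdict

class SymmetricHashJoin:
    """Joins two input streams R and S on a key, emitting matched
    pairs on the output stream T as tuples arrive in either order."""
    def __init__(self):
        self.r_state = defaultdict(list)  # key -> R values seen so far
        self.s_state = defaultdict(list)  # key -> S values seen so far

    def on_r(self, key, val):
        self.r_state[key].append(val)
        # Match against every S tuple already buffered for this key.
        return [(key, val, s_val) for s_val in self.s_state[key]]

    def on_s(self, key, val):
        self.s_state[key].append(val)
        return [(key, r_val, val) for r_val in self.r_state[key]]

join = SymmetricHashJoin()
out = []
out += join.on_r("k", 1)    # no match yet
out += join.on_s("k", "a")  # matches (k, 1, "a")
out += join.on_r("j", 2)    # no match
out += join.on_s("k", "b")  # matches (k, 1, "b")
```

Note that although the *emission order* of matches depends on delivery order, the final *set* of matches does not — a property the talk returns to under confluence.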
Example: a key/value store
Inputs: put, get. Output: response.
Example: a pub/sub service
Inputs: publish, subscribe. Output: deliver.
Logical dataflow
"Software architecture"
[Figure: data source → Service X → filter → cache → client, with streams a, b, c]
Dataflow is compositional
Components are recursively defined
[Figure: data source → Service X → filter → aggregator → client]
Dataflow exhibits self-similarity
[Figure: user requests fan out to App1 and App2 (Buy, Content, Static HTTP), backed by DB, HDFS, and Hadoop (Index, Combine); App1 and App2 answers flow back to the user]
Physical dataflow
"System architecture"
[Figure: the logical components (data source, Service X, filter, aggregator, client) instantiated as physical processes]
What could go wrong?
Cross-run nondeterminism
Nondeterministic replays: Run 1 and Run 2 of the same dataflow may produce different outputs.
[Figure: data source → Service X → filter → aggregator → client, streams a, b, c]
Cross-instance nondeterminism
Transient replica disagreement between instances of Service X.
Divergence
Permanent replica disagreement.
Hazards
For each stream (a, b, c): is its order deterministic? Are its contents?
Preventing the anomalies
1. Understand component semantics (and disallow certain compositions)
Component properties
• Convergence – component replicas receiving the same messages reach the same state. Rules out divergence.
Example: Insert and Read against a convergent data structure (e.g., a Set CRDT).
Convergence
Commutativity → tolerant to reordering
Associativity → tolerant to batching
Idempotence → tolerant to retry/duplication
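The Set CRDT mentioned above can be sketched in a few lines of Python (an illustrative grow-only set, not the talk's implementation). Because insertion is commutative, associative, and idempotent, replicas tolerate reordering, batching, and duplicate delivery:

```python
class GSet:
    """Grow-only set CRDT: state is a set, and merging/inserting is
    set union -- commutative, associative, and idempotent."""
    def __init__(self):
        self.elems = set()

    def insert(self, x):
        self.elems.add(x)        # idempotent: duplicates are no-ops

    def merge(self, other):
        self.elems |= other.elems  # union: commutative + associative

    def read(self):
        return frozenset(self.elems)

# Two replicas receive the same inserts in different orders,
# one with a duplicated delivery...
a, b = GSet(), GSet()
for x in [1, 2, 2, 3]:
    a.insert(x)
for x in [3, 1, 2]:
    b.insert(x)
# ...and still converge to the same state.
assert a.read() == b.read() == frozenset({1, 2, 3})
```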
Convergence isn't compositional
Convergent: identical input contents ⇒ identical state. Composing convergent components does not guarantee a convergent whole.
Component properties
• Convergence – component replicas receiving the same messages reach the same state. Rules out divergence.
• Confluence – output streams have deterministic contents. Rules out all stream anomalies.
Confluent ⇒ convergent.
Confluence
output set = f(input set): the same set of inputs, delivered in any order, yields the same set of outputs.
Confluence is compositional
output set = (f ∘ g)(input set)
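The compositionality claim can be checked mechanically. Below is a small Python sketch (the operators `evens` and `doubled` are invented examples, not from the talk): each is confluent because its output set is a function of its input set, and feeding every permutation of the input stream through their composition yields a single output set:

```python
from itertools import permutations

def evens(batch):    # confluent: output set = f(input set)
    return {x for x in batch if x % 2 == 0}

def doubled(batch):  # also confluent
    return {2 * x for x in batch}

def run(stream):
    """Feed tuples one at a time through doubled ∘ evens,
    accumulating the output set."""
    out = set()
    for x in stream:
        out |= doubled(evens({x}))
    return out

inputs = (1, 2, 3, 4)
results = {frozenset(run(p)) for p in permutations(inputs)}
# Every delivery order yields the same output set.
assert results == {frozenset({4, 8})}
```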
Preventing the anomalies
1. Understand component semantics (and disallow certain compositions)
2. Constrain message delivery orders
   1. Ordering
Ordering – global coordination
Order-sensitive component → deterministic outputs, if all inputs are delivered in a single global order.
The first principle of successful scalability is to batter the consistency mechanisms down to a minimum. – James Hamilton
Preventing the anomalies
1. Understand component semantics (and disallow certain compositions)
2. Constrain message delivery orders
   1. Ordering
   2. Barriers and sealing
Barriers – local coordination
Order-sensitive component → deterministic outputs: a barrier between the data source and the component constrains delivery locally, without global coordination.
Sealing – continuous barriers
Do partitions of (infinite) input streams "end"?
Can components produce deterministic results given "complete" input partitions?
Sealing: partition barriers for infinite streams.
Finite partitions of infinite inputs are common:
…in distributed systems: sessions, transactions, epochs/views
…and in applications: auctions, chats, shopping carts
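A sealed component can be sketched as follows — an invented Python example (auction bid counting), assuming an explicit per-partition seal message. The component buffers input per partition and only emits a partition's output once that partition is sealed, at which point the output is deterministic regardless of arrival order:

```python
class SealedCounter:
    """Counts bids per auction, but only reports an auction's total
    once a seal for that partition arrives -- a per-partition
    barrier over an otherwise unbounded input stream."""
    def __init__(self):
        self.pending = {}   # auction id -> count so far

    def bid(self, auction):
        self.pending[auction] = self.pending.get(auction, 0) + 1

    def seal(self, auction):
        # The partition is complete: its output is now deterministic,
        # whatever order its bids arrived in.
        return (auction, self.pending.pop(auction, 0))

c = SealedCounter()
for auction in ["a1", "a2", "a1", "a1"]:
    c.bid(auction)
result = c.seal("a1")   # a1 is complete; a2 is still open
assert result == ("a1", 3)
```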
Blazes: consistency analysis + coordination selection
Blazes: Mode 1 – grey boxes
Example: pub/sub
x = publish, y = subscribe, z = deliver
Deliver is deterministic but unordered.

Severity | Label   | Confluent | Stateless
1        | CR      | X         | X
2        | CW      | X         |
3        | OR^gate |           | X
4        | OW^gate |           |

Annotations: x→z : CW, y→z : CW
Grey boxes
Example: key/value store
x = put, y = get, z = response
Response is deterministic but unordered.

Severity | Label   | Confluent | Stateless
1        | CR      | X         | X
2        | CW      | X         |
3        | OR^gate |           | X
4        | OW^gate |           |

Annotations: x→z : OW^key, y→z : OR
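The severity table above induces an ordering on labels; a rough intuition for the propagation that follows is that a path's output label is at least as severe as its most severe component label. The sketch below is a deliberately simplified stand-in for Blazes' propagation rules (the real analysis also handles sealing and interposition), with invented names:

```python
# Severity ordering from the table: CR < CW < OR < OW.
SEVERITY = {"CR": 1, "CW": 2, "OR": 3, "OW": 4}

def combine(labels):
    """Take the most severe label along a path -- a simplified
    stand-in for Blazes' label propagation."""
    return max(labels, key=lambda label: SEVERITY[label])

# A confluent pipeline stays confluent end to end...
assert combine(["CW", "CR", "CR"]) == "CW"
# ...but one order-sensitive write taints the whole path.
assert combine(["OW", "CR", "CR"]) == "OW"
```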
Label propagation – confluent composition
[Figure: a CW component feeding CR components; labels propagate through the dataflow and outputs remain deterministic]
Label propagation – unsafe composition
[Figure: an OW component feeding CR components; downstream outputs are tainted, and an interposition point marks where coordination must be inserted]
Label propagation – sealing
[Figure: an OW^key component feeding CR components, with Seal(key=x) messages propagated downstream; outputs are deterministic]
Blazes: Mode 2 – white boxes
White boxes
module KVS
  state do
    interface input, :put, [:key, :val]
    interface input, :get, [:ident, :key]
    interface output, :response, [:response_id, :key, :val]
    table :log, [:key, :val]
  end
  bloom do
    log <+ put
    log <- (put * log).rights(:key => :key)
    response <= (log * get).pairs(:key => :key) do |s,l|
      [l.ident, s.key, s.val]
    end
  end
end
put → response: OW^key
get → response: OR^key
Negation (⇒ order-sensitive); partitioned by :key
White boxes
module PubSub
  state do
    interface input, :publish, [:key, :val]
    interface input, :subscribe, [:ident, :key]
    interface output, :response, [:response_id, :key, :val]
    table :log, [:key, :val]
    table :sub_log, [:ident, :key]
  end
  bloom do
    log <= publish
    sub_log <= subscribe
    response <= (log * sub_log).pairs(:key => :key) do |s,l|
      [l.ident, s.key, s.val]
    end
  end
end
publish → response: CW
subscribe → response: CR
The Blazes frame of mind:
• Asynchronous dataflow model
• Focus on consistency of data in motion
  – Component semantics
  – Delivery mechanisms and costs
• Automatic, minimal coordination
Queries?