Page 1:

Blazes: coordination analysis for distributed programs

Peter Alvaro, Neil Conway, Joseph M. Hellerstein (UC Berkeley)
David Maier (Portland State)

Page 2:

Distributed systems are hard

Asynchrony
Partial failure

Page 3:

Asynchrony isn’t that hard

Amelioration:
Logical timestamps
Deterministic interleaving

Page 4:

Partial failure isn’t that hard

Amelioration:
Replication
Replay

Page 5:

Asynchrony * partial failure is hard²

Logical timestamps
Deterministic interleaving

Replication
Replay

Page 6:

Asynchrony * partial failure is hard²

Replication
Replay

Today:

Consistency criteria for fault-tolerant distributed systems

Blazes: analysis and enforcement

Page 7:

This talk is all setup

Frame of mind:
1. Dataflow: a model of distributed computation
2. Anomalies: what can go wrong?
3. Remediation strategies
   1. Component properties
   2. Delivery mechanisms

Framework:
Blazes – coordination analysis and synthesis

Page 8:

Little boxes: the dataflow model

Generalization of distributed services

Components interact via asynchronous calls (streams)

Page 9:

Components

Input interfaces
Output interface

Page 10:

Streams

Nondeterministic order

Page 11:

Example: a join operator

Inputs: R, S
Output: T
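
As a rough illustration of a join viewed as a dataflow component, the sketch below implements a symmetric hash join in plain Ruby. The class name `JoinComponent`, its methods, and the sample tuples are assumptions made for this example; they do not come from Blazes or Bloom. Tuples from R and S may arrive in any interleaving, and the set of joined tuples emitted on T stays the same.

```ruby
require 'set'

# Hypothetical component: joins tuples from input streams R and S on a key
# and emits the matches on output stream T.
class JoinComponent
  attr_reader :t_out

  def initialize
    @r_seen = Hash.new { |h, k| h[k] = [] }   # key => R payloads seen so far
    @s_seen = Hash.new { |h, k| h[k] = [] }   # key => S payloads seen so far
    @t_out = Set.new                          # output stream T, kept as a set
  end

  # A new R tuple is matched against every S tuple already seen.
  def on_r(key, val)
    @r_seen[key] << val
    @s_seen[key].each { |s_val| @t_out << [key, val, s_val] }
  end

  # A new S tuple is matched against every R tuple already seen.
  def on_s(key, val)
    @s_seen[key] << val
    @r_seen[key].each { |r_val| @t_out << [key, r_val, val] }
  end
end

def run_join(messages)
  join = JoinComponent.new
  messages.each do |stream, key, val|
    stream == :r ? join.on_r(key, val) : join.on_s(key, val)
  end
  join.t_out
end

messages = [[:r, 1, "a"], [:s, 1, "x"], [:r, 2, "b"], [:s, 1, "y"]]
in_order = run_join(messages)
reversed = run_join(messages.reverse)
puts in_order == reversed   # both delivery orders yield the same output set
```

Keeping T as a set rather than a list is the design choice that matters here: the order in which matches are produced still depends on arrival order, but the contents do not.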

Page 12:

Example: a key/value store

Inputs: put, get
Output: response

Page 13:

Example: a pub/sub service

Inputs: publish, subscribe
Output: deliver

Page 14:

Logical dataflow

“Software architecture”

Diagram components: Data source, Service X, filter, cache, client
Diagram labels: a, b, c

Page 15:

Dataflow is compositional

Components are recursively defined

Diagram components: Data source, Service X, filter, aggregator, client

Page 16:

Dataflow exhibits self-similarity

Page 17:

Dataflow exhibits self-similarity

Diagram labels: DB, HDFS, Hadoop, Index, Combine, Static HTTP, App1, App2, Buy, Content, User requests, App1 answers, App2 answers

Page 18:

Physical dataflow

Page 19:

Physical dataflow

Diagram components: Data source, Service X, filter, aggregator, client
Diagram labels: a, b, c

Page 20:

Physical dataflow

“System architecture”

Diagram components: Data source, Service X, filter, aggregator, client

Page 21:

What could go wrong?

Page 22:

Cross-run nondeterminism

Nondeterministic replays

Run 1

Diagram components: Data source, Service X, filter, aggregator, client
Diagram labels: a, b, c

Page 23:

Cross-run nondeterminism

Nondeterministic replays

Run 2

Diagram components: Data source, Service X, filter, aggregator, client
Diagram labels: a, b, c
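
To make cross-run nondeterminism concrete, here is a minimal sketch in plain Ruby. `TinyKVS` and `replay` are hypothetical names invented for this illustration. The same two messages are replayed in two different interleavings against an order-sensitive component, and the responses differ between runs.

```ruby
# Hypothetical order-sensitive component: a tiny key/value store.
class TinyKVS
  attr_reader :responses

  def initialize
    @store = {}
    @responses = []
  end

  def deliver(msg)
    op, key, val = msg
    case op
    when :put then @store[key] = val
    when :get then @responses << [key, @store[key]]
    end
  end
end

# Replays a message sequence against a fresh store and returns its responses.
def replay(messages)
  kvs = TinyKVS.new
  messages.each { |m| kvs.deliver(m) }
  kvs.responses
end

put_msg = [:put, "k", 1]
get_msg = [:get, "k"]

run1 = replay([put_msg, get_msg])   # put is delivered before get
run2 = replay([get_msg, put_msg])   # same messages, different interleaving

p run1
p run2
puts run1 == run2   # false: the replay is not deterministic
```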

Page 24:

Cross-instance nondeterminism

Data source

Service X

client

Transient replica disagreement

Page 25:

Divergence

Data source

Service X

client

Permanent replica disagreement

Page 26:

Hazards

Diagram components: Data source, Service X, filter, aggregator, client
Diagram labels: a, b, c
Diagram annotations: Order, Contents?

Page 27:

Preventing the anomalies
1. Understand component semantics
   (And disallow certain compositions)

Page 28:

Component properties

• Convergence
  – Component replicas receiving the same messages reach the same state
  – Rules out divergence

Page 29:

Convergence

Convergent data structure (e.g., Set CRDT) with Insert and Read operations

Commutativity: tolerant to reordering
Associativity: tolerant to batching
Idempotence: tolerant to retry/duplication
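
The three properties can be seen in a grow-only set, one of the simplest Set CRDTs. The sketch below is plain Ruby; `GSetReplica` and its method names are assumptions chosen for this example. Four replicas receive the same inserts reordered, batched, or duplicated, and all of them read back the same state.

```ruby
require 'set'

# Hypothetical grow-only set replica (a simple Set CRDT).
class GSetReplica
  def initialize
    @items = Set.new
  end

  def insert(item)
    @items << item        # idempotent: re-inserting an item has no effect
  end

  def insert_batch(items)
    @items.merge(items)   # set union is associative and commutative
  end

  def read
    @items.dup
  end
end

inserts = [3, 1, 2]

a = GSetReplica.new
inserts.each { |i| a.insert(i) }

b = GSetReplica.new
inserts.reverse.each { |i| b.insert(i) }       # reordered delivery

c = GSetReplica.new
c.insert_batch([1, 2])                         # batched delivery
c.insert_batch([3])

d = GSetReplica.new
(inserts + inserts).each { |i| d.insert(i) }   # duplicated (retried) delivery

puts [b, c, d].all? { |r| r.read == a.read }
```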

Page 30:

Convergence isn’t compositional

Data source

client

Convergent (identical input contents ⇒ identical state)

Page 31:

Component properties

• Convergence
  – Component replicas receiving the same messages reach the same state
  – Rules out divergence

• Confluence
  – Output streams have deterministic contents
  – Rules out all stream anomalies

Confluent ⇒ convergent

Page 32:

Confluence

output set = f(input set)

Page 33:

Confluence is compositional

output set = (f ∘ g)(input set)
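
A small sketch of why composition preserves confluence, in plain Ruby. The helper `run_component` and the two example components (`g` keeps even numbers, `f` squares each message) are assumptions invented for illustration. Each component sees its input in a random order on every run, yet the final output set never changes.

```ruby
require 'set'

# Hypothetical streaming component: applies a per-message block and
# collects whatever the block emits into an output set.
def run_component(input_order)
  output = Set.new
  input_order.each { |msg| yield(msg, output) }
  output
end

# g keeps even numbers; f squares each message it receives.
g = ->(msgs) { run_component(msgs) { |m, out| out << m if m.even? } }
f = ->(msgs) { run_component(msgs) { |m, out| out << m * m } }

input = [1, 2, 3, 4, 5, 6]
outputs = 5.times.map do
  g_out = g.call(input.shuffle)   # messages reach g in some order
  f.call(g_out.to_a.shuffle)      # g's output reaches f in some order
end

# Every run produced the same output set, whatever the delivery orders were.
puts outputs.all? { |o| o == outputs.first }
```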

Page 34:

Preventing the anomalies
1. Understand component semantics
   (And disallow certain compositions)
2. Constrain message delivery orders
   1. Ordering

Page 35:

Ordering – global coordination

Deterministic outputs

Order-sensitive
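
One common way to impose a global order is to have a sequencer stamp every message with a position and make each replica apply messages strictly in that order. The sketch below is plain Ruby and is only one possible mechanism; `OrderedReplica` and the inline "sequencer" are hypothetical and are not taken from the talk.

```ruby
# Hypothetical order-sensitive replica: appends messages to a log, but only
# in the order fixed by a (hypothetical) global sequencer.
class OrderedReplica
  attr_reader :log

  def initialize
    @next_seq = 0
    @pending = {}   # seq => payload, held back until its turn comes
    @log = []
  end

  def deliver(seq, payload)
    @pending[seq] = payload
    # Apply every buffered message whose predecessors have all been applied.
    while @pending.key?(@next_seq)
      @log << @pending.delete(@next_seq)
      @next_seq += 1
    end
  end
end

# The sequencer's only job here is to stamp each message with a position.
sequenced = %w[a b c d].each_with_index.map { |payload, seq| [seq, payload] }

r1 = OrderedReplica.new
r2 = OrderedReplica.new
sequenced.shuffle.each { |seq, payload| r1.deliver(seq, payload) }
sequenced.shuffle.each { |seq, payload| r2.deliver(seq, payload) }

puts r1.log == r2.log   # both replicas apply the messages in the same order
```

The cost is visible in the structure: every message passes through a single ordering point, and a replica must hold back later messages until earlier ones arrive.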

Page 36:

Ordering – global coordination

Data source

client

The first principle of successful scalability is to batter the consistency mechanisms down to a minimum. – James Hamilton

Page 37:

Preventing the anomalies
1. Understand component semantics
   (And disallow certain compositions)
2. Constrain message delivery orders
   1. Ordering
   2. Barriers and sealing

Page 38:

Barriers – local coordination

Deterministic outputs

Data source

client

Order-sensitive

Page 39:

Barriers – local coordination

Data source

client

Page 40:

Sealing – continuous barriers

Do partitions of (infinite) input streams “end”?

Can components produce deterministic results given “complete” input partitions?

Sealing: partition barriers for infinite streams

Page 41:

Sealing – continuous barriers

Finite partitions of infinite inputs are common…

…in distributed systems
– Sessions
– Transactions
– Epochs / views

…and applications
– Auctions
– Chats
– Shopping carts
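
A sealed partition can be sketched with the shopping-cart case. The plain-Ruby example below is an illustration under stated assumptions: `SealedSum` is a hypothetical component, and the seal message is assumed to carry the number of items in its partition so that it can safely overtake items on an unordered channel. A total is emitted for a cart only once that cart is known to be complete.

```ruby
# Hypothetical sealed aggregator: sums amounts per partition (e.g. per cart)
# and emits a total only once that partition is known to be complete.
class SealedSum
  attr_reader :emitted

  def initialize
    @items = Hash.new { |h, k| h[k] = [] }   # partition => amounts received
    @expected = {}                           # partition => count from its seal
    @emitted = {}                            # partition => final total
  end

  def on_item(partition, amount)
    @items[partition] << amount
    try_emit(partition)
  end

  # Assumption: the seal carries how many items the partition contains,
  # so it may arrive before some of those items.
  def on_seal(partition, count)
    @expected[partition] = count
    try_emit(partition)
  end

  private

  # Emits only when the seal has arrived and every item has been received.
  def try_emit(partition)
    return unless @expected[partition] == @items[partition].size
    @emitted[partition] = @items[partition].sum
  end
end

messages = [
  [:item, "cart1", 5], [:item, "cart2", 7],
  [:item, "cart1", 10], [:seal, "cart1", 2]
]

results = 5.times.map do
  agg = SealedSum.new
  messages.shuffle.each do |kind, partition, n|
    kind == :item ? agg.on_item(partition, n) : agg.on_seal(partition, n)
  end
  agg.emitted
end

# cart1 is sealed, so its total is emitted; cart2 is never sealed, so it is not.
puts results.all? { |r| r == { "cart1" => 15 } }
```

Only the sealed partition waits on coordination; other carts are unaffected, which is what makes the barrier local rather than global.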

Page 42:

Blazes:

consistency analysis

+ coordination selection

Page 43:

Blazes:

Mode 1: Grey boxes

Page 44:

Grey boxes

Example: pub/sub

x = publish
y = subscribe
z = deliver

Deterministic but unordered

Severity  Label   Confluent  Stateless
1         CR      X          X
2         CW      X
3         ORgate             X
4         OWgate

x->z : CW
y->z : CWT

Page 45:

Grey boxes

Example: key/value store

x = put; y = get; z = response

Deterministic but unordered

Severity  Label   Confluent  Stateless
1         CR      X          X
2         CW      X
3         ORgate             X
4         OWgate

x->z : OWkey
y->z : ORT

Page 46:

Label propagation – confluent composition

Diagram labels: CW, CR, CR, CR, CR, CW

Deterministic outputs

Page 47:

Label propagation – unsafe composition

Diagram labels: OW, CR, CR, CR, CR

Tainted outputs

Interposition point

Page 48:

Label propagation – sealing

Diagram labels: OWkey, CR, CR, CR, CR, OWkey

Stream annotations: Seal(key=x), Seal(key=x)

Deterministic outputs
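
The three label-propagation cases can be mimicked with a deliberately simplified sketch in plain Ruby. The rules below are illustrative assumptions and not the actual Blazes inference procedure: the path is linear, a confluent label (CR, CW) never taints the output, and an order-sensitive label (OR, OW) is treated as safe only when its gate matches the key on which the input stream is sealed. `Label` and `analyze` are hypothetical names.

```ruby
# A much-simplified sketch of label propagation along a linear dataflow path.
# Labels follow the slides: CR/CW are confluent, OR/OW are order-sensitive and
# may carry a gate (partitioning key). These rules are illustrative
# assumptions, not the actual Blazes inference procedure.
Label = Struct.new(:name, :gate) do
  def confluent?
    %w[CR CW].include?(name)
  end
end

# Returns [:deterministic], or [:tainted, i] where i is the index of the
# first component that needs coordination interposed in front of it.
def analyze(path, seal_key: nil)
  path.each_with_index do |label, i|
    next if label.confluent?
    # An order-sensitive component is safe only if its input is sealed
    # on the same key that gates it.
    next if !label.gate.nil? && label.gate == seal_key
    return [:tainted, i]
  end
  [:deterministic]
end

confluent_path = [Label.new("CW"), Label.new("CR")]
unsafe_path    = [Label.new("OW"), Label.new("CR")]
sealed_path    = [Label.new("OW", :key), Label.new("CR")]

p analyze(confluent_path)                # confluent composition
p analyze(unsafe_path)                   # unsafe composition
p analyze(sealed_path, seal_key: :key)   # sealing on the gate key
```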

Page 49:

Blazes:

Mode 2: White boxes

Page 50:

White boxes

    require 'bud'   # Bloom is embedded in Ruby via the Bud gem

    module KVS
      state do
        interface input, :put, [:key, :val]
        interface input, :get, [:ident, :key]
        interface output, :response, [:response_id, :key, :val]
        table :log, [:key, :val]
      end

      bloom do
        # Record each put at the next timestep...
        log <+ put
        # ...and delete any older log entry for the same key.
        log <- (put * log).rights(:key => :key)
        # Answer each get with the current log entry for its key.
        response <= (log * get).pairs(:key => :key) do |s, l|
          [l.ident, s.key, s.val]
        end
      end
    end

put -> response: OWkey

get -> response: ORkey

Negation (⇒ order sensitive)
Partitioned by :key

Page 51:

White boxes

    require 'bud'   # Bloom is embedded in Ruby via the Bud gem

    module PubSub
      state do
        interface input, :publish, [:key, :val]
        interface input, :subscribe, [:ident, :key]
        interface output, :response, [:response_id, :key, :val]
        table :log, [:key, :val]
        table :sub_log, [:ident, :key]
      end

      bloom do
        # Persist every publication and every subscription.
        log <= publish
        sub_log <= subscribe
        # Deliver each logged publication to every subscriber of its key.
        response <= (log * sub_log).pairs(:key => :key) do |s, l|
          [l.ident, s.key, s.val]
        end
      end
    end

publish -> response: CW
subscribe -> response: CR

Page 52:

The Blazes frame of mind:

• Asynchronous dataflow model
• Focus on consistency of data in motion
  – Component semantics
  – Delivery mechanisms and costs
• Automatic, minimal coordination

Page 53:

Queries?
