Coding for Atomic Shared Memory Emulation Viveck R. Cadambe (MIT) Joint with Prof. Nancy Lynch (MIT), Prof. Muriel Médard (MIT) and Dr. Peter Musial (EMC)


Post on 30-Dec-2015


Page 1: Coding for Atomic Shared Memory Emulation

Coding for Atomic Shared Memory Emulation

Viveck R. Cadambe (MIT)

Joint with Prof. Nancy Lynch (MIT), Prof. Muriel Médard (MIT) and Dr. Peter Musial (EMC)

Page 2: Coding for Atomic Shared Memory Emulation

Erasure Coding for Distributed Storage

Page 5: Coding for Atomic Shared Memory Emulation

Erasure Coding for Distributed Storage

• Locality, Repair Bandwidth, Caching and Content Distribution
  - [Gopalan et al. 2011, Dimakis-Godfrey-Wu-Wainwright 10, Wu-Dimakis 09, Niesen-Ali 12]

• Queuing theory
  - [Ferner-Medard-Soljanin 12, Joshi-Liu-Soljanin 12, Shah-Lee-Ramchandran 12]

This talk: Theory of distributed computing
Considerations for storing data that changes

Page 6: Coding for Atomic Shared Memory Emulation

Consistency: the value changes over time; a read should get the “latest” version

Failure tolerance, low storage costs, fast reads and writes

Page 10: Coding for Atomic Shared Memory Emulation

Shared Memory Emulation - History

Atomic (consistent) shared memory
• [Lamport 1986]
• Cornerstone of distributed computing and multi-processor programming

Emulation over distributed storage systems
• “ABD” algorithm [Attiya-Bar-Noy-Dolev 95], 2011 Dijkstra Prize
• Amazon Dynamo key-value store [DeCandia et al. 2008]
• Replication-based

Costs of emulation (this talk)
• Low-cost coding-based algorithm
• Communication and storage costs
• [C-Lynch-Medard-Musial 2014], preprint available

Page 18: Coding for Atomic Shared Memory Emulation

Atomicity [Lamport 86], aka linearizability [Herlihy, Wing 90]

[Figure: Write and Read operations drawn as intervals on three time axes, with executions labeled "Atomic" and "Not atomic".]
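The atomicity condition can be made concrete with a small brute-force check for a single read/write register. This is a sketch, not part of the talk: the tuple layout `(kind, value, start, end)` and the function name `is_atomic` are assumptions made for illustration, and trying every ordering is only practical for a handful of operations.

```python
from itertools import permutations

def is_atomic(history, initial=None):
    """Return True if the history of one register is atomic (linearizable).

    history: list of (kind, value, start, end), kind is "write" or "read".
    For a read, value is the value the read returned.
    """
    for order in permutations(history):
        # Real-time order: an operation that finished before another
        # started must not be placed after it.
        respects_time = all(not (later[3] < earlier[2])
                            for i, earlier in enumerate(order)
                            for later in order[i + 1:])
        if not respects_time:
            continue
        current = initial
        for kind, value, _, _ in order:
            if kind == "write":
                current = value
            elif value != current:
                break  # this read would not see the latest write
        else:
            return True  # found a valid sequential ordering
    return False

# A read that overlaps a write may return the new value.
overlapping = [("write", 1, 0, 4), ("read", 1, 1, 2)]
# A read that starts after the write completed must not return the old value.
stale = [("write", 1, 0, 1), ("read", 0, 2, 3)]
```

With `initial=0`, `is_atomic(overlapping, 0)` holds and `is_atomic(stale, 0)` does not, matching the two kinds of execution the figure contrasts.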

Page 20: Coding for Atomic Shared Memory Emulation

Distributed Storage Model

• Client-server architecture; nodes can fail (the number of server failures is limited)
• Point-to-point reliable links (arbitrary delay)
• Nodes do not know if other nodes fail
• An operation should not have to wait for others to complete

[Figure: write clients and read clients connected to a set of servers.]

Page 23: Coding for Atomic Shared Memory Emulation

Requirements and cost measure

Design write, read and server protocols that ensure
• Atomicity
• Concurrent operations, no waiting

Communication overheads: number of bits sent over links
Storage overheads: (worst-case) server storage costs

[Figure: write clients and read clients connected to a set of servers.]

Page 24: Coding for Atomic Shared Memory Emulation

The ABD algorithm (sketch)

Quorum set: every majority of server nodes. Any two quorum sets intersect in at least one node.
The algorithm works if at least one quorum set is available.

[Figure: write clients and read clients connected to a set of servers.]
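The intersection property of majority quorums can be checked by enumeration for a small system. The sketch below is illustrative only; the function names and the choice of 5 servers are assumptions, not taken from the talk.

```python
from itertools import combinations

def majorities(n):
    """All subsets of n servers that contain more than half of them."""
    smallest = n // 2 + 1
    return [set(q)
            for size in range(smallest, n + 1)
            for q in combinations(range(n), size)]

def all_pairs_intersect(quorums):
    """True if every two quorums share at least one server."""
    return all(a & b for a, b in combinations(quorums, 2))

# With 5 servers every majority has at least 3 members, so two of them
# cannot be disjoint.
majority_ok = all_pairs_intersect(majorities(5))
```

By contrast, sets of exactly half the servers (for example 2 of 4) can be disjoint, which is why a strict majority is needed.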

Page 30: Coding for Atomic Shared Memory Emulation

The ABD algorithm (sketch)

Write: Send time-stamped value to every server; return after receiving acks from a quorum.

Read: Send read query; wait for responses from a quorum; send the latest value to the servers; return the latest value after receiving acks from a quorum.

Servers: Store the latest value; send ack. Respond to read request with value.

[Figure: write clients and read clients exchanging Query and ACK messages with the servers.]
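The write, read and server rules above can be sketched as a small in-memory simulation. This is a simplified illustration under stated assumptions, not the authors' implementation: there is a single writer with an integer time-stamp counter, messages are modeled as direct method calls, and asynchrony is modeled by letting only a random majority of servers take part in each phase. All class and function names are hypothetical.

```python
import random

class Server:
    def __init__(self):
        self.ts, self.value = 0, None  # time-stamp 0 means "nothing written"

    def store(self, ts, value):
        # Keep only the value with the latest time-stamp, then acknowledge.
        if ts > self.ts:
            self.ts, self.value = ts, value
        return "ack"

    def query(self):
        return self.ts, self.value

def quorum(servers):
    """Some majority of servers; the rest are treated as slow or failed."""
    return random.sample(servers, len(servers) // 2 + 1)

class Writer:
    def __init__(self, servers):
        self.servers, self.ts = servers, 0

    def write(self, value):
        self.ts += 1
        # Return once a quorum has acknowledged the time-stamped value.
        return [s.store(self.ts, value) for s in quorum(self.servers)]

def read(servers):
    # Phase 1: collect (time-stamp, value) pairs from a quorum, pick the latest.
    ts, value = max((s.query() for s in quorum(servers)), key=lambda r: r[0])
    # Phase 2: write the latest value back to a quorum before returning it.
    for s in quorum(servers):
        s.store(ts, value)
    return value

servers = [Server() for _ in range(5)]
writer = Writer(servers)
writer.write("a")
first = read(servers)
writer.write("b")
second = read(servers)
```

Because any two majorities share a server, the read quorum always contains a server holding the most recent completed write, so `first` is `"a"` and `second` is `"b"` regardless of which servers respond.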

Page 31: Coding for Atomic Shared Memory Emulation

The ABD algorithm (summary)

• The ABD algorithm ensures atomic operations.

• Termination of operations is ensured as long as a majority of nodes do not fail.

• Implication: A networked distributed storage system can be used as shared memory.

• Replication to ensure failure tolerance.

Page 32: Coding for Atomic Shared Memory Emulation

Performance Analysis

[Table: ABD costs for storage, communication (read) and communication (write); values not recoverable.]

• f represents the number of failures
• A lower communication cost algorithm appears in [Fan-Lynch 03]

Page 34: Coding for Atomic Shared Memory Emulation

Shared Memory Emulation – Erasure Coding

• [Hendricks-Ganger-Reiter 07, Dutta-Guerraoui-Levy 08, Dobre et al. 13, Androulaki et al. 14]

• New algorithm, a formal analysis of costs

• Outperforms previous algorithms in certain aspects
  • Previous algorithms incur infinite worst-case storage costs
  • Previous algorithms incur large communication costs

Page 37: Coding for Atomic Shared Memory Emulation

Erasure Coded Shared Memory

Example: (6,4) MDS code

• Value recoverable from any 4 coded packets
• Size of coded packet is ¼ size of value
• New constraint: need 4 packets with the same time-stamp

Smaller packets, smaller overheads
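The (6,4) example can be illustrated with a toy Reed-Solomon style code. This is a sketch under assumptions that are not in the talk: the value is split into 4 symbols of the prime field GF(257), the symbols are used as polynomial coefficients, and each of the 6 packets is one evaluation of that polynomial. The field size and function names are chosen only for illustration; the three-argument `pow` with a negative exponent needs Python 3.8 or later.

```python
from itertools import combinations

P = 257  # small prime field, chosen only for illustration

def encode(data, n):
    """Evaluate the polynomial with coefficients `data` at x = 1..n."""
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(data)) % P)
            for x in range(1, n + 1)]

def decode(packets, k):
    """Recover the k coefficients from any k packets (Lagrange interpolation)."""
    pts = packets[:k]
    coeffs = [0] * k
    for j, (xj, yj) in enumerate(pts):
        basis, denom = [1], 1
        for m, (xm, _) in enumerate(pts):
            if m == j:
                continue
            # Multiply the basis polynomial by (x - xm).
            nxt = [0] * (len(basis) + 1)
            for i, b in enumerate(basis):
                nxt[i] = (nxt[i] - xm * b) % P
                nxt[i + 1] = (nxt[i + 1] + b) % P
            basis = nxt
            denom = denom * (xj - xm) % P
        scale = yj * pow(denom, -1, P) % P  # modular inverse of the denominator
        for i, b in enumerate(basis):
            coeffs[i] = (coeffs[i] + scale * b) % P
    return coeffs

value = [10, 20, 30, 40]      # the value, split into k = 4 symbols
packets = encode(value, 6)    # one single-symbol packet per server
recovered = [decode(list(s), 4) for s in combinations(packets, 4)]
```

Each packet carries one field symbol while the value has four, which is the ¼ size ratio on the slide, and every one of the 15 four-packet subsets in `recovered` decodes to the original value.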

Page 38: Coding for Atomic Shared Memory Emulation

Coded Shared Memory – Quorum set up

Quorum set: every subset of 5 server nodes. Any two quorum sets intersect in at least 4 nodes.
The algorithm works if at least one quorum set is available.

[Figure: write clients and read clients connected to a set of servers.]
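The intersection claim for this quorum system can be verified by enumeration. The helper below is a hypothetical illustration; it assumes 6 servers, as in the (6,4) example, and needs at least two distinct quorums to compare.

```python
from itertools import combinations

def min_intersection(n, q):
    """Smallest overlap between any two distinct q-subsets of n servers."""
    quorums = [set(c) for c in combinations(range(n), q)]
    return min(len(a & b) for a, b in combinations(quorums, 2))

overlap_5_of_6 = min_intersection(6, 5)
overlap_4_of_6 = min_intersection(6, 4)
```

With quorums of 5 out of 6 servers the overlap is 4, enough packets to decode a (6,4) MDS code, whereas quorums of 4 would only guarantee an overlap of 2.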

Page 40: Coding for Atomic Shared Memory Emulation

Coded Shared Memory – Why is it challenging?

Challenges:
• Reveal elements to readers only when enough elements are propagated
• Discard old versions safely

Solutions:
• Write in multiple phases
• Store all the write-versions concurrent with a read

Servers store multiple versions

[Figure: read clients sending Query messages to the servers, alongside write clients.]

Page 45: Coding for Atomic Shared Memory Emulation

Coded Shared Memory – Protocol overview

Write: Send time-stamped value to every server; send finalize message after getting acks from a quorum; return after receiving acks from a quorum.

Read: Send read query; wait for time-stamps from a quorum; send request with the latest time-stamp to servers; decode and return value after receiving acks/symbols from a quorum.

Servers:
• Store the coded symbol; keep the latest δ codeword symbols and delete older ones; send ack.
• Set finalize flag for time-stamp on receiving finalize message; send ack.
• Respond to read query with the latest finalized time-stamp.
• Finalize the requested time-stamp; respond to read request with codeword symbol if it exists, else send ack.
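The server-side rules can be sketched as a small state machine. This is an illustration under assumptions, not the authors' implementation: time-stamps are plain integers, messages are modeled as method calls, and the class and method names are hypothetical. It follows the slide's wording of keeping the latest δ codeword symbols and assumes δ is at least 1.

```python
class CodedServer:
    def __init__(self, delta):
        assert delta >= 1
        self.delta = delta
        self.symbols = {}        # time-stamp -> stored codeword symbol
        self.finalized = set()   # time-stamps whose finalize flag is set

    def store(self, ts, symbol):
        """Store a coded symbol, keep only the latest delta, then ack."""
        self.symbols[ts] = symbol
        for old in sorted(self.symbols)[:-self.delta]:
            del self.symbols[old]
        return "ack"

    def finalize(self, ts):
        """Set the finalize flag for a time-stamp, then ack."""
        self.finalized.add(ts)
        return "ack"

    def query(self):
        """Answer a read query with the latest finalized time-stamp (0 if none)."""
        return max(self.finalized, default=0)

    def read(self, ts):
        """Finalize the requested time-stamp; return its symbol if still stored.

        None stands for a plain ack without a symbol.
        """
        self.finalized.add(ts)
        return self.symbols.get(ts)

server = CodedServer(delta=2)
for ts, sym in [(1, "a"), (2, "b"), (3, "c")]:
    server.store(ts, sym)
server.finalize(2)
```

After the three stores only the symbols for time-stamps 2 and 3 remain, `server.query()` reports 2, `server.read(2)` returns its symbol, and `server.read(1)` returns `None` because that symbol has been deleted.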

Page 46: Coding for Atomic Shared Memory Emulation

Coded Shared Memory – Protocol overview

• Use (N,k) MDS code, where N is the number of servers

• Ensures atomic operations

• Termination of operations is ensured as long as
  o the number of failed nodes is smaller than (N-k)/2
  o the number of writes concurrent with a read is smaller than δ

Page 47: Coding for Atomic Shared Memory Emulation

Performance comparisons

[Table: storage, communication (read) and communication (write) costs for ABD and our algorithm; values not recoverable.]

• N represents the number of nodes, f represents the number of failures
• δ represents the maximum number of writes concurrent with a read

Page 49: Coding for Atomic Shared Memory Emulation

Proof Steps

• After every operation terminates,
  - there is a quorum of servers with the codeword symbol
  - there is a quorum of servers with the finalize label
  - because every pair of quorums intersects in at least k servers, readers can decode the value

• When a codeword symbol is deleted at a server
  - every operation that wants that time-stamp has terminated
  - (or the concurrency bound is violated)

Page 50: Coding for Atomic Shared Memory Emulation

Main Insights

• Significant savings on network traffic overheads
  - Reflects the classical gain of erasure coding over replication

• (New Insight) Storage overheads depend on client activity
  • Storage overhead proportional to the no. of writes concurrent with a read
  • Better than classical techniques for moderate client activity

Page 51: Coding for Atomic Shared Memory Emulation

Future Work – Many open questions

Refinements of our algorithm
- (Ongoing) More robustness to client node failures

Information theoretic bounds on costs
- New coding schemes

Finer network models
- Erasure channels, different topologies, wireless channels

Finer source models
- Correlations across versions

Dynamic networks

Page 53: Coding for Atomic Shared Memory Emulation

Storage costs

[Figure: storage overhead versus number of writes concurrent with a read, with curves for ABD and our algorithm.]

What is the fundamental cost curve?

Page 55: Coding for Atomic Shared Memory Emulation

Future Work – Many open questions

Refinements of our algorithm
- (Ongoing) More robustness to client node failures

Information theoretic bounds on costs
- New coding schemes

Finer network models, finer source models
- Erasure channels, different topologies, wireless channels
- Correlations across versions

Dynamic networks
- Interesting replication-based algorithm in [Gilbert-Lynch-Shvartsman 03]
- Study of costs in terms of network dynamics