EFFICIENT SOLUTION TO REPLICATED LOG AND DICTIONARY PROBLEM. (Gene T.J. Wuu & Arthur J. Bernstein.) Presented By : Megha Priyanka

Upload: nora-may

Post on 18-Jan-2018


Overview

• Need For Data Replication.
• Consistency Constraints For Replicated Data.
• Model Of The Distributed Environment.
• Dictionary And Log Structure.
• Dictionary Problem.
• Prior Work.
• Proposed Solution.
• Comparison With Other Work.
• 2DTT Data Structure Improvement.
• Extending The Proposed Solution.
• Conclusion.


Need for replicated data?

• Many applications share data objects.

• Reliability and fast access are in demand.

• First step toward a comprehensive disaster recovery plan.

• Availability of data even when individual node fails.


Consistency constraints for replicated data

• Serializable transactions ensure correctness of the database.
• Serial consistency is harder to achieve in an unreliable distributed system.

Why?

-> Availability conflicts with serial consistency.
-> Concurrency and serializability are compatible when concurrent transactions access disjoint databases.

So,

• Lower the consistency bar.
• Use a weaker consistency constraint with additional information about the distributed transaction.


Event Model

[Figure: event timelines for nodes n1 to n6, showing a send event Send(m, T6, 6), a receive event, a non-communication event, and a crashed node whose local data stays intact.]

Uses Lamport total ordering and the happened-before concept.
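As a rough illustration of the Lamport ordering mentioned on this slide, the sketch below keeps one logical clock per node and breaks timestamp ties by node id to obtain a total order. The class and function names (`LamportClock`, `total_order_key`) are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class LamportClock:
    node: int      # node id, used only to break ties
    time: int = 0  # logical time of the latest local event

    def tick(self) -> int:
        """Advance the clock for a local or send event."""
        self.time += 1
        return self.time

    def receive(self, sender_time: int) -> int:
        """Advance past the timestamp carried by a received message."""
        self.time = max(self.time, sender_time) + 1
        return self.time


def total_order_key(time: int, node: int) -> tuple[int, int]:
    """Order events by timestamp, then by node id, to get a total order."""
    return (time, node)


a, b = LamportClock(node=1), LamportClock(node=2)
t_send = a.tick()            # send event on node 1
t_recv = b.receive(t_send)   # matching receive event on node 2
t_other = a.tick()           # later local event on node 1, same timestamp as t_recv
```

Here the send happened-before the receive, so `t_send < t_recv`. The events stamped `t_recv` and `t_other` are concurrent and carry the same timestamp; the node id decides their place in the total order.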


Distributed Dictionary

Data replication needs an efficient data structure that is scalable, available and recoverable. The solution is a replicated dictionary that uses a log.

Dictionary: an abstraction of a data object such as a file directory, a resource management table, or an electronic appointment calendar.

[Figure: a dictionary indexed by entry X, with Insert and Delete operations.]


A Dictionary Snapshot.


Distributed Log

Data structure:

    type Event = record
        op: OperationType;
        time: TimeType;
        node: NodeId;
    end

Example:
1. delete, Ti, 3.
2. add, Ti+4, 6.
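The event record could be mirrored in Python roughly as follows. This is only a sketch: the separate `entry` field and the concrete value chosen for Ti are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    op: str     # operation type, e.g. "add" or "delete"
    entry: str  # dictionary entry the operation applies to (assumed field)
    time: int   # time of the event at the originating node
    node: int   # id of the node where the event occurred


t_i = 10  # stand-in for the symbolic time Ti on the slide
log = [
    Event("add", "X", t_i + 4, 6),
    Event("delete", "X", t_i, 3),
]

# Sorting by (time, node) lists the records in timestamp order.
ordered = sorted(log, key=lambda e: (e.time, e.node))
```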


DICTIONARY PROBLEM

NOTATION:
• Each node Ni has a local, fully replicated dictionary copy Vi.
• V(e) = contents of the dictionary copy at the node where e occurred.
• X = dictionary entry.
• CX = the event that inserts X.
• X-delete event = an event that deletes X.

Dictionary problem restrictions:
R1) X ∈ V(e) iff CX -> e and there is no X-delete event g such that g -> e.
R2) Delete(X) can be invoked on Ni only if X ∈ Vi immediately prior to execution.
R3) For each dictionary entry X, there is at most one insert(X) event.

Dictionary problem: find a distributed algorithm on n nodes such that each node can perform insert, delete, send and receive operations subject to restrictions R1, R2 and R3.

[Figure: timeline T with the events Insert X, Delete X and e.]
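Restriction R1 can be read as a recipe for computing a dictionary view from a log. The sketch below assumes the log passed in holds exactly the insert and delete events that happened before e, and represents each event as a hypothetical `(op, entry)` pair.

```python
def dictionary_from_log(log):
    """Return the entries that were inserted and never deleted (restriction R1).

    `log` is assumed to contain every insert/delete event that happened
    before the point of interest, as (op, entry) pairs.
    """
    inserted = {entry for op, entry in log if op == "insert"}
    deleted = {entry for op, entry in log if op == "delete"}
    return inserted - deleted


view = dictionary_from_log([
    ("insert", "X"),
    ("insert", "Y"),
    ("delete", "X"),
])
```

Because R3 allows at most one insert per entry and R2 only allows deleting an entry that is present, a set difference is enough here; no ordering of the events is needed.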


Prior Work

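[Figure: processes P1, P2 and P3 with logs L1 and L2; an Insert X event is propagated by sending the whole log, and each node's log and dictionary are shown.]

• Each message sends the whole log: excessive communication.
• The log is used to calculate dictionary entries (Y ∈ V(e) iff CY -> e with no Y-delete event g such that g -> e): excessive calculation.
• The entire log is stored: excessive storage cost.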


Proposed Solution is…

Data Structures Used:

Log data structure:
• 2-D time table Ti (recall the matrix timestamp).
• Partial log PLi.

Dictionary data structure:
• Vi: set of dictionary entries.


Algorithm

Initialization:
Vi = ∅; PLi = ∅; for all (r,s), Ti[r,s] = 0.

Insert(X) / Delete(X):
• Ti[i,i] = Clocki.
• PLi = PLi ∪ {<Op, Ti[i,i], i>}.
• If Op = Insert(X), Vi = Vi ∪ {X}. If Op = Delete(X), Vi = Vi − {X}.

Send(m) to Nk:
• NP = {eR | eR ∈ PLi and Ni cannot tell from its time table Ti that Nk already knows about eR}.
• Send <NP, Ti> to Nk.

Receive(m) from Nk:
• m = <NPk, Tk>.
• NE = {eR | eR ∈ NPk and Ni is not yet aware of eR}, i.e. the new event records in the message.
• Vi = {v | (v ∈ Vi or the insertion of v is in NE) and no deletion of v is in NE}.
• Update Ti using the same concept as the matrix timestamp.
• PLi = {eR | eR ∈ PLi ∪ NE and at least one node is not yet known to have learned of eR}.
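The four operations can be sketched in Python as below. This is a minimal illustration rather than a reference implementation. It assumes 0-based node ids, a simple per-node counter as the clock, and a helper predicate named `hasrec` that tests whether, according to a time table, node k has learned of an event. The time table update takes the element-wise maximum of both tables and merges the sender's row into the receiver's own row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    op: str     # "insert" or "delete"
    entry: str  # dictionary entry
    time: int   # clock value at the originating node
    node: int   # originating node id (0-based in this sketch)


def hasrec(T, e, k):
    """True if, according to time table T, node k has learned of event e."""
    return T[k][e.node] >= e.time


class Node:
    def __init__(self, i, n):
        self.i, self.n = i, n
        self.clock = 0                        # assumed per-node counter
        self.V = set()                        # dictionary view V_i
        self.PL = set()                       # partial log PL_i
        self.T = [[0] * n for _ in range(n)]  # 2-D time table T_i

    def _record(self, op, x):
        self.clock += 1
        self.T[self.i][self.i] = self.clock
        self.PL.add(Event(op, x, self.clock, self.i))

    def insert(self, x):
        self._record("insert", x)
        self.V.add(x)

    def delete(self, x):
        assert x in self.V  # restriction R2
        self._record("delete", x)
        self.V.discard(x)

    def send(self, k):
        """Build <NP, T_i>: only records node k is not known to have."""
        NP = {e for e in self.PL if not hasrec(self.T, e, k)}
        return NP, [row[:] for row in self.T]

    def receive(self, msg, k):
        NP, Tk = msg
        NE = {e for e in NP if not hasrec(self.T, e, self.i)}
        inserted = {e.entry for e in NE if e.op == "insert"}
        deleted = {e.entry for e in NE if e.op == "delete"}
        self.V = (self.V | inserted) - deleted
        for j in range(self.n):       # own row absorbs the sender's row
            self.T[self.i][j] = max(self.T[self.i][j], Tk[k][j])
        for r in range(self.n):       # element-wise maximum of both tables
            for s in range(self.n):
                self.T[r][s] = max(self.T[r][s], Tk[r][s])
        # Keep a record only while some node is not known to have it.
        self.PL = {e for e in self.PL | NE
                   if any(not hasrec(self.T, e, j) for j in range(self.n))}


n0, n1, n2 = (Node(i, 3) for i in range(3))
n0.insert("X")
n1.receive(n0.send(1), 0)
n1.insert("Y")
n2.receive(n1.send(2), 1)
to_n0, _ = n1.send(0)   # n1 knows that n0 already has Insert X
n0.receive((to_n0, [row[:] for row in n1.T]), 1)
n2.delete("X")
n0.receive(n2.send(0), 2)
```

In this run, the message from n1 to n0 carries only the Insert Y record, and n2 drops the Insert X record from its partial log as soon as its time table shows that all three nodes know about it.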


[Figure: worked example on nodes n1, n2 and n3, step 1. n1 executes Insert(X), logs Insert(X,1,1) and sends <Insert X, T1> to n2 and n3. Time tables (rows separated by ";"): T1 = [2 0 0; 0 0 0; 0 0 0]; T2 and T3 are all zero. Only n1's dictionary contains X so far.]


[Figure: worked example, step 2. n2 and n3 have received <Insert X, T1>; every node's log holds the Insert X record and every dictionary contains X. Time tables (rows separated by ";"): T1 = [2 0 0; 0 0 0; 0 0 0], T2 = [2 0 0; 2 1 0; 0 0 0], T3 = [2 0 0; 0 0 0; 2 0 1].]


[Figure: worked example, step 3. n2 executes Insert(Y) and sends the messages <(Insert X, Insert Y), T2> and <Insert Y, T2>. Time tables (rows separated by ";"): T1 = [2 0 0; 0 0 0; 0 0 0], T2 = [2 0 0; 2 3 0; 0 0 0], T3 = [2 0 0; 0 0 0; 2 0 1]. All three dictionaries contain X; only n2's contains Y.]


[Figure: worked example, step 4. n1 and n3 have received n2's messages, shown as <(Insert X, Insert Y), T2> and <Insert X, T2>; all three dictionaries now contain X and Y. Time tables (rows separated by ";"): T1 = [3 3 0; 2 3 0; 0 0 0], T2 = [2 0 0; 2 3 0; 0 0 0], T3 = [2 0 0; 2 3 0; 2 0 2].]


[Figure: worked example, step 5. n3 executes Insert(Z) and sends the messages <(Insert Z, Insert Y), T3> and <Insert Z, T3>. Time tables (rows separated by ";"): T1 = [3 3 0; 2 3 0; 0 0 0], T2 = [2 0 0; 2 3 0; 0 0 0], T3 = [2 0 0; 2 3 0; 2 0 4].]


[Figure: worked example, step 6. n1 and n2 have received n3's messages, and their logs and dictionaries now include Z. Time tables (rows separated by ";"): T1 = [4 3 4; 2 3 0; 2 0 4], T2 = [2 0 0; 2 4 0; 2 0 4], T3 = [2 0 0; 2 3 0; 2 0 4].]


[Figure: worked example, step 7. n1 executes Delete(X) and sends the messages <(Delete X), T1> and <(Delete X, Insert Z), T1>; the Insert X records no longer appear in the logs. Time tables (rows separated by ";"): T1 = [6 3 4; 2 3 0; 2 0 4], T2 = [2 0 0; 2 4 0; 2 0 4], T3 = [2 0 0; 2 3 0; 2 0 4].]


Comparison with other work

Proposed by: Fischer and Michael
Data structure used: dictionary data structures.
Disadvantage: an entire copy of the dictionary has to be sent in each message.

Proposed by: Allchin
Data structure used: synchronization set (SS) and 1-D time table; SS ~= partial log.
Disadvantage: SS grows unboundedly.

Proposed by: Wuu & Bernstein
Data structure used: dictionary, log and 2-D time table.
Disadvantage: the 2-DTT, of size O(n^2), is sent in every message.


Improving 2-DTT Message Complexity

Strategy 0
Data structure stored/sent: the complete 2DTT is stored at the node, and the complete 2DTT is sent in the message.
Pros & cons: message complexity is as high as O(n^2), as one has to send and store an n x n matrix.
Store: O(n^2). Send: O(n^2).

Strategy 1
Data structure stored/sent: the complete 2DTT is stored at the node. A node sends only its own row in the message.
Pros & cons: requires direct messages to update each row. Needs to include more event records.
Store: O(n^2). Send: O(n).

Strategy 2
Data structure stored/sent: stores neighbors' and own rows. Sends the corresponding row info to the corresponding neighbor.
Pros & cons: can't determine when all nodes have come to know about an event. Discards an event record once all neighbors know about it.
Store: O(nk). Send: O(n).

Strategy 3
Data structure stored/sent: stores all entries (row and column) corresponding to neighbors. Sends row info through the gateway nodes.
Pros & cons: better when the network is large and connectivity and communication are low.
Store: O(k^2). Send: O(k).
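To get a feel for the orders of growth listed for the four strategies, the sketch below tabulates the leading-order number of time-table entries stored and sent. It assumes k stands for the number of neighbors of a node, which the slide does not state explicitly; constants and lower-order terms are ignored, and the function name `table_cost` is hypothetical.

```python
def table_cost(strategy: int, n: int, k: int) -> tuple[int, int]:
    """Return (entries stored, entries sent) to leading order.

    n is the number of nodes; k is assumed to be the number of neighbors.
    """
    costs = {
        0: (n * n, n * n),  # store and send the full n x n table
        1: (n * n, n),      # store the full table, send only own row
        2: (n * k, n),      # store neighbors' and own rows, send one row
        3: (k * k, k),      # store only neighbor rows and columns
    }
    return costs[strategy]


n, k = 100, 4
for s in range(4):
    stored, sent = table_cost(s, n, k)
    print(f"strategy {s}: store ~{stored} entries, send ~{sent} entries")
```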


Extending The Proposed Solution….

Replicated numeric data:
It supports add-to and subtract-from operations, which are commutative. The log/2DTT solution makes sure that the answer is consistent no matter in what order the operations are applied. So, with result1 = b + a - c and result2 = b - c + a, result1 = result2.

Detection of failure:
To distinguish node failure from communication failure, a log is used to collect records of communication events. Suppose node N1 has the 2DTT

1 0 0
0 0 0
1 0 3

It knows that no one has received any info from node 2. So node 2 might be down.
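The failure check in this example amounts to looking for an all-zero column in the time table. A minimal sketch, assuming 1-based node numbers in the result and a hypothetical helper name `silent_nodes`:

```python
def silent_nodes(T):
    """Return the 1-based ids of nodes whose column in T is all zero,
    i.e. nodes from which no node is known to have received anything."""
    n = len(T)
    return [j + 1 for j in range(n) if all(T[r][j] == 0 for r in range(n))]


# Time table held by node N1 in the example above.
T1 = [
    [1, 0, 0],
    [0, 0, 0],
    [1, 0, 3],
]
suspects = silent_nodes(T1)
```

A node flagged this way is only a suspect: its column would also be all zero if it were up but had not yet performed or communicated any event.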


Conclusion

• Mutual consistency of replicated data is achieved.
• The algorithm works well in an unreliable network.
• A weaker consistency constraint is used.
• Excessive communication, computation and storage costs are reduced.

Remember:
• The replicated log is used to compute others' views of data.
• Link failure / message lost: get info from other nodes.
• Node failure: info is stored in the log and dictionary, which are kept in stable storage.
• Reduction of communication / storage cost: a partial log is sent and stored.
• Reduction of computation cost: only part of the dictionary entries are re-calculated.
