EFFICIENT SOLUTION TO REPLICATED LOG AND DICTIONARY PROBLEM.
(Gene T.J. Wuu & Arthur J. Bernstein.)
Presented By : Megha Priyanka
Overview
• Need For Data Replication.
• Consistency Constraints For Replicated Data.
• Model Of The Distributed Environment.
• Dictionary And Log Structure.
• Dictionary Problem.
• Prior Work.
• Proposed Solution.
• Comparison With Other Work.
• 2DTT Data Structure Improvement.
• Extending The Proposed Solution.
• Conclusion.
Need for replicated data?
• Many applications share data objects.
• Reliability and fast access are in demand.
• First step toward a comprehensive disaster recovery plan.
• Availability of data even when individual node fails.
Consistency constraints for replicated data
• Serializable transactions ensure correctness of the database.
• Serial consistency is harder in an unreliable distributed system.
Why?
-> Availability conflicts with serial consistency.
-> Concurrency and serializability are compatible when concurrent transactions access disjoint databases.
So,
• Lower the consistency bar.
• Use a weaker consistency constraint with additional information about the distributed transaction.
Event Model
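[Figure: six nodes n1 to n6 exchanging messages over time. Labeled events: Send(m, T6, 6), receive, and a non-communication event. One node is marked as crashed with its local data intact.]
Uses Lamport total ordering and the happened-before concept.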
Distributed Dictionary
Data replication needs an efficient data structure that is scalable, available, and recoverable. The solution is a replicated dictionary maintained using a log.
Dictionary: an abstraction of a data object such as a file directory, a resource management table, or an electronic appointment calendar.
[Figure: a dictionary indexed by entry X, with Insert and Delete operations.]
A Dictionary Snapshot.
Distributed Log
Data Structure:
type Event = record
    op: OperationType;
    time: TimeType;
    node: NodeId;
end
Example:
1. delete, Ti, 3.
2. add, Ti+4, 6.
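The event record above can be sketched in Python as an immutable dataclass. This is only an illustration: the `entry` field (the dictionary entry the operation targets) is an assumption not shown on the slide, and the concrete value chosen for Ti is arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One log record: which operation ran, when, and on which node."""
    op: str      # "insert"/"add" or "delete"
    time: int    # value of the originating node's clock
    node: int    # id of the node where the event occurred
    entry: str = ""  # assumed field: the dictionary entry the operation targets


Ti = 5  # arbitrary illustrative clock value
log = [
    Event(op="delete", time=Ti, node=3),
    Event(op="add", time=Ti + 4, node=6),
]
```

Making the record frozen keeps it hashable, so records can be stored in a set, which is convenient for a partial log.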
DICTIONARY PROBLEM
NOTATION:
• Each node Ni has a local, fully replicated dictionary copy Vi.
• V(e) = contents of the dictionary copy at the node where e occurred.
• X = a dictionary entry.
• CX = the event that inserts X.
• X-delete event = an event that deletes X.
Dictionary Problem Restrictions:
R1) X ∈ V(e) iff CX -> e and there is no X-delete event g such that g -> e.
R2) Delete(X) can be invoked on Ni only if X ∈ Vi immediately prior to execution.
R3) For each dictionary entry X, there is at most one insert(X) event.
Dictionary Problem:
Find a distributed algorithm on n nodes such that each node can perform insert, delete, send, and receive subject to restrictions R1, R2, and R3.
[Figure: timeline T showing Insert x, then Delete x, then event e.]
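Restriction R1 can be read as a recipe for computing a view from the set of events that happened before e. The following is a minimal sketch under that reading; the function name `view` and the `(op, entry)` pair encoding are assumptions made for illustration, and the caller is assumed to supply exactly the events that causally precede e.

```python
def view(preceding_events):
    """Return V(e) given the (op, entry) pairs of all events g with g -> e.

    By R3 each entry is inserted at most once, and by R2 it can only be
    deleted while present, so an entry is in the view exactly when an
    insert of it precedes e and no delete of it does.
    """
    events = list(preceding_events)
    inserted = {x for op, x in events if op == "insert"}
    deleted = {x for op, x in events if op == "delete"}
    return inserted - deleted


history = [("insert", "x"), ("insert", "y"), ("delete", "x")]
current_view = view(history)
```

Recomputing the view this way requires the whole history of events, which is the cost that the partial-log approach later in the talk is meant to avoid.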
Prior Work
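[Figure: processes P1, P2, and P3, each holding a log (L1, L2, ...) and a dictionary. Insert X and Insert Y records travel between them, and every message carries the sender's whole log.]
• The whole log is sent in every message: excessive communication.
• The log is used to calculate each dictionary entry (Y ∈ V(e) iff CY -> e with no Y-delete event g, g -> e): excessive calculation.
• The entire log is stored: excessive storage cost.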
Proposed Solution is…
Data Structures Used:
Log Data Structure:
• 2-D Time Table Ti (recall the matrix timestamp).
• Partial Log PLi.
Dictionary Data Structure:
• Vi: set of dictionary entries.
Algorithm
Initialization:
• Vi = ∅; PLi = ∅; for all (j,k), Ti[j,k] = 0.
Insert(X) / Delete(X):
• Ti[i,i] = Clocki.
• PLi = PLi ∪ {<Op, Ti[i,i], i>}.
• If Op = Insert(X), Vi = Vi ∪ {X}.
• If Op = Delete(X), Vi = Vi – {X}.
Send(m) to Nk:
• NP = {eR | eR ∈ PLi and, according to Ti, Ni cannot conclude that Nk already knows about eR}.
• Send <NP, Ti> to Nk.
Receive(m) from Nk:
• m = <NPk, Tk>.
• NE = the records in NPk that Ni is not yet aware of.
• Vi = {V | (V ∈ Vi or NE contains an insertion of V) and NE contains no deletion of V}.
• Update Ti as with a matrix timestamp: take the element-wise maximum of Ti and Tk, and raise row i of Ti to at least row k of Tk.
• PLi = {eR | eR ∈ PLi ∪ NE and, according to Ti, at least one node may not yet know about eR}.
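The steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the class name `Node`, the helper `hasrec`, the tuple layout `(op, entry, time, node)`, zero-based node ids, the use of a plain local counter as the clock, and passing the sender id inside the message are all assumptions made to keep the example self-contained.

```python
class Node:
    def __init__(self, i, n):
        self.i, self.n = i, n
        self.clock = 0                          # assumed: simple local counter
        self.V = set()                          # dictionary Vi
        self.PL = set()                         # partial log PLi
        self.T = [[0] * n for _ in range(n)]    # 2-D time table Ti

    def hasrec(self, rec, k):
        """True if, according to Ti, node k is known to have record rec."""
        _, _, time, node = rec
        return self.T[k][node] >= time

    def _log(self, op, x):
        self.clock += 1
        self.T[self.i][self.i] = self.clock
        self.PL.add((op, x, self.clock, self.i))

    def insert(self, x):
        self._log("insert", x)
        self.V.add(x)

    def delete(self, x):
        assert x in self.V                      # restriction R2
        self._log("delete", x)
        self.V.discard(x)

    def send(self, k):
        """Build the message <NP, Ti> for node k (sender id added for convenience)."""
        NP = {r for r in self.PL if not self.hasrec(r, k)}
        return NP, [row[:] for row in self.T], self.i

    def receive(self, msg):
        NP, Tk, k = msg
        NE = {r for r in NP if not self.hasrec(r, self.i)}   # records new to this node
        inserted = {x for op, x, _, _ in NE if op == "insert"}
        deleted = {x for op, x, _, _ in NE if op == "delete"}
        self.V = (self.V | inserted) - deleted
        for j in range(self.n):                 # this node now knows what k knew
            self.T[self.i][j] = max(self.T[self.i][j], Tk[k][j])
        for a in range(self.n):                 # element-wise maximum of the tables
            for b in range(self.n):
                self.T[a][b] = max(self.T[a][b], Tk[a][b])
        # keep a record only while some node may still lack it
        self.PL = {r for r in self.PL | NE
                   if any(not self.hasrec(r, j) for j in range(self.n))}


# Three nodes: a inserts x, b learns it and inserts y, c learns both,
# then a deletes x and tells c.
a, b, c = Node(0, 3), Node(1, 3), Node(2, 3)
a.insert("x")
b.receive(a.send(1))
b.insert("y")
c.receive(b.send(2))
view_before_delete = set(c.V)
a.delete("x")
c.receive(a.send(2))
```

In this run, node c discards the insert-x record as soon as its time table shows that all three nodes know about it, while the delete-x record is kept because node b is not yet known to have it.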
[Figure: worked example, step 1, on nodes n1, n2, n3 with time tables T1, T2, T3 and a log and dictionary at each node. Labels recovered: log record Insert(X,1,1); two messages <Insert x, T1>; dictionary entry "1 x".]
[Figure: worked example, step 2. Labels recovered: log records Insert(X,1,1), Insert(X,1,2), Insert(X,1,3); two messages <Insert x, T1>; dictionary entry "1 x" at all three nodes.]
[Figure: worked example, step 3. Labels recovered: log records Insert(X,1,1), Insert(X,1,2), Insert(X,1,3), Insert(Y,2,2); messages <(Insert x, Insert y), T2> and <Insert y, T2>; dictionary entry "1 x" at all three nodes and "2 y" at one node.]
[Figure: worked example, step 4. Labels recovered: log records Insert(X,1,1), Insert(Y,3,1), Insert(X,1,3), Insert(Y,2,3), Insert(X,1,2), Insert(Y,3,2); messages <(Insert x, Insert y), T2> and <Insert x, T2>; dictionary entries "1 x" and "2 y" at all three nodes.]
[Figure: worked example, step 5. Labels recovered: log records Insert(X,1,1), Insert(Y,3,1), Insert(Y,2,3), Insert(z,3,3), Insert(X,1,2), Insert(Y,3,2); messages <(insert z, insert y), T3> and <insert z, T3>; dictionary entries "1 x" and "2 y" at all three nodes.]
[Figure: worked example, step 6. Labels recovered: log records Insert(X,1,1), Insert(Y,3,1), Insert(Z,4,1), Insert(Y,2,3), Insert(z,4,3), Insert(X,1,2), Insert(Y,3,2), Insert(Z,4,2); messages <(insert z, insert y), T3> and <(insert z), T3>; dictionary entries "1 x" and "2 y" at all three nodes, with "3 z" now appearing.]
[Figure: worked example, step 7. Labels recovered: log records Insert(Y,3,1), Insert(Z,4,1), Insert(Y,2,3), Insert(z,4,3), Insert(Y,3,2), Insert(Z,4,2), with the Insert(X) records no longer shown; messages <(del x), T1> and <(del x, insert z), T1>; dictionary entries "1 x", "2 y", and "3 z".]
Comparison with other work
Proposed by: Fischer and Michael
• Data structure used: dictionary data structures.
• Disadvantage: have to send an entire copy of the dictionary in each message.
Proposed by: Allchin
• Data structure used: synchronization set (SS) and 1-D time table. SS ~= partial log.
• Disadvantage: SS grows unboundedly.
Proposed by: Wuu & Bernstein
• Data structure used: dictionary, log, and 2-D time table.
• Disadvantage: the 2-DTT, of size O(n^2), is sent in every message.
Improving 2-DTT Message Complexity
Strategy 0
• Data structure stored/sent: the complete 2DTT is stored at the node. The complete 2DTT is sent in the message.
• Pros & cons: message complexity is as high as O(n^2), as one has to send and store an n x n matrix.
• Store: O(n^2). Send: O(n^2).
Strategy 1
• Data structure stored/sent: the complete 2DTT is stored at the node. A node sends only its own row in the message.
• Pros & cons: requires direct messages to update each row. Needs to include more event records.
• Store: O(n^2). Send: O(n).
Strategy 2
• Data structure stored/sent: stores neighbors' and own rows. Sends the corresponding row information to the corresponding neighbor.
• Pros & cons: can't determine when all nodes have come to know about an event. Discards an event record once all neighbors know about it.
• Store: O(nk). Send: O(n).
Strategy 3
• Data structure stored/sent: stores all entries (row and column) corresponding to neighbors. Sends row information through the gateway nodes.
• Pros & cons: better when the network is large and connectivity and communication are low.
• Store: O(k^2). Send: O(k).
Extending The Proposed Solution….
Replicated Numeric Data:
It supports add-to and subtract-from operations, which are commutative. The log/2DTT solution ensures that the result is consistent regardless of the order in which the operations are applied. For example:
result1 = b + a – c;
result2 = b – c + a;
result1 = result2.
Detection Of Failure:
To distinguish node failure from communication failure, a log is used to collect records of communication events. Suppose node N1 has the 2DTT
1 0 0
0 0 0
1 0 3
It knows that no one has received any information from node 2, so node 2 might be down.
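Both extensions can be illustrated with a short Python sketch. The function names `apply_ops` and `silent_nodes`, the integer operand values, and the zero-based column index are assumptions for illustration. The check only flags a node as possibly down, as the slide says, since an all-zero column may also mean the node has simply not generated any events.

```python
from itertools import permutations


def apply_ops(start, deltas):
    """Apply add-to (positive) and subtract-from (negative) operations in order."""
    total = start
    for delta in deltas:
        total += delta
    return total


def silent_nodes(T):
    """Return indices j whose column in the 2DTT is all zero.

    T[i][j] records how much node i is known to have heard from node j,
    so an all-zero column means no node has any information from node j.
    """
    n = len(T)
    return [j for j in range(n) if all(T[i][j] == 0 for i in range(n))]


# Commutativity: every ordering of the same operations gives one result.
ops = [5, 3, -2]  # add-to a, add-to b, subtract-from c (illustrative values)
results = {apply_ops(0, order) for order in permutations(ops)}

# The 2DTT from the slide, held by node N1.
T1 = [
    [1, 0, 0],
    [0, 0, 0],
    [1, 0, 3],
]
suspects = silent_nodes(T1)  # index 1 corresponds to node 2
```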
Conclusion
• Mutual consistency of replicated data is achieved.
• The algorithm works well in an unreliable network.
• A weaker consistency constraint is used.
• Excessive communication, computation, and storage costs are reduced.
Remember:
• The replicated log is used to compute others' views of the data.
• Link failure / lost message: get the information from other nodes.
• Node failure: information is kept in the log and dictionary, which are in stable storage.
• Reduction of communication and storage cost: only a partial log is sent and stored.
• Reduction of computation cost: only part of the dictionary entries are recalculated.