Consistency without consensus
Linearizable Resilient Data Types (LRDT)
Kaushik Rajan, Sagar Chordia, Kapil Vaswani, Ganesan Ramalingam, Sriram Rajamani
Consistency & consensus
[Figure: Add(The Hobbit) and Add(Kindle) are issued concurrently with two GetCart() reads]
• Processes agree on ordering of operations
• No deterministic algorithm in the presence of failures [FLP]
Commuting updates
• What if all update operations commute?
– Ordering of updates doesn’t matter!
– Eventual consistency reduces to eventual message delivery
– Single round trip latency
• What if we desire linearizability?
– Updates don’t commute with arbitrary reads
– Reads must be consistently ordered with updates
– Semantics of queries like the current top(k) elements are well understood
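The first point above (ordering of commuting updates doesn’t matter) can be illustrated with a minimal sketch. It assumes a cart modelled as a Python set that only supports Add; the helper name `apply_all` is hypothetical and not from the talk.

```python
from itertools import permutations

def apply_all(updates):
    """Apply Add(item) updates to an empty cart in the given order."""
    cart = set()
    for item in updates:
        cart.add(item)          # Add commutes with every other Add
    return frozenset(cart)

updates = ["The Hobbit", "Kindle", "Lumia"]

# Every delivery order of the same updates is tried; the set comprehension
# collects the distinct final carts that result.
final_states = {apply_all(order) for order in permutations(updates)}
```

Because the updates commute, a replica only needs every update to be delivered eventually, not in any agreed order.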
Commuting updates
[Figure: Add(The Hobbit) and Add(Kindle) are issued concurrently with two GetCart() reads, which return {} and {The Hobbit, Kindle}]
• Reads must observe comparable sets of operations
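Linearizable resilient data types
[Figure: data types classified as Possible, Impossible and Don’t know]
• P1 : commutes(s,op1,op2)
[Figure: from state S, applying op1 then op2 and applying op2 then op1 both lead to the same state S’]
• P2 : nullify(s,op1,op2)
[Figure: state diagram over S, S1 and S2 with op1 and op2 transitions]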
Examples
• Read write register : every pair of writes nullify
• Read write memory : writes to the same location nullify, writes to different locations commute
Examples
• Set : add, remove and read the whole set
– Add(u), Remove(v) commute
– Add(u), Remove(u) nullify
– Add(*), Add(*) commute
– Remove(*), Remove(*) commute
• Counter : IncrBy(x), DecrBy(x), SetTo(v), Read()
– SetTo(v) nullifies all other operations
– Other pairs of updates commute
• Other examples : heaps, union-find, atomic snapshot objects…
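The counter example above can be checked mechanically. The sketch below is an illustration under one assumption that the slides do not spell out: op2 is taken to nullify op1 at state s when running op1 before op2 leaves the same state as running op2 alone. The function names are hypothetical.

```python
def incr_by(x):
    return lambda s: s + x

def decr_by(x):
    return lambda s: s - x

def set_to(v):
    return lambda s: v

def commutes(s, op1, op2):
    """op1 and op2 commute at s if both orders reach the same state."""
    return op2(op1(s)) == op1(op2(s))

def nullifies(s, op1, op2):
    """Assumed reading: op2 nullifies op1 at s if op1 has no visible
    effect once op2 has been applied after it."""
    return op2(op1(s)) == op2(s)

states = range(-5, 6)   # a small sample of counter states

# IncrBy and DecrBy commute at every sampled state.
incr_decr_commute = all(commutes(s, incr_by(2), decr_by(3)) for s in states)

# SetTo(7) nullifies IncrBy(2) at every sampled state ...
set_nullifies_incr = all(nullifies(s, incr_by(2), set_to(7)) for s in states)

# ... but the two do not commute at any of them.
set_incr_commute = any(commutes(s, incr_by(2), set_to(7)) for s in states)
```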
Lattice agreement
• Consistency reduces to lattice agreement
– Weaker problem than consensus
– Solvable in an asynchronous distributed system
• Assumptions
– t < n/2 failures
– Eventual message delivery
Lattice agreement
• n processes, each process starts with a value belonging to a join semi-lattice
• Each non-faulty process outputs a value
– (Validity) Each process’ output is a join of one or more input values including its own
– (Consistency) Any two output values are comparable
– (Liveness) Every correct process eventually outputs a value
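The validity and consistency conditions above can be written as checks over a finished run. This is a minimal sketch, assuming the lattice is a power set ordered by inclusion with union as join; the names `is_valid` and `is_consistent` and the sample runs are made up for illustration. Liveness is a property of executions over time and is not checked here.

```python
from itertools import combinations

def is_valid(inputs, outputs):
    """Validity: each output includes the process's own input and equals
    the join (union) of the input values it contains."""
    for p, out in outputs.items():
        contained = [v for v in inputs.values() if v <= out]
        if not inputs[p] <= out or frozenset().union(*contained) != out:
            return False
    return True

def is_consistent(outputs):
    """Consistency: any two outputs are comparable under inclusion."""
    return all(x <= y or y <= x
               for x, y in combinations(outputs.values(), 2))

inputs = {"p1": frozenset("a"), "p2": frozenset("b"), "p3": frozenset("c")}

# Outputs that form a chain {a} <= {a,b} <= {a,b,c}.
good = {"p1": frozenset("a"), "p2": frozenset("ab"), "p3": frozenset("abc")}

# {a} and {b} are incomparable, so this run violates consistency.
bad = {"p1": frozenset("a"), "p2": frozenset("b"), "p3": frozenset("abc")}
```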
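Lattice agreement
[Figure: power set lattice over {a, b, c}, from {} through {a}, {b}, {c} and {a, b}, {b, c}, {a, c} up to {a, b, c}, with nodes labelled by the processes p1, p2 and p3]
a = Add(The Hobbit), b = Add(Kindle), c = Add(Lumia)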
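[Flowchart: lattice agreement protocol]
PROPOSERS
• Send v_i to all acceptors
• Wait for majority of acceptors to respond
• All Acks?
– Y: Output v_i
– N: v_i ← ⋁ a_j over all Nack(a_j), then send again
ACCEPTORS
• Initially …
• On receiving v_j: a_i ≤ v_j ?
– Y: a_i = a_i ∨ v_j
– N: a_i = a_i ∨ v_j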
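The proposer and acceptor rules in the flowchart can be sketched as a small sequential simulation. This is an illustration under stated assumptions, not the authors’ implementation: values are subsets (power set lattice), acceptors start at the empty set, a Nack carries the acceptor’s value after the join, and each proposer runs to completion against one majority with no concurrency or message loss. The names `Acceptor` and `propose` are hypothetical.

```python
class Acceptor:
    def __init__(self):
        self.accepted = frozenset()   # a_i, assumed to start at the lattice bottom

    def on_receive(self, proposed):
        """Ack if a_i <= v_j, else Nack; a_i = a_i v v_j in both cases."""
        ack = self.accepted <= proposed
        self.accepted = self.accepted | proposed
        return ("ack", None) if ack else ("nack", self.accepted)

def propose(value, responders):
    """Run one proposer against the majority of acceptors in `responders`.
    Returns the output value and the number of round trips used."""
    v = frozenset(value)
    rounds = 0
    while True:
        rounds += 1
        replies = [acc.on_receive(v) for acc in responders]
        nacks = [a for kind, a in replies if kind == "nack"]
        if not nacks:
            return v, rounds          # all Acks: output v_i
        v = v.union(*nacks)           # v_i <- join of the a_j carried by Nacks

acceptors = [Acceptor() for _ in range(3)]
a0, a1, a2 = acceptors

# Each proposer talks to a different majority (2 of 3 acceptors).
out1, rounds1 = propose({"a"}, [a0, a1])
out2, rounds2 = propose({"b"}, [a1, a2])
out3, rounds3 = propose({"c"}, [a0, a2])
```

Any two majorities share an acceptor, which is why a later proposer is forced to pick up the earlier values through a Nack and the outputs end up on a chain.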
Safety and liveness
• Safety always guaranteed
• Lattice agreement is t-resilient
– Liveness guaranteed if a quorum of processes is non-faulty and communication is reliable
– Processes output a value in at most n round trips, where n is the number of processes
Generalized lattice agreement
• Generalization of lattice agreement
– Processes receive a sequence of values
– Values belong to an infinite lattice
• Processes output a sequence of values
– (Validity) Every output value is a join of some received values
– (Consistency) Any two output values are comparable (i.e. output values form a chain)
– (Liveness) Every value received by a correct process is eventually included in an output value
GLA algorithm
• Liveness (t-resilient)
– Every received value is eventually included in some output in n round trips
– Adaptive, complexity depends on contention
• Fast path
– Received values output in one round trip
• Reconfigurable
– Replicas can be added/removed dynamically
From GLA to linearizability
• Update commands form power set lattice
• Updates return once a majority of processes have learnt a command set that includes the update command
• Read performed by (ABD style algorithm)
1. Reading the learnt command set from a quorum of processes
2. Writing back the largest among these to a quorum
3. Constructing state corresponding to the largest command set by exploiting commutativity and nullification
• Multi-master replication
– Does not require a single primary/leader
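The three read steps above can be sketched with in-memory replicas. This is a simplified illustration, assuming the cart example with Add commands only (so step 3 needs commutativity but not nullification), learnt sets that already form a chain, and a write-back to the same quorum that was read. `Replica` and `linearizable_read` are hypothetical names.

```python
class Replica:
    def __init__(self, learnt=()):
        self.learnt = frozenset(learnt)     # learnt command set

    def read_learnt(self):
        return self.learnt

    def write_back(self, commands):
        # Learnt sets only grow; joining keeps them on a chain.
        self.learnt = self.learnt | commands

def linearizable_read(quorum):
    # 1. Read the learnt command set from a quorum of processes.
    learnt_sets = [r.read_learnt() for r in quorum]
    # 2. The sets are comparable, so the largest is the one with most commands;
    #    write it back to a quorum before returning.
    largest = max(learnt_sets, key=len)
    for r in quorum:
        r.write_back(largest)
    # 3. Construct the state. Add commands commute, so any order works.
    cart = set()
    for kind, item in largest:
        if kind == "add":
            cart.add(item)
    return cart

add_hobbit = ("add", "The Hobbit")
add_kindle = ("add", "Kindle")

r1 = Replica([add_hobbit])
r2 = Replica([add_hobbit, add_kindle])
r3 = Replica()

cart = linearizable_read([r1, r2])   # any 2 of the 3 replicas form a quorum
```

The write-back in step 2 is what stops a later read through a different quorum from observing a smaller command set than this one did.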
Impossibility
• Consensus reduction
Consensus(b)
    if (b) then op1 else op2
    s = read()
    if (s is S1 or S12) return true
    else return false
• Pair of idempotent update operations that neither commute nor nullify at some state s0
[Figure: Op* leads from the initial state Si to S0; from S0, op1 leads to S1 and op2 leads to S2; from S1, op2 leads to S12; from S2, op1 leads to S21]
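The reduction above can be tried out on a toy object. The sketch below assumes a hypothetical data type whose state records the order in which op1 and op2 were first applied, so the two operations are idempotent and neither commute nor nullify at S0. A lock stands in for a linearizable implementation inside one Python process; it is not a distributed protocol.

```python
import threading

class OrderRecordingObject:
    """Hypothetical object: state is the tuple of ops in first-applied order."""
    def __init__(self):
        self._lock = threading.Lock()
        self._state = ()                      # S0

    def apply(self, op):
        with self._lock:
            if op not in self._state:         # idempotent: repeats change nothing
                self._state = self._state + (op,)

    def read(self):
        with self._lock:
            return self._state

S1, S12 = ("op1",), ("op1", "op2")

def consensus(obj, b):
    """Consensus(b) from the slide: apply op1 or op2, read, decide."""
    obj.apply("op1" if b else "op2")
    s = obj.read()
    return s in (S1, S12)                     # true iff op1 was applied first

def run(proposals):
    obj = OrderRecordingObject()
    decisions = [None] * len(proposals)

    def worker(i):
        decisions[i] = consensus(obj, proposals[i])

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(len(proposals))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decisions

mixed = run([True, False, True, False])
```

Whichever operation is linearized first fixes the decision, so every caller returns the same proposed value. A wait-free linearizable implementation of such a type would therefore solve consensus, which [FLP] rules out.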
Implications for designing ADTs
• Most commands commute
Implications for designing ADTs
• … neither commute nor nullify at …
The Gap : Open problems
Doubly saturating counter
[Figure: states 0, 1, 2, …, n in a chain; Incr() moves one state up and Decr() one state down, with Decr() at 0 and Incr() at n looping back to the same state]
• Incr() and Decr() commute at 1 … n-1
• Incr() and Decr() nullify at 0 and n
• Don’t know if this is possible or impossible
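The two claims about the doubly saturating counter can be checked state by state. The sketch assumes the counter saturates at 0 and n, and reads "nullify" as: one operation applied after the other leaves the same state as that operation applied alone. The bound n = 4 and the function names are arbitrary choices for illustration.

```python
def make_ops(n):
    """Incr() and Decr() on a counter that saturates at 0 and n."""
    incr = lambda s: min(s + 1, n)
    decr = lambda s: max(s - 1, 0)
    return incr, decr

def classify(s, op1, op2):
    """Classify the pair (op1, op2) at state s."""
    if op2(op1(s)) == op1(op2(s)):
        return "commute"
    # Assumed reading of nullify, checked in both directions.
    if op2(op1(s)) == op2(s) or op1(op2(s)) == op1(s):
        return "nullify"
    return "neither"

n = 4
incr, decr = make_ops(n)

# One entry per counter state 0..n.
table = {s: classify(s, incr, decr) for s in range(n + 1)}
```

No state falls in the "neither" case, so the impossibility argument does not apply directly, yet the pair is not uniformly commuting or uniformly nullifying either. That is what places this type in the gap.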
Summary
• Possible : graph, RW mem…
• Impossible : queues, sequences
• ?? : Saturating counter