Reconciling Differences: towards a theory of cloud complexity
George Varghese
UCSD, visiting at Yahoo! Labs
Part 1: Reconciling Sets across a link
Joint with D. Eppstein, M. Goodrich, F. Uyeda
Appeared in SIGCOMM 2011
Motivation 1: OSPF Routing (1990)
• After partition forms and heals, R1 needs updates at R2 that arrived during partition.
[Figure: routers R1 and R2; the partition between them heals.]
Must solve the Set-Difference Problem!
Motivation 2: Amazon S3 storage (2007)
• Synchronizing replicas.
[Figure: replicas S1 and S2.]
Set-Difference across cloud again!
Periodic Anti-entropy Protocol between replicas
What is the Set-Difference problem?
• What objects are unique to host 1?
• What objects are unique to host 2?
[Figure: Host 1 holds A, B, E, F; Host 2 holds A, C, D, F.]
Use case 1: Data Synchronization
• Identify missing data blocks
• Transfer blocks to synchronize sets
[Figure: Host 1 holds A, B, E, F; Host 2 holds A, C, D, F. Host 2 sends C and D to Host 1; Host 1 sends B and E to Host 2.]
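A minimal sketch of this use case, assuming the two hosts hold the blocks {A, B, E, F} and {A, C, D, F} (the contents and all variable names are illustrative assumptions). Python's built-in set operations stand in for the exchange across the link; they show what must be identified and transferred, not how to do so cheaply.

```python
# Block names held by each host (assumed for illustration).
host1 = {"A", "B", "E", "F"}
host2 = {"A", "C", "D", "F"}

# Step 1: identify the missing data blocks on each side.
missing_on_host1 = host2 - host1   # blocks only Host 2 has
missing_on_host2 = host1 - host2   # blocks only Host 1 has

# Step 2: transfer just those blocks to synchronize the sets.
host1 |= missing_on_host1
host2 |= missing_on_host2

print(sorted(missing_on_host1), sorted(missing_on_host2))
```

After the two transfers both hosts hold the same six blocks.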
Use case 2: Data De-duplication
• Identify all unique blocks.
• Replace duplicate data with pointers
[Figure: Host 1 holds A, B, E, F; Host 2 holds A, C, D, F.]
Prior work versus ours
• Trade a sorted list of keys.
– Let n be size of sets, U be size of key space
– O(n log U) communication, O(n log n) computation
– Bloom filters can improve to O(n) communication.
• Polynomial Encodings (Minsky, Trachtenberg)
– Let “d” be the size of the difference
– O(d log U) communication, O(dn + d^3) computation
• Invertible Bloom Filter (our result)
– O(d log U) communication, O(n + d) computation
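To make the gap between O(n log U) and O(d log U) concrete, here is a rough back-of-the-envelope calculation. The values of n, d and U are assumptions chosen for illustration, and the constant factors hidden by the O() notation are ignored.

```python
import math

n = 1_000_000      # elements per set (assumed)
d = 100            # size of the set difference (assumed)
U = 2 ** 64        # size of the key space (assumed)

bits_per_key = math.log2(U)              # log U bits per key
sorted_list_bits = n * bits_per_key      # O(n log U): trade the whole key list
difference_bits = d * bits_per_key       # O(d log U): proportional to the difference

print(f"sorted list : {sorted_list_bits / 8e6:.1f} MB")
print(f"difference  : {difference_bits / 8:.0f} bytes (before constant factors)")
print(f"ratio       : {sorted_list_bits / difference_bits:.0f}x")
```

The ratio is simply n / d, which is why difference-sized communication matters most when the sets are large and nearly identical.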
Difference Digests
• Efficiently solves the set-difference problem.
• Consists of two data structures:
– Invertible Bloom Filter (IBF)
• Efficiently computes the set difference.
• Needs the size of the difference
– Strata Estimator
• Approximates the size of the set difference.
• Uses IBF’s as a building block.
IBFs: main idea
• Sum over random subsets: Summarize a set by “checksums” over O(d) random subsets.
• Subtract: Exchange and subtract checksums.
• Eliminate: Hashing for subset choice, so common elements disappear after subtraction.
• Invert fast: O(d) equations in d unknowns; randomness allows expected O(d) inversion.
“Checksum” details
• Array of IBF cells that form “checksum” words
– For set difference of size d, use αd cells (α > 1)
• Each element ID is assigned to many IBF cells
• Each cell contains:
– idSum: XOR of all IDs assigned to cell
– hashSum: XOR of hash(ID) of IDs assigned to cell
– count: Number of IDs assigned to cell
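The three fields of a cell can be sketched as a small Python class. The names `Cell`, `add`, `remove` and `H`, and the use of a truncated SHA-256 as the check hash, are assumptions for illustration, not details from the talk.

```python
import hashlib
from dataclasses import dataclass

def H(x: int) -> int:
    """32-bit check hash of an ID (truncated SHA-256 is an assumed stand-in)."""
    return int.from_bytes(hashlib.sha256(str(x).encode()).digest()[:4], "big")

@dataclass
class Cell:
    id_sum: int = 0     # XOR of all IDs assigned to the cell
    hash_sum: int = 0   # XOR of H(ID) over the same IDs
    count: int = 0      # number of IDs assigned to the cell

    def add(self, x: int) -> None:
        self.id_sum ^= x
        self.hash_sum ^= H(x)
        self.count += 1

    def remove(self, x: int) -> None:
        # XOR is its own inverse, so removal repeats the same XORs.
        self.id_sum ^= x
        self.hash_sum ^= H(x)
        self.count -= 1

cell = Cell()
cell.add(0xA)
cell.add(0xB)
cell.remove(0xB)    # the cell now summarizes the single ID 0xA
print(cell)
```

Because every field can be undone, a cell that ends up holding a single ID reveals that ID directly in `id_sum`.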
IBF Encode
[Figure: ID A is hashed by Hash1, Hash2 and Hash3 to three cells of an IBF with αd cells; B and C are encoded the same way.]
Assign ID to many cells
“Add” ID to cell: idSum ⊕ A, hashSum ⊕ H(A), count++
αd cells, not O(n) like Bloom Filters!
All hosts use the same hash functions
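The encoding step can be sketched as follows. The salted SHA-256 hashes, the choice of K = 3, and the split of the table into one slice per hash function (so that an ID always lands in K distinct cells) are assumptions of this sketch.

```python
import hashlib

K = 3  # number of hash functions (assumed)

def h(x, salt):
    """Salted 32-bit hash; truncated SHA-256 is an assumed stand-in."""
    digest = hashlib.sha256(f"{salt}:{x}".encode()).digest()
    return int.from_bytes(digest[:4], "big")

def cells_for(x, m):
    """One cell per hash function, each in its own m // K slice of the table."""
    part = m // K
    return [j * part + h(x, j) % part for j in range(K)]

def encode(ids, m):
    """Build an IBF with m cells; each cell is [idSum, hashSum, count]."""
    ibf = [[0, 0, 0] for _ in range(m)]
    for x in ids:
        for i in cells_for(x, m):
            ibf[i][0] ^= x            # idSum ⊕ ID
            ibf[i][1] ^= h(x, K)      # hashSum ⊕ H(ID); salt K is reserved for H
            ibf[i][2] += 1            # count++
    return ibf

small = encode(range(100), m=12)
large = encode(range(10_000), m=12)
print(len(small), len(large))         # the table size does not depend on n
```

Both tables have the same number of cells; only the contents of the cells change as more IDs are added.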
Invertible Bloom Filters (IBF)
• Trade IBF’s with remote host
[Figure: Host 1 (A, B, E, F) and Host 2 (A, C, D, F) exchange IBF 1 and IBF 2.]
Invertible Bloom Filters (IBF)
• “Subtract” IBF structures
– Produces a new IBF containing only unique objects
[Figure: IBF 1 and IBF 2 are subtracted to give IBF (2 - 1).]
IBF Subtract
Disappearing act
• After subtraction, elements common to both sets disappear because:
– Any common element (e.g. W) is assigned to same cells on both hosts (same hash functions on both sides)
– On subtraction, W XOR W = 0. Thus, W vanishes.
• While elements in set difference remain, they may be randomly mixed, so we need a decode procedure.
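The disappearing act can be checked directly. In this sketch (hash functions, table size and integer IDs are all assumptions), subtracting the IBFs of the full sets gives exactly the same table as subtracting IBFs built from the unique elements alone.

```python
import hashlib

K = 3  # number of hash functions (assumed)

def h(x, salt):
    digest = hashlib.sha256(f"{salt}:{x}".encode()).digest()
    return int.from_bytes(digest[:4], "big")

def cells_for(x, m):
    part = m // K
    return [j * part + h(x, j) % part for j in range(K)]

def encode(ids, m):
    ibf = [[0, 0, 0] for _ in range(m)]      # [idSum, hashSum, count]
    for x in ids:
        for i in cells_for(x, m):
            ibf[i][0] ^= x
            ibf[i][1] ^= h(x, K)
            ibf[i][2] += 1
    return ibf

def subtract(a, b):
    """Cell-wise difference: XOR the sums, subtract the counts."""
    return [[ca[0] ^ cb[0], ca[1] ^ cb[1], ca[2] - cb[2]] for ca, cb in zip(a, b)]

A, B, C, D, E, F = range(1, 7)   # integer IDs standing in for lettered objects
M = 12                           # number of cells (assumed)

full = subtract(encode({A, B, E, F}, M), encode({A, C, D, F}, M))
unique_only = subtract(encode({B, E}, M), encode({C, D}, M))
print(full == unique_only)       # common elements A and F left no trace
```

Elements unique to the second host appear with negative counts, which is how a decoder can later tell the two sides apart.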
IBF Decode
• Test for purity: a cell is pure when H(idSum) = hashSum.
• Pure cell holding only V: H(V) = H(V).
• Impure cell holding V, X and Z: H(V ⊕ X ⊕ Z) ≠ H(V) ⊕ H(X) ⊕ H(Z).
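A minimal sketch of decoding by peeling, under stated assumptions: salted SHA-256 hashes, K = 3, integer IDs, a count of +1 or -1 as part of the purity test, and a deliberately oversized table so that the toy example is very unlikely to fail. The function names are hypothetical.

```python
import hashlib

K = 3  # number of hash functions (assumed)

def h(x, salt):
    digest = hashlib.sha256(f"{salt}:{x}".encode()).digest()
    return int.from_bytes(digest[:4], "big")

def cells_for(x, m):
    part = m // K
    return [j * part + h(x, j) % part for j in range(K)]

def encode(ids, m):
    ibf = [[0, 0, 0] for _ in range(m)]      # [idSum, hashSum, count]
    for x in ids:
        for i in cells_for(x, m):
            ibf[i][0] ^= x
            ibf[i][1] ^= h(x, K)
            ibf[i][2] += 1
    return ibf

def subtract(a, b):
    return [[ca[0] ^ cb[0], ca[1] ^ cb[1], ca[2] - cb[2]] for ca, cb in zip(a, b)]

def is_pure(cell):
    """Purity test: one ID left in the cell and H(idSum) == hashSum."""
    return cell[2] in (1, -1) and h(cell[0], K) == cell[1]

def decode(diff):
    diff = [c[:] for c in diff]              # work on a copy
    only_a, only_b = set(), set()
    stack = [i for i, c in enumerate(diff) if is_pure(c)]
    while stack:
        i = stack.pop()
        if not is_pure(diff[i]):
            continue                         # already peeled via another cell
        x, sign = diff[i][0], diff[i][2]
        (only_a if sign == 1 else only_b).add(x)
        for j in cells_for(x, len(diff)):    # remove x from all of its cells
            diff[j][0] ^= x
            diff[j][1] ^= h(x, K)
            diff[j][2] -= sign
            if is_pure(diff[j]):
                stack.append(j)              # removal may expose new pure cells
    ok = all(c == [0, 0, 0] for c in diff)   # success only if everything peeled
    return ok, only_a, only_b

A, B, C, D, E, F = range(1, 7)
M = 90                                       # oversized on purpose for the toy case
ok, only_1, only_2 = decode(subtract(encode({A, B, E, F}, M), encode({A, C, D, F}, M)))
print(ok, sorted(only_1), sorted(only_2))
```

Each recovered ID is removed from its other cells, which can turn mixed cells into pure ones; decoding fails only if the process stalls with non-empty cells left.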
How many IBF cells?
[Figure: space overhead α needed to decode at >99% versus set difference, for hash counts 3 and 4.]
• Small differences: 1.4x – 2.3x
• Large differences: 1.25x – 1.4x
How many hash functions?
• 1 hash function produces many pure cells initially, but there is nothing to undo when an element is removed.
• Many (say 10) hash functions: too many collisions.
• We find by experiment that 3 or 4 hash functions work well. Is there some theoretical reason?
Theory
• Let d = difference size, k = # hash functions.
• Theorem 1: With (k + 1) d cells, failure probability falls exponentially with k.
– For k = 3, implies a 4x tax on storage, a bit weak.
• [Goodrich, Mitzenmacher]: Failure is equivalent to finding a 2-core (loop) in a random hypergraph
• Theorem 2: With c_k d cells, failure probability falls exponentially with k.
– c_4 = 1.3x tax, agrees with experiments
Recall experiments
[Figure: space overhead α needed to decode at >99% versus set difference, for hash counts 3 and 4.]
• Large differences: 1.25x – 1.4x
Connection to Coding
• Mystery: IBF decode similar to peeling procedure used to decode Tornado codes. Why?
• Explanation: Set Difference is equivalent to coding with insert-delete channels
• Intuition: Given a code for set A, send checkwords only to B. Think of B as a corrupted form of A.
• Reduction: If code can correct D insertions/deletions, then B can recover A and the set difference.
Reed Solomon <---> Polynomial Methods
LDPC (Tornado) <---> Difference Digest
Random Subsets Fast Elimination
[Figure: αd sparse equations of the form X + Y + Z = . . ; pure equations such as X = . . and Y = . .]
Roughly upper triangular and sparse
Difference Digests
• Consists of two data structures:
– Invertible Bloom Filter (IBF)
• Efficiently computes the set difference.
• Needs the size of the difference
– Strata Estimator
• Approximates the size of the set difference.
• Uses IBF’s as a building block.
Strata Estimator
[Figure: consistent partitioning of the keys A, B, C into strata holding ~1/2, ~1/4, ~1/8 and 1/16 of the keys, encoded as IBF 1 to IBF 4 of the estimator.]
• Divide keys into sampled subsets containing ~1/2^k of the keys
• Encode each subset into an IBF of small fixed size
– log(n) IBF’s of ~20 cells each
Strata Estimator
[Figure: Host 1 and Host 2 exchange Estimator 1 and Estimator 2 (IBF 1 to IBF 4 each); corresponding IBFs are subtracted and decoded.]
• Attempt to subtract & decode IBF’s at each level.
• If level k decodes, then return: 2^k x (the number of ID’s recovered)
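One plausible reading of the estimator, sketched under assumptions: keys are assigned to strata by counting trailing zero bits of a hash, each stratum is a 21-cell IBF, and decoding proceeds from the sparsest stratum toward the densest, scaling up the IDs recovered so far when a stratum fails to decode. The parameters, the accumulation rule and all function names are assumptions of this sketch rather than details stated on the slides.

```python
import hashlib

K, CELLS, LEVELS = 3, 21, 16   # hash count, cells per IBF, number of strata (assumed)

def h(x, salt):
    digest = hashlib.sha256(f"{salt}:{x}".encode()).digest()
    return int.from_bytes(digest[:4], "big")

def cells_for(x):
    part = CELLS // K
    return [j * part + h(x, j) % part for j in range(K)]

def level_of(x):
    """Stratum i receives ~1/2^(i+1) of the keys: count trailing zero bits of a hash."""
    v = h(x, "strata")
    return min((v & -v).bit_length() - 1, LEVELS - 1) if v else LEVELS - 1

def strata(ids):
    """One small IBF per stratum; each cell is [idSum, hashSum, count]."""
    est = [[[0, 0, 0] for _ in range(CELLS)] for _ in range(LEVELS)]
    for x in ids:
        ibf, hx = est[level_of(x)], h(x, K)
        for i in cells_for(x):
            ibf[i][0] ^= x
            ibf[i][1] ^= hx
            ibf[i][2] += 1
    return est

def decode_count(a, b):
    """Subtract two IBFs and peel; return the number of IDs recovered, or None on failure."""
    diff = [[ca[0] ^ cb[0], ca[1] ^ cb[1], ca[2] - cb[2]] for ca, cb in zip(a, b)]
    found, progress = 0, True
    while progress:
        progress = False
        for c in diff:
            if c[2] in (1, -1) and h(c[0], K) == c[1]:   # purity test
                x, sign = c[0], c[2]
                for j in cells_for(x):
                    diff[j][0] ^= x
                    diff[j][1] ^= h(x, K)
                    diff[j][2] -= sign
                found, progress = found + 1, True
    return found if all(c == [0, 0, 0] for c in diff) else None

def estimate(est1, est2):
    total = 0
    for i in reversed(range(LEVELS)):        # sparsest stratum first
        n = decode_count(est1[i], est2[i])
        if n is None:
            return 2 ** (i + 1) * total      # scale up what the sparser strata recovered
        total += n
    return total                             # every stratum decoded: exact count

base = set(range(1, 2001))
same = estimate(strata(base), strata(base))
one = estimate(strata(base), strata(base | {999_999}))
big = estimate(strata(base), strata(set(range(501, 2501))))   # true difference is 1000
print(same, one, big)
```

Identical sets and a single-element difference are recovered exactly; the accuracy of the last estimate depends on the assumed parameters.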
KeyDiff Service
• Promising Applications:
– File Synchronization
– P2P file sharing
– Failure Recovery
[Figure: each Application sits on top of a Key Service; the Key Services communicate with one another.]
• Key Service operations: Add( key ), Remove( key ), Diff( host1, host2 )
Difference Digest Summary
• Strata Estimator
– Estimates Set Difference.
– For 100K sets, 15KB estimator has <15% error
– O(log n) communication, O(n) computation.
• Invertible Bloom Filter
– Identifies all ID’s in the Set Difference.
– 16 to 28 Bytes per ID in Set Difference.
– O(d) communication, O(n+d) computation
– Worth it if set difference is < 20% of set sizes
Connection to Sparse Recovery?
• If we forget about subtraction, in the end we are recovering a d-sparse vector.
• Note that the hash check is key for figuring out which cells are pure after differencing.
• Is there a connection to compressed sensing? Could sensors do the random summing? The hash summing?
• Connection the other way: could use compressed sensing for differences?
Comparison with Information Theory and Coding
• Worst case complexity versus average
• It emphasizes communication complexity, not computation complexity: we focus on both.
• Existence versus Constructive: some similar settings (Slepian-Wolf) are existential
• Estimators: We want bounds based on difference and so start by efficiently estimating difference.
Aside: IBFs in Digital Hardware
[Figure: a stream of set elements (a, b, x, y) feeds logic (Read, hash, Write) that uses Hash 1, Hash 2, Hash 3 and a Strata Hash to update Bank 1, Bank 2 and Bank 3.]
Hash to separate banks for parallelism, slight cost in space needed. Decode in software
Part 2: Towards a theory of Cloud Complexity
[Figure: objects O1, O2 and O3 to be reconciled.]
Complexity of reconciling “similar” objects?
Example: Synching Files
[Figure: file versions X.ppt.v1, X.ppt.v2 and X.ppt.v3.]
Measures: Communication bits, computation
So far: Two sets, one link, set difference
{a,b,c} {d,a,c}
Mild Sensitivity Analysis: One set much larger than other
[Figure: Set A and Set B with small difference d.]
Ω(|A|) bits needed, not O(d): Patrascu 2008
Simpler proof: DKS 2011
Asymmetric set difference in LBFS File System (Mazieres)
[Figure: File A and File B (chunk set B at the server) shown as chunk sequences C1 C2 C3 . . . C97 C98 C99 and C1 C5 C3 . . . C97 C98 C99; 1 chunk difference.]
LBFS sends all chunk hashes in File A: O(|A|)
More Sensitivity Analysis: small intersection: database joins
[Figure: Set A and Set B with small intersection d.]
Ω(|A|) bits needed, not O(d): Follows from results on hardness of set disjointness
Sequences under Edit Distance (Files for example)
[Figure: File A = A B C D E F and File B = A C D E F G; edit distance 2.]
Insert/delete can renumber all file blocks . . .
Sequence reconciliation (with J. Ullman)
[Figure: File A = A B C D E F and File B = A C D E F; edit distance 1. Piece hashes H1, H2, H3 are sent, and H2 and H3 are matched.]
Send 2d+1 piece hashes. Clump unmatched pieces and recurse. O(d log(N))
21 years of Sequence Reconciliation!
• Schwartz, Bowdidge, Burkhard (1990): recurse on unmatched pieces, not aggregate.
• Rsync: widely used tool that breaks file into roughly √N piece hashes, N is file length.
UCSD, Lunch Princeton, kids
Sets on graphs?
{a,b,c} {d,c,e}
{b,c,d}
{a,f,g}
Generalizes rumor spreading which has disjoint singleton sets
{a} {d}
{b}
{g}
CLP10, G11: O(E n log n / conductance)
Generalized Push-Pull (with N. Goyal and R. Kannan)
{a,b,c} {d,c,e}
{b,c,d}
Pick random edge
Do 2 party set reconciliation
Complexity: C + D, C as before, D = Sum_i (U – S_i)
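A small simulation of the generalized push-pull step, under assumptions: the three nodes form a triangle, a plain set union stands in for the 2-party set reconciliation, and the function name `push_pull` is hypothetical. It counts the elements that cross a link and compares the total with D = Sum_i |U – S_i|, reading U as the union of all sets and S_i as the set at node i.

```python
import random

def push_pull(node_sets, edges, seed=0):
    """Repeatedly pick a random edge and reconcile the two endpoint sets."""
    rng = random.Random(seed)
    sets = [set(s) for s in node_sets]
    universe = set().union(*sets)
    steps = transferred = 0
    while any(s != universe for s in sets):
        u, v = rng.choice(edges)
        merged = sets[u] | sets[v]           # stand-in for 2-party reconciliation
        transferred += len(merged - sets[u]) + len(merged - sets[v])
        sets[u], sets[v] = set(merged), set(merged)
        steps += 1
    return sets, steps, transferred

nodes = [{"a", "b", "c"}, {"d", "c", "e"}, {"b", "c", "d"}]
edges = [(0, 1), (1, 2), (0, 2)]             # assumed triangle topology
final, steps, transferred = push_pull(nodes, edges)

U = set().union(*nodes)
D = sum(len(U - s) for s in nodes)           # each node must learn what it lacks
print(steps, transferred, D)
```

Since a node receives an element only while it still lacks it, the number of elements transferred equals D regardless of which edges are picked; the number of steps is the part that depends on the graph.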
Sets on Steiner graphs?
{a} U S {b} U S
R1
Only terminals need sets. Push-pull wasteful!
Butterfly example for Sets
[Figure: butterfly network carrying S1, S2 and D = Diff(S1, S2) between nodes that include X and Y.]
Set difference instead of XOR within network
How does reconciliation on Steiner graphs relate to network coding?
• Objects in general, not just bits.
• Routers do not need objects but can transform/code objects.
• What transformations within network allow efficient communication close to lower bound?
Sequences with d mutations: VM code pages (with Ramjee et al)
[Figure: VM A holds pages A B C D E; VM B holds pages A X C D Y; 2 “errors”.]
Reconcile Set A = {(A,1),(B,2),(C,3),(D,4),(E,5)} and Set B = {(A,1),(X,2),(C,3),(D,4),(Y,5)}
Twist: IBFs for error correction? (with M. Mitzenmacher)
• Write message M[1..n] of n words as set S = {(M[1],1), (M[2], 2), . . (M[n], n)}.
• Calculate IBF(S) and transmit M, IBF(S)
• Receiver uses received message M’ to find IBF(S’); subtracts from IBF’(S) to locate errors.
• Protect IBF using Reed-Solomon or redundancy
• Why: Potentially O(e) decoding for e errors -- Raptor codes achieve this for erasure channels.
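A toy sketch of the idea, under several assumptions: words are 16-bit integers, each pair (M[i], i) is packed into one integer ID, the IBF itself arrives intact (the slide suggests protecting it separately), and the table is oversized so the small example is very unlikely to fail. All names and parameters are hypothetical.

```python
import hashlib

K, CELLS = 3, 90     # hash count and table size (assumed; oversized for a toy message)

def h(x, salt):
    digest = hashlib.sha256(f"{salt}:{x}".encode()).digest()
    return int.from_bytes(digest[:4], "big")

def cells_for(x):
    part = CELLS // K
    return [j * part + h(x, j) % part for j in range(K)]

def key(word, pos):
    """Pack the pair (M[pos], pos) into one integer ID; assumes 16-bit words."""
    return (pos << 16) | word

def toggle(ibf, x, sign):
    """Add (sign=+1) or remove (sign=-1) ID x; cells are [idSum, hashSum, count]."""
    for i in cells_for(x):
        ibf[i][0] ^= x
        ibf[i][1] ^= h(x, K)
        ibf[i][2] += sign

def ibf_of(message):
    ibf = [[0, 0, 0] for _ in range(CELLS)]
    for pos, word in enumerate(message, start=1):
        toggle(ibf, key(word, pos), +1)
    return ibf

def correct(received, sender_ibf):
    """Subtract IBF(S') from the sender's IBF(S), peel, and patch the bad words."""
    mine = ibf_of(received)
    diff = [[a[0] ^ b[0], a[1] ^ b[1], a[2] - b[2]] for a, b in zip(sender_ibf, mine)]
    fixed, progress = list(received), True
    while progress:
        progress = False
        for c in diff:
            if c[2] in (1, -1) and h(c[0], K) == c[1]:   # purity test
                x, sign = c[0], c[2]
                if sign == 1:                            # pair known only to the sender
                    fixed[(x >> 16) - 1] = x & 0xFFFF
                toggle(diff, x, -sign)
                progress = True
    return all(c == [0, 0, 0] for c in diff), fixed

sent = [10, 20, 30, 40, 50, 60, 70, 80]
received = list(sent)
received[1], received[5] = 99, 7       # two corrupted words
ok, fixed = correct(received, ibf_of(sent))
print(ok, fixed)
```

Pairs with a positive count exist only in the sender's set, so they carry the correct word for their position; pairs with a negative count are the corrupted words seen by the receiver.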
The Cloud Complexity Milieu
                                   2 Node   Graph   Steiner Nodes
Sets (Key, values)                 EGUV11   GKV11   ?
Sequence, Edit Distance (Files)    SBB90    ?       ?
Sequence, errors only (VMs)        MV11     ?       ?
Sets of sets (database tables)     ?        ?       ?
Streams (movies)                   ?        ?       ?
Other dimensions: approximate, secure, . . .
Conclusions: Got Diffs?
• Resiliency and fast recoding of random sums lead to set reconciliation; and error correction?
• Sets on graphs
– All terminals: generalizes rumor spreading
– Routers, terminals: resemblance to network coding.
• Cloud complexity: Some points covered, many remain
• Practical, may be useful to synch devices across cloud.
Comparison to Logs/Incremental Updates
• IBFs work with no prior context.
• Logs work with prior context, BUT
– Redundant information when sync’ing with multiple parties.
– Logging must be built into system for each write.
– Logging adds overhead at runtime.
– Logging requires non-volatile storage.
• Often not present in network devices.
IBF’s may out-perform logs when:
• Synchronizing multiple parties
• Synchronizations happen infrequently