history-independent cuckoo hashing
History-Independent Cuckoo Hashing. Moni Naor and Gil Segev (Weizmann Institute, Israel), Udi Wieder (Microsoft Research Silicon Valley)

Post on 03-Jan-2016


Page 1: History-Independent Cuckoo Hashing

History-Independent Cuckoo Hashing

Moni Naor, Gil Segev (Weizmann Institute, Israel)

Udi Wieder (Microsoft Research Silicon Valley)

Page 2: History-Independent Cuckoo Hashing

Election Day

Elections for class president
  Each student whispers in Mr. Drew’s ear
  Mr. Drew writes down the votes

[Figure: Mr. Drew’s notebook, listing the votes (Carol, Alice, Bob) in the order they were cast]

Problem: Mr. Drew’s notebook leaks sensitive information
  First student voted for Carol
  Second student voted for Alice
  …

May compromise the privacy of the elections

Page 3: History-Independent Cuckoo Hashing

Election Day

A simple solution:
  Lexicographically sorted list of candidates
  Unary counters

[Figure: notebook after the votes Carol, Alice, Alice, Bob: Alice 11, Bob 1, Carol 1]

What about more involved applications?
  Write-in candidates
  Votes which are subsets or rankings
  …
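A minimal Python sketch of the simple solution above. The function name `tally_memory` and the list-of-pairs layout are assumptions made for illustration; the point is only that the notebook contents depend on which votes were cast, not on their order.

```python
from collections import Counter

def tally_memory(votes):
    """Return the notebook contents: candidates in lexicographic order,
    each paired with a unary counter (one '1' per vote)."""
    counts = Counter(votes)
    return [(name, "1" * counts[name]) for name in sorted(counts)]

# Two different voting orders with the same outcome.
morning = ["Carol", "Alice", "Alice", "Bob"]
evening = ["Bob", "Alice", "Carol", "Alice"]

print(tally_memory(morning))
print(tally_memory(morning) == tally_memory(evening))
```

Both orders produce the same notebook, so reading it reveals the tally and nothing about who voted first.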

Page 4: History-Independent Cuckoo Hashing

Learning From History

A simple example: sorted list
  Canonical memory representation
  Not really efficient...

The two levels of a data structure
  “Legitimate” interface
  Memory representation

History independence
  The memory representation should not reveal information that cannot be obtained using the legitimate interface

[Figure: sorted list holding Alice, Bob, Carol]
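The contrast between the interface and the memory representation can be sketched in Python. Both classes below are hypothetical illustrations, not code from the talk: they offer the same insert interface, but only the sorted list has a single memory layout per content.

```python
import bisect
from itertools import permutations

class AppendLog:
    """Stores keys in arrival order: the layout reveals the history."""
    def __init__(self):
        self.memory = []

    def insert(self, key):
        if key not in self.memory:
            self.memory.append(key)

class SortedList:
    """Stores keys in sorted order: one canonical layout per content."""
    def __init__(self):
        self.memory = []

    def insert(self, key):
        i = bisect.bisect_left(self.memory, key)
        if i == len(self.memory) or self.memory[i] != key:
            self.memory.insert(i, key)  # O(n) shift: "not really efficient"

def layouts(cls, keys):
    """Distinct memory layouts reached over all insertion orders of keys."""
    result = set()
    for order in permutations(keys):
        d = cls()
        for k in order:
            d.insert(k)
        result.add(tuple(d.memory))
    return result

names = ["Alice", "Bob", "Carol"]
print(len(layouts(AppendLog, names)))   # one layout per insertion order
print(len(layouts(SortedList, names)))  # a single canonical layout
```

The O(n) shift on every insert is the price of the canonical form here, which is why the sorted list is only a starting point.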

Page 5: History-Independent Cuckoo Hashing

Typical Applications

Incremental cryptography [BGG94, Mic97]

Voting [MKSW06, MNS07]

Set comparison & reconciliation [MNS08]

Computational geometry [BGV08]

...

Page 6: History-Independent Cuckoo Hashing

Our Contribution

An HI dictionary that simultaneously achieves the following:

Efficiency:
  Lookup time: O(1) worst case
  Update time: O(1) expected amortized
  Memory utilization: 50% (25% with deletions)

Strongest notion of history independence

Simple and fast

Page 7: History-Independent Cuckoo Hashing

Notions of History Independence

Weak history independence
  Memory revealed at the end of an activity period
  Any two sequences of operations S1 and S2 that lead to the same content induce the same distribution on the memory representation

Strong history independence
  Memory revealed several times during an activity period
  Any two sets of breakpoints along S1 and S2 with the same content at each breakpoint induce the same distributions on the memory representation at all these points
  Completely randomizing memory after each operation is not good enough

Naor and Teague (2001), following Micciancio (1997)
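Why re-randomizing memory after each operation falls short of strong history independence can be illustrated with a small Python experiment. The class `ShuffledSet` and the helper `two_snapshots` are hypothetical names for this sketch. A single snapshot is a random permutation whatever the history, but two snapshots taken at breakpoints with the same content reveal whether any update happened in between.

```python
import random

class ShuffledSet:
    """Keeps its keys in an array that is re-shuffled after every update."""
    def __init__(self, rng):
        self.rng = rng
        self.memory = []

    def insert(self, key):
        if key not in self.memory:
            self.memory.append(key)
        self.rng.shuffle(self.memory)

    def delete(self, key):
        if key in self.memory:
            self.memory.remove(key)
        self.rng.shuffle(self.memory)

def two_snapshots(extra_ops, seed):
    """Fill the set, take a snapshot, run extra_ops, take a second snapshot."""
    s = ShuffledSet(random.Random(seed))
    for k in range(8):
        s.insert(k)
    first = list(s.memory)
    for op, key in extra_ops:
        getattr(s, op)(key)
    second = list(s.memory)
    return first, second

idle = []                                # nothing happens between breakpoints
busy = [("insert", 99), ("delete", 99)]  # same content at both breakpoints

changed_idle = sum(a != b for a, b in (two_snapshots(idle, sd) for sd in range(100)))
changed_busy = sum(a != b for a, b in (two_snapshots(busy, sd) for sd in range(100)))
print(changed_idle, changed_busy)
```

With no updates the two snapshots always agree; after an insert followed by a delete they almost never do, so an observer who sees memory twice learns that activity took place.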

Page 8: History-Independent Cuckoo Hashing

Notions of History Independence

Weak & strong are not equivalent
  WHI for reversible data structures is possible without a canonical representation
  Provable efficiency gaps [BP06] (in restricted models)

We consider strong history independence
  Canonical representation (up to initial randomness) implies SHI
  Other direction shown to hold for reversible data structures [HHMPR05]

Page 9: History-Independent Cuckoo Hashing

SHI Dictionaries

                          Memory utilization   Update time     Lookup time
Naor & Teague ‘01         99%                  O(1) expected   O(1) worst case
Blelloch & Golovin ‘07    99%                  O(1) expected   O(1) expected
Blelloch & Golovin ‘07    < 9%                 O(1) expected   O(1) worst case
This work                 < 25% (< 50%)        O(1) expected   O(1) worst case

[Columns "Deletions" and "Practical?": per-row marks not recoverable from the transcript; surviving entries: "?", "(mem. util. < 50%)", "(mem. util. < 50%)"]

Page 10: History-Independent Cuckoo Hashing

Our Approach

Cuckoo hashing [PR01]: a simple & practical scheme with worst case constant lookup time

Force a canonical representation on cuckoo hashing
  No significant loss in efficiency

Avoid rehashing by using a small stash
  What happens when hash functions fail?
  Rehashing is problematic in SHI data structures
    All hash functions need to be sampled in advance (theoretical problem)
    When an item is deleted, may need to roll back to previous functions
  We use a secondary storage to reduce the failure probability exponentially [KMW08]

Page 11: History-Independent Cuckoo Hashing

Cuckoo Hashing

Tables T1 and T2 with hash functions h1 and h2
Store x in one of T1[h1(x)] and T2[h2(x)]

Insert(x):
  Greedily insert in T1 or T2
  If both are occupied then store x in T1
  Repeat in other table with the previous occupant

[Figure: tables T1 and T2 before and after inserting X, with occupants moved between tables; successful insertion]
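The insertion procedure on this slide can be sketched as follows. This is plain (history-dependent) cuckoo hashing, not the canonical variant of the talk. The class name `CuckooTable`, the `max_kicks` cut-off and the toy hash functions are assumptions made for illustration, and keys are assumed never to be `None`.

```python
class CuckooTable:
    """Textbook cuckoo hashing with two tables of r slots each."""
    def __init__(self, r, h1, h2, max_kicks=32):
        self.T1 = [None] * r
        self.T2 = [None] * r
        self.h1, self.h2 = h1, h2
        self.max_kicks = max_kicks  # assumed cut-off before declaring failure

    def lookup(self, x):
        # Worst-case constant time: exactly two probes.
        return self.T1[self.h1(x)] == x or self.T2[self.h2(x)] == x

    def insert(self, x):
        if self.lookup(x):
            return True
        # Greedy step: take whichever of the two slots is free.
        if self.T1[self.h1(x)] is None:
            self.T1[self.h1(x)] = x
            return True
        if self.T2[self.h2(x)] is None:
            self.T2[self.h2(x)] = x
            return True
        # Both occupied: store x in T1, then move the previous occupant
        # to its slot in the other table, repeating as needed.
        in_t1 = True
        for _ in range(self.max_kicks):
            if in_t1:
                i = self.h1(x)
                x, self.T1[i] = self.T1[i], x
            else:
                i = self.h2(x)
                x, self.T2[i] = self.T2[i], x
            if x is None:
                return True
            in_t1 = not in_t1
        # Failure: the key still in hand is unplaced; a rehash would be required.
        return False

# Toy hash functions on small integers (illustrative only).
h1 = lambda x: x % 4
h2 = lambda x: (x // 4) % 4

t = CuckooTable(4, h1, h2)
results = [t.insert(k) for k in (0, 5, 1, 16)]
print(results, t.T1, t.T2)
```

Inserting 16 finds both of its slots occupied, so it takes the T1 slot and triggers a chain of three moves before every key is settled. The final layout depends on the insertion order, which is exactly what the canonical representation is meant to remove.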

Page 12: History-Independent Cuckoo Hashing

Cuckoo Hashing

Tables T1 and T2 with hash functions h1 and h2
Store x in one of T1[h1(x)] and T2[h2(x)]

Insert(x):
  Greedily insert in T1 or T2
  If both are occupied then store x in T1
  Repeat in other table with the previous occupant

[Figure: inserting X into tables T1 and T2 holding Y, U, Z, V; failure, rehash required]

Page 13: History-Independent Cuckoo Hashing

The Cuckoo Graph

Set S ⊆ U containing n keys
h1, h2 : U → {1,...,r}

Bipartite graph with sets of size r
Edge (h1(x), h2(x)) for every x ∈ S

S is successfully stored ⇔ every connected component has at most one cycle

Main theorem:
If r ≥ (1 + ε)n and h1, h2 are log(n)-wise independent, then failure probability is Θ(1/n)
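The storability condition can be checked directly on the cuckoo graph. The sketch below is an illustration with assumed names (`storable`, toy hash functions). It uses union-find to track, for each connected component, the number of slots (vertices) and keys (edges). A connected component has at most one cycle exactly when it has no more edges than vertices.

```python
def storable(keys, h1, h2):
    """True iff every connected component of the cuckoo graph has at most
    one cycle, i.e. no component has more edges than vertices.
    Keys are assumed to be distinct."""
    parent, size, edges = {}, {}, {}

    def find(v):
        if v not in parent:
            parent[v], size[v], edges[v] = v, 1, 0
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for x in keys:
        a, b = find(("T1", h1(x))), find(("T2", h2(x)))
        if a != b:                 # the new edge merges two components
            parent[b] = a
            size[a] += size[b]
            edges[a] += edges[b]
        edges[a] += 1
        if edges[a] > size[a]:     # a second cycle has appeared
            return False
    return True

# Toy hash functions on small integers (illustrative only).
h1 = lambda x: x % 4
h2 = lambda x: (x // 4) % 4

print(storable([0, 5, 1, 16], h1, h2))
print(storable([1, 5, 9, 13, 17, 21], h1, h2))
```

In the second call all six keys share the same T1 slot, so the component has five slots for six keys and no assignment exists.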

Page 14: History-Independent Cuckoo Hashing

The Canonical Representation

Assume that S can be stored using h1 and h2
We force a canonical representation on the cuckoo graph
  Suffices to consider a single connected component

Assume that S forms a tree in the cuckoo graph. Typical case
  One location must be empty.
  The choice of the empty location uniquely determines the location of all elements

Rule: h1(minimal element) is empty

[Figure: tree component of the cuckoo graph with elements a, b, c, d, e]
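A possible reading of the tree rule, as a Python sketch. The function name and the toy hash functions are assumptions. The slot T1[h1(min)] is left empty and treated as the root. Each key is then placed at the endpoint of its edge that lies farther from the root, which is the only assignment consistent with that empty slot.

```python
from collections import defaultdict, deque

def canonical_tree_layout(keys, h1, h2):
    """Canonical placement for a key set whose cuckoo-graph components are
    all trees. Returns {key: ("T1" | "T2", index)}."""
    adj = defaultdict(list)  # slot -> [(neighbouring slot, key)]
    for x in keys:
        a, b = ("T1", h1(x)), ("T2", h2(x))
        adj[a].append((b, x))
        adj[b].append((a, x))

    placement = {}
    for x in sorted(keys):   # smallest unplaced key = minimum of its component
        if x in placement:
            continue
        root = ("T1", h1(x))  # rule: h1(minimal element) stays empty
        seen = {root}
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v, key in adj[u]:
                if v in seen:
                    if key not in placement:
                        raise ValueError("component is not a tree")
                    continue
                seen.add(v)
                placement[key] = v  # endpoint farther from the empty root
                queue.append(v)
    return placement

# Toy hash functions on small integers (illustrative only).
h1 = lambda x: x % 4
h2 = lambda x: (x // 4) % 4

print(canonical_tree_layout([5, 1, 0], h1, h2))
```

Because the result is computed from the set of keys alone, any insertion order that ends with the same content must reach the same layout.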

Page 15: History-Independent Cuckoo Hashing

The Canonical Representation

Assume that S can be stored using h1 and h2
We force a canonical representation on the cuckoo graph
  Suffices to consider a single connected component

Assume that S has one cycle
  Two ways to assign elements in the cycle
  Each choice uniquely determines the location of all elements

Rule: minimal element in cycle lies in T1

[Figure: component of the cuckoo graph with one cycle and elements a, b, c, d, e]
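The one-cycle rule can be sketched in the same style. The code below is an illustration under stated assumptions, not the authors' procedure. It expects a single connected component with exactly one cycle, so every slot ends up occupied, and the names are hypothetical. It first peels the trees hanging off the cycle, whose positions are forced, then walks the cycle starting from its minimal element placed in T1.

```python
from collections import defaultdict

def canonical_unicyclic_layout(keys, h1, h2):
    """Canonical placement for one connected component containing exactly
    one cycle. Returns {key: ("T1" | "T2", index)}."""
    ends = {x: (("T1", h1(x)), ("T2", h2(x))) for x in keys}
    incident = defaultdict(set)  # slot -> keys that can use it
    for x, (a, b) in ends.items():
        incident[a].add(x)
        incident[b].add(x)

    placement = {}
    # Peel the trees: a slot with a single remaining key must hold that key.
    leaves = [s for s in incident if len(incident[s]) == 1]
    while leaves:
        s = leaves.pop()
        if len(incident[s]) != 1:
            continue
        (x,) = incident[s]
        placement[x] = s
        a, b = ends[x]
        other = b if s == a else a
        incident[s].clear()
        incident[other].discard(x)
        if len(incident[other]) == 1:
            leaves.append(other)

    # What remains is the cycle. Rule: its minimal element lies in T1.
    x = min(set(keys) - set(placement))
    slot = ends[x][0]
    while x not in placement:
        placement[x] = slot
        a, b = ends[x]
        far = b if slot == a else a
        (nxt,) = incident[far] - {x}  # the other cycle key must occupy `far`
        x, slot = nxt, far
    return placement

# Toy hash functions on small integers (illustrative only).
h1 = lambda x: x % 4
h2 = lambda x: (x // 4) % 4

print(canonical_unicyclic_layout([0, 5, 1, 16], h1, h2))
```

Here keys 0 and 16 share both slots and form a cycle of length two. The rule puts 0, the smaller one, in T1, and that fixes 16 in T2.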

Page 16: History-Independent Cuckoo Hashing

The Canonical Representation

Updates efficiently maintain the canonical representation

Insertions:
  New leaf: check if new element is smaller than current min
  New cycle:
  Same component…
  Merging two components…
  All cases straightforward
  Update time < size of component = expected (small) constant

Deletions:
  Find the new min, split component, …
  Requires connecting all elements in the component with a sorted cyclic list
  Memory utilization drops to 25%
  All cases straightforward

Page 17: History-Independent Cuckoo Hashing

Rehashing

What if S cannot be stored using h1 and h2?
  Happens with probability Θ(1/n)

Can we simply pick new functions?
  Rare, but very bad worst case performance
  Canonical memory implies we need to sample all hash functions in advance (theoretical problem)
  Whenever an item is deleted, need to check whether we must roll back to previous hash functions
  A bad item which is repeatedly inserted and deleted would cause a rehash every operation!

Page 18: History-Independent Cuckoo Hashing

Using a Stash

Whenever an insert fails, put a ‘bad’ item in a secondary data structure
  Bad item: smallest item that belongs to a cycle
  Secondary data structure must be SHI in itself

Theorem [KMW08]: Pr[|stash| > s] < n^(-s)

In practice keeping the stash as a sorted list is probably the best solution
  Effectively the query time is constant with (very) high probability

In theory the stash could be any SHI data structure with constant lookup time
  A deterministic hashing scheme, where the elements are rehashed whenever the content changes [AN96, HMP01]
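A lookup with a sorted-list stash might look like the sketch below. The function names, the toy hash functions and the example table contents are illustrative assumptions; in particular, the layout shown is not claimed to be the canonical one. The helper `stash_bound` simply evaluates the n^(-s) bound from the theorem.

```python
import bisect

def lookup(x, T1, T2, stash, h1, h2):
    """Membership query: two table probes, then a binary search of the
    sorted stash (expected to hold only a handful of items)."""
    if T1[h1(x)] == x or T2[h2(x)] == x:
        return True
    i = bisect.bisect_left(stash, x)
    return i < len(stash) and stash[i] == x

def stash_bound(n, s):
    """Upper bound n^(-s) on Pr[|stash| > s] from the theorem above."""
    return float(n) ** -s

# Toy hash functions on small integers (illustrative only).
h1 = lambda x: x % 4
h2 = lambda x: (x // 4) % 4

# Six keys that all hash to T1[1]: five fit in the tables, one is stashed.
T1 = [None, 1, None, None]
T2 = [17, 5, 9, 13]
stash = [21]

print(lookup(21, T1, T2, stash, h1, h2), lookup(25, T1, T2, stash, h1, h2))
print(stash_bound(10**6, 3))
```

Keeping the stash sorted gives it a canonical layout of its own, and with the bound above it is so rarely larger than a few items that the extra search is effectively constant time.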

Page 19: History-Independent Cuckoo Hashing

Conclusions and Problems

Cuckoo hashing is a robust and flexible hashing scheme
  Easily ‘molded’ into a history independent data structure

We don’t know how to do this for CH with more than 2 hash functions and/or more than 1 element per bucket
  Better memory utilization, better performance, but...
  Expected size of connected component is not constant

Full performance analysis