
Shades: Expediting Kademlia's Lookup Process

Gil Einziger, Roy Friedman, Yoav Kantor
Computer Science, Technion

Kademlia Overview

Kademlia is nowadays implemented in many popular file-sharing applications such as BitTorrent, Gnutella, and eMule.

Applications over Kademlia have hundreds of millions of users worldwide.

Invented in 2002 by Petar Maymounkov and David Mazières.

Kademlia is good

Kademlia has a number of desirable features not simultaneously offered by any previous DHT.

– It minimizes the number of configuration messages nodes must send to learn about each other.

– Configuration information spreads automatically as a side-effect of key lookup.

– Nodes have enough knowledge and flexibility to route queries through low-latency paths.

– Kademlia uses parallel, asynchronous queries to avoid timeout delays from failed nodes.

– Easy to maintain
– Fast O(log N) lookups
– Fault tolerant

Many ways to reach the same value…

There are k possible peers from which to take the first step. The first peer returns k other peers that are closer to the value; each of these peers returns yet closer peers, and so on…

Until finally we reach the k closest nodes. These nodes store the actual value!

Many possible routing paths… All roads lead to Rome… and all of them lead to the same k closest peers.
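To make the walk concrete, here is a minimal sketch of the iterative find-value loop in Java. Every name here (`Peer`, `Result`, `lookup`) is an illustrative assumption, and the loop queries one peer at a time for brevity, whereas real Kademlia issues parallel, asynchronous queries as noted above.

```java
import java.math.BigInteger;
import java.util.*;

// Minimal sketch of Kademlia's iterative FIND_VALUE loop. All names are
// assumptions; real Kademlia queries alpha peers in parallel.
class IterativeLookup {
    interface Peer {
        BigInteger id();
        Result findValue(BigInteger key); // FIND_VALUE RPC
    }

    // A response either carries the value or up to k closer peers.
    record Result(byte[] value, List<Peer> closerPeers) {}

    static byte[] lookup(BigInteger key, List<Peer> startPeers) {
        // Shortlist of candidates, ordered by XOR distance to the key.
        TreeMap<BigInteger, Peer> shortlist = new TreeMap<>();
        for (Peer p : startPeers) shortlist.put(p.id().xor(key), p);
        Set<BigInteger> queried = new HashSet<>();

        while (true) {
            // Pick the closest candidate we have not queried yet.
            Peer next = shortlist.entrySet().stream()
                    .filter(e -> !queried.contains(e.getKey()))
                    .map(Map.Entry::getValue)
                    .findFirst().orElse(null);
            if (next == null) return null;  // converged on the k closest, no value
            queried.add(next.id().xor(key));

            Result r = next.findValue(key);
            if (r.value() != null) return r.value(); // a node stores the value
            // Each response adds peers that are closer to the value.
            for (Peer p : r.closerPeers()) shortlist.put(p.id().xor(key), p);
        }
    }
}
```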

Popular content

Many users that love Fry… Please wait… we're all laptops here. When a value is popular, the same k closest nodes have to serve every request for it, and these laptops become a bottleneck.

The big picture

[Figure: latency vs. overheads for several DHTs – OneHop and Kelips sit at low latency but high overhead, Chord and Kademlia at low overhead but higher latency; Shades aims for low latency at low overhead.]

Low-latency DHTs typically require gossip to maintain a large state. Other DHTs are easier to maintain, but encounter longer routing.

Our goal: reduce the latency and remain very easy to maintain.

Caching to the rescue!

– Local Cache – after searching for an item, cache it locally (Guangmin, 2009).
– KadCache – after searching for an item, send it to the last peer along the path (both are sketched after this list).
– Kaleidoscope – break symmetry using colors; designed to reduce message cost, not latency.

Motivation: if a value is popular, we should be able to hit a cached copy before reaching the k closest nodes.

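As a quick illustration of the two simplest strategies above, here is a sketch in Java; the types and method names are assumptions, not APIs of the cited works.

```java
import java.util.*;

// A sketch of the two baseline policies described above (all names assumed):
// Local Cache keeps the found value at the requester, while KadCache
// pushes it to the last peer along the lookup path.
class BaselineCaching {
    interface Peer { void sendCacheCopy(String key, byte[] value); }
    enum Strategy { LOCAL_CACHE, KAD_CACHE }

    final Map<String, byte[]> localCache = new HashMap<>();

    void onLookupSuccess(Strategy s, String key, byte[] value, List<Peer> path) {
        switch (s) {
            case LOCAL_CACHE -> localCache.put(key, value);   // cache at the requester
            case KAD_CACHE -> path.get(path.size() - 1)       // cache at the last
                    .sendCacheCopy(key, value);               // peer along the path
        }
    }
}
```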

Caching Internet Content

• The access distribution of most content is skewed.
▫ Often modeled using Zipf-like functions, power law, etc.

[Figure: frequency vs. rank – a small number of very popular items (for example, ~50% of the weight) followed by a long, heavy tail (the remaining ~50% of the weight).]
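For intuition, a tiny computation with a Zipf distribution (exponent 1, an assumed modeling parameter) shows how heavy the head is:

```java
// A quick sanity check of the skew claim: under Zipf with exponent 1 (an
// assumed parameter), how much of the total access weight do the top items carry?
public class ZipfHead {
    public static void main(String[] args) {
        int n = 1_000_000;                       // catalog size
        double total = 0, head = 0;
        for (int rank = 1; rank <= n; rank++) {
            double w = 1.0 / rank;               // Zipf weight of this rank
            total += w;
            if (rank <= 1000) head += w;         // the "small number" of hot items
        }
        // Prints roughly 52%: about half the weight in 0.1% of the items.
        System.out.printf("top 1,000 of 1,000,000 items: %.0f%% of the weight%n",
                100 * head / total);
    }
}
```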


Caching Internet Content

• Unpopular items can suddenly become popular and vice versa.

Blackmail is such an ugly word. I prefer "extortion".

The "X" makes it sound cool.

Shades overview

• Form a large distributed cache from many nodes.
– Make sure these caches are accessible early during the lookup.
• Single cache behavior:
– Admission policy.
– Eviction policy.

Palette

The palette provides a mapping from colors to nodes of that color. We want to have at least a single node from every color.

[Figure: k-buckets alongside the palette.]
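A palette could be represented as a short per-color list of known nodes, refilled as a side effect of lookup traffic. The sketch below, including the hash-based color mapping, is an assumption for illustration, not the Shades implementation:

```java
import java.util.*;

// A sketch of the palette: for each color, keep a few known nodes of that
// color, learned as a side effect of lookup traffic (all names assumed).
class Palette {
    record Node(byte[] id, String address, int color) {}

    private final int numColors;
    private final Map<Integer, Deque<Node>> byColor = new HashMap<>();

    Palette(int numColors) { this.numColors = numColors; }

    // A key's color could simply be its hash modulo the number of colors.
    int colorOf(byte[] key) {
        return Math.floorMod(Arrays.hashCode(key), numColors);
    }

    // Remember any node we hear about; keep a short list per color.
    void observe(Node n) {
        Deque<Node> d = byColor.computeIfAbsent(n.color(), c -> new ArrayDeque<>());
        d.addFirst(n);
        if (d.size() > 3) d.removeLast();
    }

    // At least one node of every color lets us reach the right cache early.
    Optional<Node> anyOfColor(int color) {
        Deque<Node> d = byColor.get(color);
        return d == null || d.isEmpty() ? Optional.empty()
                                        : Optional.of(d.peekFirst());
    }
}
```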

Shades in Brief

• Do the original Kademlia lookup and, at the same time, contact correctly colored nodes from the palette.
• The original routing advances us towards the value.
• Correctly colored nodes are likely to contain a cached copy of the value.

Multiple cache lookups

Problem: if the first routing step is not successful, how can we get additional correctly colored nodes?

Solution: use the palette of contacted nodes!

[Figure: looking up the key "bender"; each response carries, in addition to closer peers, a node from the contacted peer's palette.]
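Combining the last two slides, one lookup round might look like the sketch below; every interface in it is assumed:

```java
import java.util.*;

// Sketch of one Shades lookup round (interfaces assumed). Alongside the
// normal routing step we probe a correctly colored node; if our own palette
// lacks that color, we borrow from the palette of a node we just contacted.
class ShadesRound {
    interface ColoredNode {
        Optional<byte[]> cacheGet(byte[] key);          // probe the node's cache
        Optional<ColoredNode> paletteNode(int color);   // ask its palette
    }

    Optional<byte[]> probeCache(byte[] key, int keyColor,
                                Optional<ColoredNode> fromOwnPalette,
                                ColoredNode justContacted) {
        // Fall back to the contacted node's palette when ours has no such color.
        Optional<ColoredNode> colored =
                fromOwnPalette.or(() -> justContacted.paletteNode(keyColor));
        // A correctly colored node is likely to hold a cached copy of a
        // popular value, letting the lookup finish before the k closest nodes.
        return colored.flatMap(n -> n.cacheGet(key));
    }
}
```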

Eviction and Admission Policies

[Figure: a new item arrives; the eviction policy (Lazy LFU) selects a cache victim ("one of you guys should leave…"), and the admission policy (TinyLFU) decides whether the new item is any better than the victim. The winner stays in the cache.]

What is the common answer?

TinyLFU: LFU Admission Policy

• Keep inserting new items into the histogram until #items = W.
• Once #items reaches W, divide all counters by 2.

[Figure: an example histogram of counters; when the number of items seen reaches the window size W, every counter is halved.]

TinyLFU Example

[Figure: the eviction policy (Lazy LFU) proposes a cache victim; TinyLFU scores both against its histogram. Victim score: 3, new item score: 2 – the victim wins, and the new item is not admitted.]
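The example's rule is a one-liner. In this sketch the estimator interface is an assumed stand-in for TinyLFU's Estimate (a fuller sketch follows the TinyLFU operation slide below):

```java
// The decision from the example above, as code. FrequencyEstimator is an
// assumed stand-in for TinyLFU's Estimate(item).
interface FrequencyEstimator<K> { int estimate(K item); }

final class Admission {
    // Admit the new item only if it is estimated to be more frequent than
    // the victim. In the example: new item 2 vs. victim 3, so the victim wins.
    static <K> boolean admit(FrequencyEstimator<K> f, K newItem, K victim) {
        return f.estimate(newItem) > f.estimate(victim);
    }
}
```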

What are we doing?

[Figure: the (approximate) past is used to predict the future.]

It is much cheaper to maintain an approximate view of the past.

TinyLFU operation

[Figure: a Bloom filter in front of an MI-CBF (minimal increment counting Bloom filter).]

• Estimate(item):
▫ Return BF.contains(item) + MI-CBF.estimate(item)

• Add(item):
▫ W++
▫ If (W == WindowSize): Reset()
▫ If (BF.contains(item)): Return MI-CBF.add(item)
▫ BF.add(item)

• Reset():
▫ Divide W by 2,
▫ erase the Bloom filter,
▫ and divide all counters by 2 (in the MI-CBF).

Eviction Policy: Lazy LFU

Motivation: an efficient approximation of the LFU eviction policy, for the case where admission is rare. "Search for the least frequently used item… in a lazy manner."

[Figure: cache entries A:7, B:6, C:8, D:5, E:2, F:17, G:31; each successive Get Victim call continues the search for the least frequently used item from where the previous call stopped.]
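One plausible reading of the figure, sketched below with assumed details (scan width, data layout): the victim search walks the cache a few entries per call and resumes where the previous call stopped, so each Get Victim stays cheap when admissions are rare.

```java
import java.util.*;

// A sketch of one plausible reading of Lazy LFU (details assumed): scan a
// few cache entries per call, resuming where the previous scan stopped, and
// return the least frequently used entry seen in that window.
class LazyLfu<K> {
    record Entry<K>(K key, int frequency) {}

    private final List<Entry<K>> entries = new ArrayList<>();
    private int cursor = 0;               // where the previous scan stopped
    private static final int SCAN = 3;    // entries examined per call (assumed)

    void add(K key, int frequency) { entries.add(new Entry<>(key, frequency)); }

    // Cheap per call, so when admission is rare the amortized cost of
    // approximating LFU stays low. Returns null on an empty cache.
    Entry<K> getVictim() {
        Entry<K> victim = null;
        for (int i = 0; i < SCAN && !entries.isEmpty(); i++) {
            Entry<K> e = entries.get(cursor);
            if (victim == null || e.frequency() < victim.frequency()) victim = e;
            cursor = (cursor + 1) % entries.size();
        }
        return victim;
    }
}
```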

Shades Tradeoff

• What happens as the number of colors increases?
– We form larger distributed caches.
– But it is more difficult to fill the palette.

Comparative results

• Emulation – we run the actual implementation, sending and receiving actual UDP packets (only the user is simulated).
• Scale – different network sizes, up to 5,000 Kademlia peers.
• Experimental settings – each peer does a 500-request warm-up and then a 500-request measurement interval (up to 2.5 million find-value requests in warm-up and 2.5 million in measurement).
• Experiment generation – each peer receives a file with 1,000 requests from the appropriate workload. All users continuously play the requests.

Wikipedia trace (Baaren & Pierre, 2009)

"10% of all user requests issued to Wikipedia during the period from September 19th, 2007 to October 31st."

YouTube trace (Cheng et al., QoS 2008)

Weekly measurements of ~160k newly created videos during a period of 21 weeks.

• We directly created a synthetic distribution for each week.

Comparative results

[Figure: YouTube workload, 100-item cache.]

With Shades, more queries are finished sooner, while the other caching strategies offer only a marginal reduction of the number of contacted nodes. The ideal corner is the one where as many of the lookups as possible complete as soon as possible!

Comparative results

[Figure: YouTube workload, unbounded cache.]

Shades is also better with an unbounded cache! Notice that Shades_100 is still better than the other caching strategies even when they are given unbounded caches.

Comparative results

[Figure: Wikipedia workload, 100-item cache.]

Comparative results

Load is more balanced because frequent items are found in a single step. Message overheads are similar to the other suggestions.

Conclusions

• Latency improvement – up to 22-34% reduction of median latency and 18-23% reduction of average latency.
• Better load distribution – the busiest nodes are 22-43% less congested, since cached values are not placed close to the stored values.
• Reproducibility – Shades is an open source project: https://code.google.com/p/shades/. Kaleidoscope, KadCache, and Local Cache are released as part of the open source project OpenKad: https://code.google.com/p/openkad/.

Feel free to use them!

The end

Any questions?

Thanks for listening!