Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008
Principles of Reliable Distributed Systems
Lecture 2: Distributed Hash Tables (DHT), Chord
Spring 2008, Idit Keidar
Today’s Material
• Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
  – Stoica et al.
Reminder: Peer-to-Peer Lookup
• Insert (key, file)
• Lookup (key)
  – Should find keys inserted in any node
Reminder: Overlay Networks
• A virtual structure imposed over the physical network (e.g., the Internet)
  – over the Internet, there is an (IP-level) link between every pair of nodes
  – an overlay uses a fixed subset of these
• Why restrict to a subset?
Routing/Lookup in Overlays
• How does one route a packet to its destination in an overlay?
• How about lookup (key)?
• Unstructured overlay: (last week)
  – Flooding or random walks
• Structured overlay: (today)
  – The links are chosen according to some rule
  – Tables define next-hop for routing and lookup
Structured Lookup Overlays
• Many academic systems
  – CAN, Chord, D2B, Kademlia, Koorde, Pastry, Tapestry, Viceroy, …
• OverNet based on the Kademlia algorithm
• Symmetric, no hierarchy
• Decentralized self management
• Structured overlay – data stored in a defined place, search goes on a defined path
• Implement Distributed Hash Table (DHT) abstraction
Reminder: Hashing
• Data structure supporting the operations:
  – void insert( key, item )
  – item search( key )
• Implementation uses hash function for mapping keys to array cells
• Expected search time O(1)
  – provided that there are few collisions
Distributed Hash Tables (DHTs)
• Nodes store table entries
  – The role of array cells
• Good abstraction for lookup?
  – Why?
The DHT Service Interface
• lookup( key ) returns the location of the node currently responsible for this key
• key is usually numeric (in some range)
Using the DHT Interface
• How do you publish a file?
• How do you find a file?
• Requirements for an application being able to use DHTs?
  – Data identified with unique keys
  – Nodes can (agree to) store keys for each other
    • location of object (pointer) or actual object (data)
What Does a DHT Implementation Need to Do?
• Map keys to nodes
  – Needs to be dynamic as nodes join and leave
  – How does this affect the service interface?
• Route a request to the appropriate node
  – Routing on the overlay
Lookup Example
[Figure: a set of nodes, each holding a key/value (K V) table; insert(K1,V1) stores (K1,V1) in one node's table, and lookup(K1) must find it.]
Mapping Keys to Nodes
• Goal: load balancing
  – Why?
• Typical approach:
  – Give an m-bit id to each node and each key (e.g., using SHA-1 on the key, IP address)
  – Map key to node whose id is “close” to the key (need distance function)
  – How is load balancing achieved?
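The ID assignment above can be sketched in a few lines of Python. This is only an illustration: the names `make_id` and `ring_distance`, the example node address, and the example key are assumptions, and the clockwise distance shown is one possible distance function (the one Chord's ring uses), not the only choice.

```python
import hashlib

M = 160  # ID length in bits; a SHA-1 digest is exactly 160 bits


def make_id(name: str, m: int = M) -> int:
    """Hash a key or a node address to an m-bit identifier."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


def ring_distance(a: int, b: int, m: int = M) -> int:
    """Clockwise distance from a to b on a ring of 2**m identifiers."""
    return (b - a) % (2 ** m)


node_id = make_id("192.0.2.17:4000")  # hypothetical node address
key_id = make_id("song.mp3")          # hypothetical key
print(ring_distance(key_id, node_id))
```

Because the hash spreads both keys and node IDs roughly uniformly over the ID space, each node is expected to end up "close" to a similar share of the keys.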
Routing Issues
• Each node must be able to forward each lookup query to a node closer to the destination
• Maintain routing tables adaptively
  – Each node knows some other nodes
  – Must adapt to changes (joins, leaves, failures)
  – Goals?
Handling Join/Leave
• When a node joins, it needs to assume responsibility for some keys
  – Ask the application to move these keys to it
  – How many keys will need to be moved?
• When a node fails or leaves, its keys have to be moved to others
  – What else is needed in order to implement this?
P2P System Interface
• Lookup
• Join
• Move keys
Chord
Stoica, Morris, Karger, Kaashoek, and Balakrishnan
Chord Logical Structure
• m-bit ID space ($2^m$ IDs), usually m=160.
• Think of nodes as organized in a logical ring according to their IDs.
[Figure: ring of nodes N1, N8, N10, N14, N21, N30, N38, N42, N48, N51, N56]
Consistent Hashing: Assigning Keys to Nodes
• Key k is assigned to first node whose ID equals or follows k – successor(k)
[Figure: ring of nodes N1, N8, N10, N14, N21, N30, N38, N42, N48, N51, N56, with key K54 placed on the ring]
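The successor rule can be tried out on the ring from the figure. The sketch below is illustrative only: it assumes a 6-bit ID space (enough for IDs up to 63) and a global sorted list of node IDs, which a real node would not have.

```python
from bisect import bisect_left


def successor(node_ids, k, m=6):
    """Return the first node ID that equals or follows k on a ring of 2**m IDs."""
    ring = sorted(node_ids)
    i = bisect_left(ring, k % (2 ** m))
    return ring[i % len(ring)]  # wrap around past the largest ID


NODES = [1, 8, 10, 14, 21, 30, 38, 42, 48, 51, 56]
print(successor(NODES, 54))  # K54 is assigned to N56
print(successor(NODES, 60))  # wraps around the ring to N1
print(successor(NODES, 21))  # a key equal to a node ID stays at that node
```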
Moving Keys upon Join/Leave
• When a node joins, it becomes responsible for some keys previously assigned to its successor
  – Local change
  – Assuming load is balanced, how many keys should move?
• And what happens when a node leaves?
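A small experiment makes the "local change" concrete. The sketch below assumes the 11-node example ring, a 6-bit ID space with one key per ID, and a hypothetical joining node N26; none of these values come from the slides beyond the node IDs.

```python
from bisect import bisect_left


def successor(ring, k):
    """First node ID in the sorted list `ring` that equals or follows k (with wrap-around)."""
    return ring[bisect_left(ring, k) % len(ring)]


def moved_keys(nodes, new_node, keys):
    """Return {key: (old_owner, new_owner)} for keys whose owner changes when new_node joins."""
    before = sorted(nodes)
    after = sorted(nodes + [new_node])
    moved = {}
    for k in keys:
        old, new = successor(before, k), successor(after, k)
        if old != new:
            moved[k] = (old, new)
    return moved


NODES = [1, 8, 10, 14, 21, 30, 38, 42, 48, 51, 56]
KEYS = list(range(64))               # one key per ID on the assumed 6-bit ring
moved = moved_keys(NODES, 26, KEYS)  # hypothetical node N26 joins
print(moved)
```

In this run only keys 22 to 26 change owner, and all of them move from N30 (the joiner's successor) to N26; every other node keeps exactly the keys it had.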
Consistent Hashing Guarantees
• For any set of N nodes and K keys, w.h.p.:
  – Each node is responsible for at most $(1 + \varepsilon)K/N$ keys
  – When an (N + 1)st node joins or leaves, responsibility for O(K/N) keys changes hands (only to or from the joining or leaving node)
• For the scheme described above, $\varepsilon = O(\log N)$
  – $\varepsilon$ can be reduced to an arbitrarily small constant by having each node run on the order of log N virtual nodes, each with its own identifier
Simple Routing Solutions
• Each node knows only its successor
  – Routing around the circle
  – Good idea?
• Each node knows all other nodes
  – O(1) routing
  – Cost?
Chord Skiplist Routing
• Each node has “fingers” to nodes ½ way around the ID space from it, ¼ the way…
• finger[i] at n contains successor($n + 2^{i-1}$)
• successor is finger[1]
[Figure: ring of nodes N0, N8, N10, N14, N21, N30, N38, N42, N48, N51, N56]
How many entries in the finger table?
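To make the finger definition concrete, here is a minimal sketch that computes a finger table on the ring from the figure. It assumes m = 6 and uses a global sorted node list for `successor`, which is a simplification for illustration only.

```python
from bisect import bisect_left

M = 6  # assumed ID length, so the ring has 2**6 = 64 IDs
NODES = sorted([0, 8, 10, 14, 21, 30, 38, 42, 48, 51, 56])


def successor(k):
    """First node whose ID equals or follows k, wrapping around the ring."""
    return NODES[bisect_left(NODES, k % 2 ** M) % len(NODES)]


def finger_table(n):
    """finger[i] = successor(n + 2**(i-1)) for i = 1..M, returned as {i: node}."""
    return {i: successor(n + 2 ** (i - 1)) for i in range(1, M + 1)}


fingers = finger_table(8)
print(fingers)
```

For N8 this gives fingers pointing at N10, N10, N14, N21, N30 and N42: the table has m entries, but nearby targets often resolve to the same node, so fewer of them are distinct.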
Example: Chord Fingers
[Figure: ring with nodes N0, N10, N21, N30, N47, N72, N82, N90, N114, with arrows labeled finger[1..4], finger[5], finger[6], finger[7]]
• m entries
• log N distinct fingers with high probability
Chord Data Structures (At Each Node)
• Finger table
• First finger is successor
• Predecessor
Forwarding Queries
• Query for key k is forwarded to finger with highest ID not exceeding k
[Figure: Lookup(K54) routed over the ring of nodes N0, N8, N10, N14, N21, N30, N38, N42, N48, N51, N56]
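The forwarding rule can be simulated on the example ring. The sketch below makes several assumptions: m = 6, finger tables recomputed from a global node list instead of being stored per node, and N8 as the node that issues the lookup (the starting node is not recoverable from the figure).

```python
from bisect import bisect_left

M = 6
SIZE = 2 ** M
NODES = sorted([0, 8, 10, 14, 21, 30, 38, 42, 48, 51, 56])


def successor(k):
    return NODES[bisect_left(NODES, k % SIZE) % len(NODES)]


def dist(a, b):
    """Clockwise distance from a to b; a full turn (not 0) when a == b."""
    return (b - a) % SIZE or SIZE


def fingers(n):
    return [successor(n + 2 ** (i - 1)) for i in range(1, M + 1)]


def lookup(start, key):
    """Return (node responsible for key, list of nodes that handled the query)."""
    n, path = start, [start]
    while True:
        succ = fingers(n)[0]
        if dist(n, key) <= dist(n, succ):  # key lies in (n, succ]
            return succ, path
        # forward to the farthest finger that still precedes the key
        n = max((f for f in fingers(n) if dist(n, f) < dist(n, key)),
                key=lambda f: dist(n, f))
        path.append(n)


print(lookup(8, 54))  # (56, [8, 42, 51])
```

Each hop jumps to the finger closest to the key without passing it, so the remaining clockwise distance shrinks quickly: here N8 forwards to N42, N42 to N51, and N51 knows that its successor N56 holds K54.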
How long does it take?
Remote Procedure Call (RPC)
Routing Time
• Node n looks up a key stored at node p
• p is in n’s ith interval: $p \in [(n + 2^{i-1}) \bmod 2^m,\ (n + 2^{i}) \bmod 2^m)$
• n contacts f = finger[i]
  – The interval is not empty (because p is in it), so: $f \in [(n + 2^{i-1}) \bmod 2^m,\ (n + 2^{i}) \bmod 2^m)$
  – RPC f
• f is at least $2^{i-1}$ away from n
• p is at most $2^{i-1}$ away from f
• The distance is halved: maximum m steps
Routing Time Refined
• Assuming uniform node distribution around the circle, the number of nodes in the search space is halved at each step:
  – Expected number of steps: log N
• Note that:
  – m = 160
  – For 1,000,000 nodes, log N ≈ 20 (base 2)
What About Network Distance?
[Figure: Lookup(K54) on the ring of nodes N0, N8, N10, N14, N21, N30, N38, N42, N48, N51, N56, with nodes physically located in Haifa, Texas, and China]
Joining Chord
• Goals?
• Required steps:
  – Find your successor
  – Initialize finger table and predecessor
  – Notify other nodes that need to change their finger table and predecessor pointer
    • $O(\log^2 N)$
  – Learn the keys that you are responsible for; notify others that you assume control over them
Join Algorithm: Take II
• Observation: for correctness, successors suffice
  – Fingers only needed for performance
• Upon join, update successor only
• Periodically,
  – Check that successors and predecessors are consistent
  – Fix fingers
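The periodic check can be sketched as an in-memory simulation. The sketch follows the create / join / stabilize / notify structure published in the Chord paper, but it is only an approximation under stated assumptions: nodes are Python objects calling each other directly instead of using RPC, the ID space is 6 bits, `find_successor` walks successor pointers only, and finger fixing is left out.

```python
class Node:
    def __init__(self, ident, m=6):
        self.id, self.size = ident, 2 ** m
        # create(): a new node starts as a one-node ring
        self.successor, self.predecessor = self, None

    def _between(self, x, a, b):
        """True if x is strictly inside the clockwise interval (a, b).
        When a == b the interval is the whole ring except a."""
        return 0 < (x - a) % self.size < ((b - a) % self.size or self.size)

    def find_successor(self, key):
        """Walk successor pointers only: slow, but enough for correctness."""
        n = self
        while not (key == n.successor.id
                   or self._between(key, n.id, n.successor.id)):
            n = n.successor
        return n.successor

    def join(self, known):
        """Upon join, set only the successor pointer."""
        self.predecessor = None
        self.successor = known.find_successor(self.id)

    def stabilize(self):
        """Periodic check: adopt a closer successor if one has appeared."""
        x = self.successor.predecessor
        if x is not None and self._between(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, other):
        """`other` believes it might be our predecessor."""
        if (self.predecessor is None
                or self._between(other.id, self.predecessor.id, self.id)):
            self.predecessor = other


a, b, c = Node(8), Node(21), Node(14)  # hypothetical node IDs
b.join(a)
c.join(a)
for _ in range(4):                     # a few stabilization rounds
    for node in (a, b, c):
        node.stabilize()
print(a.successor.id, c.successor.id, b.successor.id)  # 14 21 8
```

After the joins only successor pointers are set, and both newcomers still point at N8. The repeated `stabilize` calls are what turn the three nodes into the ring N8, N14, N21 with matching predecessor pointers.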
Creation and Join
Join Example
[Figure: join example in four stages: joiner finds successor; gets keys; stabilize fixes successor; stabilize fixes predecessor]
Join Stabilization Guarantee
• If any sequence of join operations is executed interleaved with stabilizations,
  – Then at some time after the last join
  – The successor pointers form a cycle on all the nodes in the network
• Model assumptions?
Performance with Concurrent Joins
• Assume a stable network with N nodes with correct finger pointers
• Now, another set of up to N nodes joins the network,
  – And all successor pointers (but perhaps not all finger pointers) are correct,
• Then lookups still take O(log N) time w.h.p.
• Model assumptions?
Failure Handling
• Periodically fixing fingers
• List of r successors instead of one successor
• Periodically probing predecessors:
Failure Detection
• Each node has a local failure detector module
• Uses periodic probes and timeouts to check liveness of successors and fingers
  – If the probed node does not respond by a designated timeout, it is suspected to be faulty
• A node that suspects its successor (finger) finds a new successor (finger)
• False suspicion - the suspected node is not faulty
  – Suspected due to communication problems
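A timeout-based failure detector of this kind can be sketched as follows. This is a simplified model rather than Chord's actual mechanism: time is passed in explicitly instead of read from a clock, probes are implied, and the class name, peer names and timeout value are all hypothetical.

```python
class FailureDetector:
    """Suspects a peer when no reply has been seen within `timeout` seconds."""

    def __init__(self, peers, timeout, now=0.0):
        self.timeout = timeout
        self.last_heard = {p: now for p in peers}

    def on_reply(self, peer, now):
        """Record that `peer` answered a probe at time `now`."""
        self.last_heard[peer] = now

    def suspected(self, now):
        """Peers whose last reply is older than the timeout."""
        return {p for p, t in self.last_heard.items()
                if now - t > self.timeout}


fd = FailureDetector(peers=["N10", "N14"], timeout=3.0)
fd.on_reply("N10", now=2.0)
fd.on_reply("N14", now=2.0)
fd.on_reply("N10", now=4.0)    # N14's reply to the second probe is lost or delayed
print(fd.suspected(now=5.5))   # {'N14'}
fd.on_reply("N14", now=6.0)    # a late reply: the suspicion was false
print(fd.suspected(now=6.5))   # set()
```

The second half of the example is a false suspicion: N14 never failed, but a slow or lost message made it look faulty for a while.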
The Model?
• Reliable messages among correct nodes
  – No network partitions
• Node failures can be accurately detected!
  – No false suspicions
• Properties hold as long as failure is bounded:
  – Assume a list of r successors, with r on the order of log N
  – Start from stable state and then each node fails with prob. 1/2
  – Then w.h.p. find_successor returns the closest living successor to the query key
  – And the expected time to execute find_successor is O(log N)
What Can Partitions Do?
[Figure: ring of nodes N0, N8, N10, N14, N21, N30, N38, N42, N48, N51, N56, with three “Suspect successor” labels]
What About Moving Keys?
• Left up to the application
• Solution: keep soft state, refreshed periodically
  – Every refresh operation performs lookup(key) before storing the key in the right place
• How can we increase reliability for the time between failure and refresh?
Summary: DHT Advantages
• Peer-to-peer: no centralized control or infrastructure
• Scalability: O(log N) routing, routing tables, join time
• Load-balancing
• Overlay robustness
DHT Disadvantages
• No control where data is stored
• In practice, organizations want:
  – Content Locality – explicitly place data where we want (inside the organization)
  – Path Locality – guarantee that local traffic (a user in the organization looks for a file of the organization) remains local
• No prefix search