p2p systems
©2012, Karl Aberer, EPFL-IC, Laboratoire de systèmes d'informations répartis
Distributed Data Management
Part 3 - Peer-2-Peer Systems (cont)
Overview
1. P2P Systems and Resource Location
2. Unstructured P2P Overlay Networks
3. Hierarchical P2P Overlay Networks
4. Structured P2P Overlay Networks
5. Small World Graphs
6. P2P Data Management
4. Structured P2P Overlay Networks
• Unstructured overlay networks – what we learned
– simplicity (simple protocol)
– robustness (almost impossible to “kill” – no central authority)
• Performance
– search latency O(log n), n number of peers
– update and maintenance cost low
Structured P2P Overlay Networks
• Drawbacks
– high bandwidth consumption for search
– free riding
• Can we do better?
Efficient Resource Location
[Figure: resource location approaches placed along three axes, search cost, maximal bandwidth and update cost, each ranging from low to high. Approaches shown: unstructured P2P overlay networks (e.g. Gnutella); server, superpeers (e.g. Napster); full replication; structured P2P overlay networks (e.g. prefix routing).]
Structured P2P Overlay Networks
• Goal: efficient search using few messages without designated servers
• Easy: distribution of index information over all peers
– every peer maintains and provides part of the index information (k, p)
• Difficult: distributing the access structure to support efficient search
– Realized by an “overlay network”
Structured P2P Overlay Networks
• Problem Illustration:
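[Figure: on one side, a server holding the index information I and the access structure, with peers storing resources; on the other side, peers storing resources and index information, with the index split into parts I1, I2, I3, I4 and a "?" marking where a search starts.]
Where to start the search?
How to locate the index information?
Overlay network = Access structure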
Example: Scalable Distributed Tries (P-Grid)
• Search trie: search keys are binary keys
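[Figure: binary search trie used as access structure, with root ???, inner nodes 0?? and 1??, then 00?, 01?, 10?, 11?, and leaves 000 to 111 holding the index items. A query 101? is passed down level by level until the leaf 101 is found (101!).]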
Non-scalable Distribution of Search Tree
• Distribute search tree over peers
[Figure: the search trie (root ???, inner nodes 0??, 1??, 00?, 01?, 10?, 11?, leaves 000 to 111) distributed over peer 1 to peer 4, with a bottleneck marked: the "Napster" bottleneck.]
Scalable Distribution of Search Tree
[Figure: the same search trie (root ???, leaves 000 to 111) with peer 1, peer 2, peer 3 and peer 4 attached to it.]
Associate each peer with a complete path
Routing Information
[Figure: trie fragment with the nodes ???, 1??, 10?, 100 and 101 and the peers 1 to 4; two regions are annotated "knows more about this part of the tree".]
Prefix Routing
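[Figure: two trie fragments, one with the path ??? / 1?? / 11? (leaves 110, 111) and peer 4, one with the path ??? / 1?? / 10? (leaves 100, 101). A query 101? issued at peer 4 is sent as a message to peer 3, where it is resolved (101!).]
Routing table of peer 4:
prefix   peer
0??      peer1, peer2
10?      peer3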
P-Grid Routing Tables and Search
search(p, k)
  if k = path(p) then return(p) // found
  else
    find in routing table peer_i with longest prefix matching k
    search(peer_i, k)
Peer with path p = p_1,..,p_l, p_i ∈ {0,1}, stores routing table:
• For each prefix p_1,..,p_j, j=1,..,l, a constant number r of references to peers with path p_1,..,p_(j-1),1-p_j
• A constant number r of references to replicas with the same path
Example: routing table of a peer with path 01101
own prefix   complementary prefix   references
0            1                      P1: 100, P2: 1100
01           00                     P3: 00110, P4: 0000
011          010                    P5: 01011, P6: 0100
0110         0111                   P7: 01110, P8: 01111
01101        01100                  P9: 01100, P10: 01100
replicas     01101                  P11: 01101, P12: 01101
Search cost bound by routing table size: log2(n) for balanced tree
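The search procedure and routing table layout above can be sketched in Python. This is a minimal illustration, not P-Grid's actual implementation: the Peer class, the refs dictionary keyed by complementary prefix and the four-peer network are assumptions made for the example, and a key counts as found when the peer's path is a prefix of it.

```python
class Peer:
    """Hypothetical peer: a binary path plus a routing table."""

    def __init__(self, name, path):
        self.name = name
        self.path = path
        # refs maps a complementary prefix (e.g. "0", "10") to peers below it
        self.refs = {}


def search(peer, key, hops=0):
    """Forward key along the longest matching prefix; return (peer, hops)."""
    if key.startswith(peer.path):
        return peer, hops  # found: this peer is responsible for key
    # Routing-table prefixes that match the key; pick the longest one.
    matching = [prefix for prefix in peer.refs if key.startswith(prefix)]
    best = max(matching, key=len)
    return search(peer.refs[best][0], key, hops + 1)


# Assumed four-peer network with paths 00, 01, 10, 11 and one reference per level.
p1, p2, p3, p4 = Peer("p1", "00"), Peer("p2", "01"), Peer("p3", "10"), Peer("p4", "11")
p1.refs = {"1": [p3], "01": [p2]}
p2.refs = {"1": [p4], "00": [p1]}
p3.refs = {"0": [p1], "11": [p4]}
p4.refs = {"0": [p2], "10": [p3]}

target, hops = search(p4, "001")
print(target.name, hops)  # p1 2
```

Each hop resolves at least one more bit of the key, which is why the number of hops is bounded by the number of routing table levels.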
Questions
• The index information in a structured overlay network
1. Provides references to route a search request within the overlay network
2. Provides for a given key the reference to the peer that stores the resource
3. Is replicated in routing tables to support redundant search paths
• For the given routing table, the search request for the key 0101 is routed
1. Always to peer P5
2. Either to peer P5 or P6
3. Either to peer P3, P4, P5 or P6
[Figure: routing table of the peer with path 01101, repeated from the example above.]
Structured P2P Overlay Network Approaches
• Different strategies
– P-Grid: distributing a binary search tree
– Chord: constructing a distributed hash table
– CAN: routing in a d-dimensional space
– FreeNet: caching index information along search paths
Structured P2P Overlay Network Approaches
• Commonalities
– each peer maintains a small part of the index information
– each peer maintains a small routing table for routing in the overlay network
– searches are performed by directed message forwarding
• Differences
– performance and qualitative criteria
Example 2: Distributed Hash Tables (Chord)
• Hashing of search keys AND peer addresses on binary keys of length m
– e.g. m=5, key("jingle-bells.mp3")=4, key(196.178.0.1)=19
• Data keys are stored at the peer with the next larger peer key
– peer with hashed identifier p, data with hashed identifier k: if k ∈ ]predecessor(p), p] then k is stored at p
[Figure: identifier ring for m=5 (32 keys) with peers p1, p2, p3. The key k lies between p1 = predecessor(k) and p2 = successor(k) and is stored at p2.]
Search strategies
1. every peer knows all others: O(n) routing table size
2. peers know successor only: O(n) search cost
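The placement rule (a key is stored at the first peer whose identifier is equal to or larger than the key, wrapping around the ring) can be sketched as follows. The use of SHA-1 in ring_id and the peer identifiers 3, 11, 19, 27 are assumptions for illustration; the hash values on the slide (4 and 19) are example values and are not reproduced by this hash.

```python
import hashlib
from bisect import bisect_left

M = 5  # identifier length in bits, so the ring has 2**M = 32 positions


def ring_id(name, m=M):
    """Hash a key or peer address onto the ring (SHA-1 is an assumption)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


def successor(k, peer_ids):
    """First peer identifier >= k, wrapping around the ring."""
    ids = sorted(peer_ids)
    i = bisect_left(ids, k)
    return ids[i % len(ids)]


peers = [3, 11, 19, 27]      # assumed peer identifiers
print(successor(4, peers))   # 11: key 4 is stored at peer 11
print(successor(19, peers))  # 19: a key equal to a peer id stays at that peer
print(successor(30, peers))  # 3: wraps around the ring
```

Here successor has global knowledge of all peers, which corresponds to search strategy 1; the finger tables on the next slide avoid that.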
Chord Routing Tables
• Idea: every peer knows m peers at exponentially increasing distance
– Peer p stores its successor(p) and the first peer with hashed identifier s_i such that s_i = successor(p + 2^(i-1)) for i=1,..,m
– We write also s_i = finger(i, p)
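[Figure: ring with the positions p, p+1, p+2, p+4, p+8, p+16 and the peers p2, p3, p4. The fingers s1, s2, s3 point to p2, s4 points to p3 and s5 points to p4.]
Routing table of p:
i   s_i
1   p2
2   p2
3   p2
4   p3
5   p4
Routing table size: m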
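A short sketch of how such a finger table could be computed, assuming global knowledge of all peer identifiers (a real peer would have to discover them by lookups). The ring positions p=0, p2=5, p3=10, p4=20 are assumed values chosen so that the result has the same shape as the table above.

```python
from bisect import bisect_left

M = 5  # identifier length in bits


def successor(k, peer_ids, m=M):
    """First peer identifier >= k on a ring of size 2**m, with wrap-around."""
    ids = sorted(peer_ids)
    i = bisect_left(ids, k % (2 ** m))
    return ids[i % len(ids)]


def finger_table(p, peer_ids, m=M):
    """finger(i, p) = successor(p + 2**(i-1)) for i = 1..m."""
    return [successor(p + 2 ** (i - 1), peer_ids, m) for i in range(1, m + 1)]


peers = [0, 5, 10, 20]  # assumed ring: p=0, p2=5, p3=10, p4=20
print(finger_table(0, peers))  # [5, 5, 5, 10, 20], i.e. p2, p2, p2, p3, p4
```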
Search in Chord
search(p, k)
  find in routing table largest (i, p*) such that p* ∈ [p,k[
  if such a p* exists then search(p*, k)
  else return (successor(p)) // found
[Figure: ring with the positions p, p+1, p+2, p+4, p+8, p+16, the fingers s1 to s5, the peers p2, p3, p4 and two example keys k1 and k2.]
Expected search cost: O(log n)
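The search procedure can be sketched in Python as below. Two assumptions deviate from the pseudocode as printed: the interval is taken as open at p (]p, k[), because a finger may wrap around to p itself and forwarding to p would never terminate, and ]p, p[ is read as the whole ring without p. The peer identifiers are the same assumed values p=0, p2=5, p3=10, p4=20, and the finger tables are built with global knowledge.

```python
from bisect import bisect_left

M = 5
SIZE = 2 ** M


def successor(k, ids):
    """First peer identifier >= k in the sorted list ids, with wrap-around."""
    i = bisect_left(ids, k % SIZE)
    return ids[i % len(ids)]


def between(x, a, b):
    """True if x lies in the open ring interval ]a, b[, walking clockwise."""
    dist_b = (b - a) % SIZE or SIZE  # a == b denotes the whole ring
    return 0 < (x - a) % SIZE < dist_b


def search(p, k, fingers, hops=0):
    """Return (peer that stores k, number of forwarding hops)."""
    for p_star in reversed(fingers[p]):  # try the largest finger index first
        if between(p_star, p, k):
            return search(p_star, k, fingers, hops + 1)
    return fingers[p][0], hops  # finger 1 is successor(p): found


peers = [0, 5, 10, 20]  # assumed, sorted peer identifiers
fingers = {p: [successor(p + 2 ** (i - 1), peers) for i in range(1, M + 1)]
           for p in peers}

print(search(0, 22, fingers))  # (0, 1): key 22 wraps around to peer 0
print(search(0, 7, fingers))   # (10, 1): key 7 is stored at peer 10
```

Every hop moves strictly closer to k in clockwise direction, so the recursion terminates.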
Length of Search Paths
Network size n = 2^12
100 · 2^12 keys
Path length ≈ ½ log2(n)
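One way to see where the factor ½ comes from, under two simplifying assumptions that are not taken from the measurements: every identifier on the ring is occupied by a peer (n = 2^m), and the clockwise distance D from the querying peer to the target is uniformly distributed over the m-bit values.

```latex
% Each hop along the largest usable finger clears the highest remaining
% 1-bit of D, so the number of hops equals the number of 1-bits of D.
\mathbb{E}[\text{hops}]
  = \sum_{j=0}^{m-1} \Pr[\text{bit } j \text{ of } D = 1]
  = \sum_{j=0}^{m-1} \frac{1}{2}
  = \frac{m}{2}
  = \frac{1}{2}\log_2 n
```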
Maintenance of Chord
• Maintain the integrity of routing tables if peers join or leave
• Example Chord: New node q joining the network
– Successor nodes need to be updated: successor(p) = q, successor(q) = p2
– Finger tables need to be updated: both at new and existing peers
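[Figure: ring with the positions p, p+1, p+2, p+4, p+8, p+16 and the peers p2, p3, p4; the new peer q joins between p and p2.]
Routing table of p:
i   s_i
1   q
2   q
3   p2
4   p3
5   p4
Routing table of q:
i   s_i
1   p2
2   p2
3   p3
4   p3
5   p4
Expected cost: O(log^2 n)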
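The effect of the join on the two routing tables can be reproduced with a small sketch. It recomputes finger tables from a global list of peers, which is an assumption made for brevity and not the distributed join protocol whose cost is quoted above. The positions p=0, q=2, p2=5, p3=10, p4=20 are assumed values.

```python
from bisect import bisect_left

M = 5


def finger_table(p, peer_ids, m=M):
    """successor(p + 2**(i-1)) for i = 1..m, computed with global knowledge."""
    ids = sorted(peer_ids)
    table = []
    for i in range(1, m + 1):
        target = (p + 2 ** (i - 1)) % (2 ** m)
        table.append(ids[bisect_left(ids, target) % len(ids)])
    return table


names = {0: "p", 2: "q", 5: "p2", 10: "p3", 20: "p4"}  # assumed positions
before = [0, 5, 10, 20]
after = before + [2]  # q joins between p and p2

table_p = [names[s] for s in finger_table(0, after)]
table_q = [names[s] for s in finger_table(2, after)]
changed = [i + 1 for i, (old, new) in
           enumerate(zip(finger_table(0, before), finger_table(0, after)))
           if old != new]

print(table_p)  # ['q', 'q', 'p2', 'p3', 'p4']
print(table_q)  # ['p2', 'p2', 'p3', 'p3', 'p4']
print(changed)  # [1, 2]: entries of p that point to q after the join
```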
Question
• When routing in Chord
1. The next hop is always uniquely determined
2. The next hop can be chosen among a constant number of possible candidates
3. The next hop can be chosen among log n possible candidates
Question
• When adding q to the Chord ring: in the routing table of p
1. Entries for i=1,2,3,4 change
2. The entry for i=4 changes
3. The entry for i=5 changes
4. No entry changes
[Figure: ring with the positions p, p+1, p+2, p+4, p+8, p+16, the peers p2, p3, p4 and the new peer q.]
Routing table of p:
i   s_i
1   p2
2   p2
3   p2
4   p3
5   p4
Example 3: Topological Routing (CAN)
• Based on hashing of keys into a d-dimensional space (a torus)
– Each peer is responsible for keys of a subvolume of the space (a zone)
– Each peer stores the addresses of peers responsible for the neighboring zones for routing
– Search requests are greedily forwarded to the peers in the closest zones
CAN Zones
• Example: d=2
– Space is recursively split along each dimension as more peers join
– Peers maintain references to their neighboring zones: neighbors(p1) = {p2,p3}
[Figure: the space held by p1 alone, then split between p1 and p2, then among p1, p2 and p3, etc.]
CAN Routing Tables and Search
Each peer p stores a routing table with 2d entries containing the two closest neighbors in each dimension
neighbors(p1) = {p2,p3,p4,p5}
neighbors(p3) = {p1,p6,p7,p4}
Example: searches for p6 starting at p7 and at p8
[Figure: 2-dimensional space divided into the zones of p1 to p8.]
search(p, k)
  if p = k then found
  else
    find among neighbors p* with minimal Euclidean distance to k
    search(p*, k)
Routing table size: 2d
Expected search cost: O(d n^(1/d))
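Greedy forwarding can be sketched on a simplified layout. The sketch assumes d=2 and a regular SIDE x SIDE grid of equal zones on the unit torus (real CAN zones result from splits and may differ in size), measures distance from zone centers, and ignores keys that lie exactly on a zone boundary, where ties could occur. All names are hypothetical.

```python
import math

SIDE = 4  # assumed: a SIDE x SIDE grid of equal zones on the unit torus


def zone_of(point):
    """Grid coordinates of the zone containing point (coordinates in [0, 1))."""
    return tuple(int(c * SIDE) for c in point)


def center(zone):
    """Center point of a zone."""
    return tuple((c + 0.5) / SIDE for c in zone)


def torus_distance(a, b):
    """Euclidean distance on the unit torus (coordinates wrap around)."""
    return math.sqrt(sum(min(abs(x - y), 1 - abs(x - y)) ** 2
                         for x, y in zip(a, b)))


def neighbors(zone):
    """The 2d neighbors: one step down and up in each dimension."""
    result = []
    for dim in range(len(zone)):
        for step in (-1, 1):
            n = list(zone)
            n[dim] = (n[dim] + step) % SIDE
            result.append(tuple(n))
    return result


def search(zone, key, path=None):
    """Forward greedily to the neighbor closest to key; return visited zones."""
    path = (path or []) + [zone]
    if zone_of(key) == zone:
        return path  # found: key falls into this zone
    best = min(neighbors(zone), key=lambda n: torus_distance(center(n), key))
    return search(best, key, path)


path = search((0, 0), (0.6, 0.9))
print(len(path) - 1, path[-1])  # 3 (2, 3)
```

On such a grid each dimension has n^(1/d) zones, which is where the n^(1/d) factor in the search cost comes from.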
Network Join in CAN
• Node joining the network
– Chooses address (coordinate in d-dim. space)
– Performs search for the address
– Splits the region with the node currently managing it
– Updates own and neighboring nodes' routing tables
[Figure: the zones of p1 to p8 before the join, and the zones of p1 to p9 after p9 has joined.]
Before the join: Neighbors(p8) = {p5,p7,*,*}
After the join: Neighbors(p8) = {p5,p9,*,*}, Neighbors(p9) = {p8,p7,*,*}
Expected cost: O(d n^(1/d))
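The join steps (choose a point, find the zone that contains it, split that zone, update neighbors) can be sketched for d=2. Several details are assumptions for illustration: zones are rectangles (x0, y0, x1, y1) in the unit square, the longer side is split, the new peer takes the half containing its chosen point, the owner is found by scanning all zones instead of routing, and neighbors are computed without torus wrap-around.

```python
def contains(zone, point):
    """True if point lies in the half-open rectangle zone."""
    x0, y0, x1, y1 = zone
    return x0 <= point[0] < x1 and y0 <= point[1] < y1


def join(zones, new_peer, point):
    """Split the zone that contains point and give one half to new_peer."""
    owner = next(p for p, z in zones.items() if contains(z, point))
    x0, y0, x1, y1 = zones[owner]
    if x1 - x0 >= y1 - y0:  # split the longer side (ties: split along x)
        mid = (x0 + x1) / 2
        halves = [(x0, y0, mid, y1), (mid, y0, x1, y1)]
    else:
        mid = (y0 + y1) / 2
        halves = [(x0, y0, x1, mid), (x0, mid, x1, y1)]
    # Assumption: the new peer takes the half that contains its chosen point.
    halves.sort(key=lambda z: contains(z, point))
    zones[owner], zones[new_peer] = halves


def neighbors(zones, peer):
    """Peers whose zones share a border segment with peer's zone."""
    ax0, ay0, ax1, ay1 = zones[peer]
    result = set()
    for other, (bx0, by0, bx1, by1) in zones.items():
        touch_x = (ax1 == bx0 or bx1 == ax0) and min(ay1, by1) > max(ay0, by0)
        touch_y = (ay1 == by0 or by1 == ay0) and min(ax1, bx1) > max(ax0, bx0)
        if other != peer and (touch_x or touch_y):
            result.add(other)
    return result


zones = {"p1": (0.0, 0.0, 1.0, 1.0)}  # p1 initially holds the whole space
join(zones, "p2", (0.7, 0.2))
join(zones, "p3", (0.2, 0.8))
print(zones["p1"])                     # (0.0, 0.0, 0.5, 0.5)
print(sorted(neighbors(zones, "p1")))  # ['p2', 'p3']
```

Each join adds exactly one zone, so the number of zones always equals the number of peers.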
Multiple Realities
• r different coordinate spaces
– Peers hold a zone in each of them
– Creates r replicas of the (key, value) pairs and increases robustness
– Reduces path length as search can be continued in the reality where the target is closest
CAN Path Length
Increasing Dimensions and Realities
Question
• When adding n peers to CAN the number of zones
1. Is exactly n
2. It depends what the keys of the peers were
3. It depends on the dimensionality of the key space
Question
• In CAN, for a fixed dimensionality d>2, when moving from 1 to 2 realities
1. The number of entries in the routing table increases by 2
2. The number of entries in the routing table increases by d
3. The number of entries in the routing table doubles