P2P Search
COP5711
P2P Search Techniques
Centralized P2P systems, e.g., Napster, SETI@home
Decentralized and unstructured P2P systems, e.g., Gnutella
Hybrid (partially decentralized) systems, e.g., Freenet
Structured P2P systems: DHT, CAN
P2P Network
A P2P network is an overlay network built on top of a real physical network (e.g., the Internet)
In a P2P network, peers are network nodes connected by virtual or logical links
A logical link is a path through many physical links in the underlying network
Napster: Publish a File
Users upload their IP address and the music titles they wish to share
[Figure: peer 192.1.2.3 registers (xyz.mp3, 192.1.2.3) with the Napster server (central catalog)]
Napster: Query for a File
Users search for peers from which to download desired files
[Figure: a peer asks the central Napster server "xyz.mp3 ?" and receives the address 192.1.2.3]
Napster: Transfer Requested File
File transfer is P2P, using a proprietary protocol
[Figure: the requesting peer contacts 192.1.2.3 directly for xyz.mp3, bypassing the central Napster server]
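The publish and query steps above can be sketched as a small in-memory catalog. This is only an illustration: the class name `CentralCatalog` and its methods are hypothetical and do not reflect the proprietary Napster protocol.

```python
from collections import defaultdict


class CentralCatalog:
    """Toy stand-in for the central Napster server (hypothetical class)."""

    def __init__(self):
        # title -> set of peer IP addresses that share it
        self.index = defaultdict(set)

    def publish(self, ip, titles):
        """A peer uploads its IP address and the titles it shares."""
        for title in titles:
            self.index[title].add(ip)

    def query(self, title):
        """Return the peers that advertise the title (empty list if none)."""
        return sorted(self.index.get(title, ()))


catalog = CentralCatalog()
catalog.publish("192.1.2.3", ["xyz.mp3"])
providers = catalog.query("xyz.mp3")
print(providers)
```

The catalog only returns addresses; the file itself would then be fetched directly from one of the listed peers.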
Disadvantages of a Centralized Directory
Performance bottleneck
Single point of failure
Can we do it without a directory?
Decentralized P2P - Gnutella
No catalog
A peer pings the network to locate Gnutella peers
File requests are broadcast to peers
Flooding, or breadth-first search
When a provider is located, the file is transferred via HTTP
Gnutella: Join the Network
Who are my neighbors?
Peers are Internet edges
Special peer maintained by Gnutella
A new peer pings the network to locate peers
Gnutella: Broadcast Request to Peers
[Figure: a peer sends the request "xyz.mp3 ?" to its neighbors]
Gnutella: Flood the Request (Breadth-first Search)
[Figure: the request spreads from peer to peer until one peer answers "I have it."]
Gnutella: Reply with the File (via HTTP)
[Figure: the peer that answered "I have it." sends xyz.mp3 to the requester]
Gnutella - Disadvantages
Network flooding - unnecessary network traffic
Using a TTL limits the flood, but some files might not be found
Alternatives: flood only ultranodes (or supernodes), or use depth-first search, e.g., Freenet
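A minimal sketch of TTL-limited flooding shows both disadvantages at once. The five-peer topology and the function name `flood_query` are assumptions made for illustration; this is not the Gnutella wire protocol.

```python
from collections import deque


def flood_query(neighbors, files, start, wanted, ttl):
    """Breadth-first flood from `start`; returns (providers, messages_sent)."""
    seen = {start}
    queue = deque([(start, ttl)])
    providers, messages = [], 0
    while queue:
        peer, hops_left = queue.popleft()
        if wanted in files.get(peer, ()):
            providers.append(peer)
        if hops_left == 0:
            continue  # TTL expired: do not forward any further
        for nxt in neighbors[peer]:
            messages += 1          # every forwarded copy costs a message
            if nxt not in seen:    # duplicate copies are received but dropped
                seen.add(nxt)
                queue.append((nxt, hops_left - 1))
    return providers, messages


# Hypothetical overlay: A-B, A-C, B-D, C-D, D-E; only E shares the file.
neighbors = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
files = {"E": {"xyz.mp3"}}

short = flood_query(neighbors, files, "A", "xyz.mp3", ttl=2)
longer = flood_query(neighbors, files, "A", "xyz.mp3", ttl=3)
print(short, longer)
```

With a TTL of 2 the query never reaches E, so the file is not found even though it exists; with a TTL of 3 it is found, at the cost of more messages, several of them duplicates.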
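Morpheus, Kazaa: Flooding only the Supernodes
[Figure: peers A through I are grouped into three clusters; each cluster has a supernode that keeps the central index for its cluster, and the supernodes form the supernode layer. The query "Who has file X" is flooded across the supernode layer, the reply "Peer H has file X" comes back, and the requester downloads file X from Peer H]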
Using Ultranodes
Queries flood only the network of ultranodes
Other peer nodes are shielded from query traffic
Combines the benefits of centralized and decentralized search
Takes advantage of the heterogeneity in peer capabilities
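The two-level search can be sketched as a flood restricted to the ultranode layer, with each ultranode consulting the index of its own cluster. The ultranode names `S1` to `S3`, their links, and the index contents are assumptions for illustration only.

```python
from collections import deque


def supernode_search(super_links, cluster_index, start_super, wanted):
    """Flood only the ultranode layer; ordinary peers never see the query."""
    seen = {start_super}
    queue = deque([start_super])
    visited = []
    while queue:
        s = queue.popleft()
        visited.append(s)
        owner = cluster_index[s].get(wanted)  # centralized lookup per cluster
        if owner is not None:
            return owner, visited
        for nxt in super_links[s]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None, visited


# Hypothetical supernode layer: three fully connected ultranodes.
super_links = {"S1": ["S2", "S3"], "S2": ["S1", "S3"], "S3": ["S1", "S2"]}
# Each ultranode indexes the files of its own cluster; peer H holds file X.
cluster_index = {"S1": {}, "S2": {}, "S3": {"X": "H"}}

result = supernode_search(super_links, cluster_index, "S1", "X")
print(result)
```

Only the three ultranodes handle the query; the ordinary peers are contacted only for the final download.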
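Freenet - Depth-First Search
[Figure: peers A, B, C, D, E. The query "Who has file X" is forwarded one peer at a time, each peer along the way guessing the next hop ("Peer C might have file X", "Peer D might have file X", "Peer E might have file X"). Peer E replies "I have file X"; the reply "Peer E has file X" is relayed back, and the requester downloads file X from Peer E]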
Freenet - File not Found
[Figure: peers A, B, C, D, E, F with the hints "Peer C might have file X", "Peer D might have file X", "Peer E might have file X"; the query reaches a dead end marked "NOT FOUND !" while another peer answers "I have file X"]
The requested file is not found at first because of a poor routing decision made at peer D
In this case, the query backs out of the dead end and tries another peer in depth-first manner
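Depth-first search with backtracking can be sketched as a short recursive function. The routing preferences below (which peer each node tries first) are assumptions chosen to reproduce a dead end at F; they are not read off the figure, and real Freenet routing is more involved.

```python
def dfs_query(routing, files, peer, wanted, visited=None, trace=None):
    """Forward the query to one peer at a time; back out of dead ends."""
    visited = set() if visited is None else visited
    trace = [] if trace is None else trace
    visited.add(peer)
    trace.append(peer)
    if wanted in files.get(peer, ()):
        return peer, trace
    for nxt in routing.get(peer, []):      # candidates in order of preference
        if nxt not in visited:
            found, _ = dfs_query(routing, files, nxt, wanted, visited, trace)
            if found is not None:
                return found, trace
    return None, trace                     # dead end: caller tries its next candidate


# Hypothetical routing tables: D prefers F (a poor decision) over E.
routing = {"A": ["C"], "C": ["D", "B"], "D": ["F", "E"]}
files = {"E": {"X"}}

holder, trace = dfs_query(routing, files, "A", "X")
print(holder, trace)
```

The trace visits F before E: the query backs out of F and then succeeds at E, at the cost of one wasted hop.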
Using Distributed Directory
Data objects are everywhere
Distribute subsets of the data directory among peers
If we can find the relevant sub-directory, we can locate the data object
[Figure: a directory split into sub-directories that point to data objects]
How to Bound Search Space? Basic Idea - Hashing
Objects have hash keys: object "y" has hash key H(y)
Peer nodes also have hash keys in the same hash space: peer "x" has hash key H(x)
A peer joins the P2P network with Join(H(x)); an object is published with Publish(H(y))
Place location information about an object at the peer with the closest hash key (i.e., a distributed directory)
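The placement rule can be sketched in a few lines. The slides do not name a hash function; MD5 is used here only because it yields 128-bit keys, and the peer names are hypothetical.

```python
import hashlib


def H(name):
    """Map a peer name or object name into one shared 128-bit hash space."""
    digest = hashlib.md5(name.encode("utf-8")).digest()  # 16 bytes = 128 bits
    return int.from_bytes(digest, "big")


def closest_peer(peer_names, object_name):
    """Pick the peer whose hash key is numerically closest to the object's key."""
    key = H(object_name)
    return min(peer_names, key=lambda p: abs(H(p) - key))


peers = ["peer-a", "peer-b", "peer-c", "peer-d"]
home = closest_peer(peers, "xyz.mp3")
print(home, "stores the location information for xyz.mp3")
```

Every peer that applies the same function to the same object name arrives at the same home peer, which is what bounds the search space.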
Viewed as a Distributed Hash Table
Hash table: 0 to 2^128 - 1
• Each peer node is responsible for a range of the hash table, according to the peer hash key
• Location information about objects is placed in the peer with the closest key (information redundancy)
How to Find an Object?
Look for the peer with the corresponding peer hash key
A peer knows its logical neighbors
Find peer X based on multihop routing
X knows who has the object
[Figure: hash table from 0 to 2^128 - 1; peer node X answers "Peer Y has the file"]
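Multihop routing toward the responsible peer can be sketched with greedy forwarding: each peer hands the message to whichever logical neighbor is closer to the target key. The small integer keys and the chain-shaped neighbor layout are assumptions for illustration; the names `Peer`, `route`, `publish`, and `find` are hypothetical.

```python
class Peer:
    def __init__(self, key):
        self.key = key        # peer hash key
        self.neighbors = []   # logical neighbors in the overlay
        self.table = {}       # this peer's slice of the distributed directory


def route(start, target_key):
    """Hop to the neighbor closest to the key until no neighbor is closer."""
    current, hops = start, 0
    while True:
        best = min(current.neighbors,
                   key=lambda n: abs(n.key - target_key),
                   default=current)
        if abs(best.key - target_key) >= abs(current.key - target_key):
            return current, hops
        current, hops = best, hops + 1


def publish(start, key, location):
    holder, hops = route(start, key)
    holder.table[key] = location
    return holder, hops


def find(start, key):
    holder, _ = route(start, key)
    return holder.table.get(key)


# Hypothetical overlay: five peers linked in a chain, ordered by hash key.
peers = [Peer(k) for k in (10, 60, 120, 200, 250)]
for left, right in zip(peers, peers[1:]):
    left.neighbors.append(right)
    right.neighbors.append(left)

holder, hops = publish(peers[0], 130, "Peer Y has the file")
answer = find(peers[-1], 130)
print(holder.key, hops, answer)
```

A publisher at one end and a searcher at the other both converge on the same peer, because each only follows neighbors that bring the message closer to the key.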
Distributed Hash Table (DHT) in action
[Figure: peers in the overlay, each holding its own key/value (K, V) table]
DHT in action: put()
Want to share a file: insert(K1, V1)
Operation: take the key as input and route the message "I have the file" to the node holding key K1
[Figure: the pair (K1, V1) travels through the overlay and is stored in the (K, V) table of the node holding key K1]
DHT in action: get()
retrieve(K1)
Operation: retrieve the value V1 from the node holding key K1
Retrieve the file according to V1
Still Flooding
The network is still flooded, although intermediate nodes do not need to search
Can we avoid flooding?
CAN - Content Addressable Network
Each peer is responsible for one zone, i.e., it stores all (key, value) pairs of the zone
Each peer knows the neighbors of its zone
Peers are assigned to zones at random at startup; a zone is split if it is not empty
Dimension-ordered multihop routing
CAN: Object Publishing
node I::publish(K,V)
(1) a = hx(K), b = hy(K), giving the point (x = a, y = b)
(2) route (K,V) -> J, the node in charge of (a,b)
(3) J stores (K,V)
CAN: Object Retrieval
node I::retrieve(K)
(1) a = hx(K), b = hy(K)
(2) route "retrieve(K)" to J, which is in charge of (a,b) and stores (K,V)
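Publishing and retrieval in CAN can be sketched on a toy two-dimensional space. Several simplifications are assumed here: the space is a fixed 4 x 4 grid of equal zones (real CAN zones come from splits and vary in size), `hx` and `hy` are derived from SHA-256 with different salts, and routing corrects x first and then y.

```python
import hashlib

GRID = 4  # hypothetical: 4 x 4 equal zones, one peer per zone


def _h(salt, key):
    digest = hashlib.sha256((salt + key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % GRID


def hx(key):
    return _h("x:", key)


def hy(key):
    return _h("y:", key)


def route(src, dst):
    """Dimension-ordered routing: one neighboring zone per hop, x then y."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


# Each zone's peer keeps the (key, value) pairs that hash into its zone.
stores = {(i, j): {} for i in range(GRID) for j in range(GRID)}


def publish(node, key, value):
    target = (hx(key), hy(key))        # (1) a = hx(K), b = hy(K)
    path = route(node, target)         # (2) route (K,V) to J
    stores[target][key] = value        # (3) J stores (K,V)
    return path


def retrieve(node, key):
    target = (hx(key), hy(key))
    path = route(node, target)
    return stores[target].get(key), path


path = publish((0, 0), "xyz.mp3", "192.1.2.3")
value, rpath = retrieve((3, 3), "xyz.mp3")
print(path, value, rpath)
```

Because both operations hash the key to the same point (a, b), the retrieval ends at the zone where the pair was stored, without any flooding.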
Maintenance
Inform neighbors that you are alive at a discrete time interval t
If a neighbor does not send an alive message within time t, take over its zone
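The alive-message rule can be sketched with explicit timestamps. The class `ZoneOwner` and its methods are hypothetical, and the sketch ignores how several neighbors of a failed peer would agree on which one takes over.

```python
class ZoneOwner:
    """One peer's view of its neighbors (hypothetical helper class)."""

    def __init__(self, zones, interval):
        self.zones = list(zones)   # zones this peer is responsible for
        self.interval = interval   # the time interval t
        self.neighbors = {}        # name -> (time of last alive message, zones)

    def on_alive(self, name, zones, now):
        """Record an alive message from a neighbor."""
        self.neighbors[name] = (now, list(zones))

    def check(self, now):
        """Take over the zones of neighbors that have been silent longer than t."""
        dead = [n for n, (seen, _) in self.neighbors.items()
                if now - seen > self.interval]
        for n in dead:
            _, zones = self.neighbors.pop(n)
            self.zones.extend(zones)
        return dead


owner = ZoneOwner(["Z1"], interval=5)
owner.on_alive("J", ["Z2"], now=0)
owner.on_alive("K", ["Z3"], now=0)
owner.on_alive("K", ["Z3"], now=4)   # K keeps reporting, J goes silent
dead = owner.check(now=6)
print(dead, owner.zones)
```

Passing `now` explicitly keeps the sketch deterministic; a running peer would read a clock and send its own alive messages on a timer.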
P2P Benefits
Efficient use of resources: use unused bandwidth, storage, and processing power at the edge of the network
Scalability: consumers of resources also donate resources
Reliability: replicas, geographic distribution, no single point of failure
Ease of administration: self-organized nodes, built-in reliability and load balancing
Some Prototypes at UCF
iSEE (Internet-scale Sensor Exploration Environment)
Publishing real-time sensor data
Browsing and querying real-time sensor data
P2P Video Streaming for VoD and Live Broadcast Applications