Improving Search in P2P Networks
By Shadi Lahham
Improving P2P Search 2
Purpose of This Lecture
• General understanding of P2P systems
• Appreciating the need for efficient search
• Applying different search techniques to different scenarios
Table Of Contents
• P2P Basics
– What Is P2P
– Advantages of P2P
– Types of P2P Systems
– Shortcomings
• Search Methods
– The Search Problem
– Current Methods
– Suggested Methods
• Experimental Setup
– Metrics
– Data Collection
– Calculating Costs
• Analysis of Results
• Conclusions
Introduction
P2P Basics
What is P2P
• Distributed system
• Peers (nodes) are servers and clients simultaneously
• Peers have equal roles
• Resources shared across peers
• No central server needed
• Examples of P2P systems
P2P Overview
File   Key
file1  f1
file2  f2
file3  f3
Advantages of P2P
• P2P vs. Centralized Servers
– Distributes disk space / bandwidth
– Inexpensively scalable
– Self organized (autonomous)
– Load balancing
– Adaptive / fault tolerant
– Less susceptible to attacks
– Allows for redundancy
Types of P2P Systems
• Hybrid (Napster)
• Pure (Gnutella)
• Super Peers (KaZaA)
Hybrid (Napster)
Pure (Gnutella)
Super Peers (KaZaA)
• Make use of heterogeneity
– Powerful peers serve as super peers
– Weaker peers act as clients
• Super-peers index clients’ files
– Requires updates on join/leave/update
• Queries handled at super-peer level
– Saves query costs
Hybrid - Shortcomings
• High cost on centralized index
• Performance & scalability bottleneck
• Needs maintenance
• Vulnerable: a highly visible target
Pure - Shortcomings
• Inefficient search (flooding)
• Heterogeneity of peers not considered
– Bottlenecks (limited peers)
– Fragmentation
Super Peers - Shortcomings
• Super nodes might become bottlenecks for clients
– Requires redundancy
• A bad selection of super nodes might cause even worse problems
Search Methods
The Search Problem
• Connected graph
• Might contain cycles
• Individual node doesn’t know structure
• Only knows its neighbors
• No idea where data can be found
The Search Problem
• Goal: find as many occurrences of the data as possible, using minimal time and resources
• Solution:
– BFS?
– Bounded BFS?
– (naive approaches)
Bounded BFS Search
[Figure: bounded BFS, with hops labeled TTL=2, TTL=1, TTL=0]
Bounded BFS Search
• Messages get a global TTL (time to live)
• Algorithm
– Source broadcasts a message to a subset of neighbors
– Neighbors search locally; results are sent to the source if found
– TTL = TTL – 1
– As long as TTL > 0, nodes forward the message to their neighbors
• Downside: wastes bandwidth / processing
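The algorithm above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bounded_bfs`, the `has_data` callback, and the adjacency-list graph are hypothetical, the source floods all of its neighbors, and a query sent with TTL t is assumed to reach nodes up to t hops away. The message counter includes redundant copies, which is where the wasted bandwidth shows up.

```python
from collections import deque

def bounded_bfs(graph, source, ttl, has_data):
    """Flood a query from `source`; return (nodes with data, messages sent).

    Assumption: a query sent with TTL t reaches nodes up to t hops away.
    """
    seen = {source}                # nodes that already handled this query
    hits = []
    messages = 0
    frontier = deque([(source, ttl)])
    while frontier:
        node, remaining = frontier.popleft()
        if remaining == 0:
            continue               # TTL exhausted: do not forward
        for neighbor in graph[node]:
            messages += 1          # every forwarded copy costs bandwidth
            if neighbor in seen:
                continue           # redundant copy is dropped on arrival
            seen.add(neighbor)
            if has_data(neighbor): # neighbor searches locally
                hits.append(neighbor)
            frontier.append((neighbor, remaining - 1))
    return hits, messages

# Toy network (hypothetical): D and E hold the data we are looking for.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
holders = {"D", "E"}
print(bounded_bfs(graph, "A", 2, lambda n: n in holders))
print(bounded_bfs(graph, "A", 3, lambda n: n in holders))
```

Raising the TTL from 2 to 3 finds one more holder in this toy graph, but the number of messages grows as well, including copies that arrive at nodes that have already seen the query.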
Current Methods
• Gnutella - BFS
– High cost
– Gets complete results (for depth D)
– Relatively short time
• Freenet - DFS
– Poor response time
– Minimizes BW costs
Suggested Methods
• Iterative deepening
• Directed BFS
• Local Indices
Iterative Deepening
• Idea:
– Search at a small depth and increase it if required
– Aims to minimize the cost of BFS without detracting from its ability to satisfy queries
• Notice that, given enough iterations, this method returns 100% of the results of BFS
Iterative Deepening (cont…)
• Elements:
– Policies P={a,b,c,..} define deepening behavior
– BFS is run to depth a and frozen
– If the source is satisfied, it stops the process
– Otherwise it asks BFS to resume to depth b
– The process is repeated until the source is satisfied or we reach the last policy item
Iterative Deepening (cont…)
• Elements:
– We can specify how long to wait between iterations
– We need a system-wide message ID to identify individual messages
Example P={1,3,4} W=1
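A policy such as the one in this example can be sketched as follows. This is a simplified model, not the protocol itself: the waiting time W and the system-wide message IDs are left out, hop distances are computed up front to stand in for freezing and resuming the query, and the names `iterative_deepening`, `results_at`, and `z` (the number of results that satisfies the source) are hypothetical.

```python
from collections import deque

def hop_distances(graph, source):
    """Hop count from `source` to every reachable node (plain BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def iterative_deepening(graph, source, policy, z, results_at):
    """Return (results, depth reached) for policy P and satisfaction threshold z."""
    dist = hop_distances(graph, source)
    results = []
    reached = 0
    for depth in policy:
        # "Resume" the frozen query: only nodes beyond the previous depth
        # and up to the new depth are newly processed.
        for node, d in dist.items():
            if reached < d <= depth:
                results.extend(results_at.get(node, []))
        reached = depth
        if len(results) >= z:
            break              # source is satisfied, stop deepening
    return results, reached

# Toy chain A-B-C-D-E (hypothetical) with results stored at B, D and E.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
results_at = {"B": ["x1"], "D": ["x2", "x3"], "E": ["x4"]}
print(iterative_deepening(graph, "A", [1, 3, 4], 2, results_at))
```

With P={1,3,4} and two results needed, the sketch stops at depth 3 and never touches the node at depth 4, which is the saving the method is after.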
Directed BFS
• Idea:
– Choose a subset of neighbors to query
– Neighbors will BFS as usual
– Aims to provide a balance between good response time and results
– Minimize costs of full BFS
• Notice that only a subset of the possible results is returned, so we might fail to satisfy the query
Directed BFS Example
[Figure: directed BFS, with hops labeled TTL=2, TTL=1, TTL=0]
Directed BFS (cont…)
• But which neighbors to pick?
– Maintain simple statistics on neighbors to derive heuristics
• Highest past results
• Lowest average hops (close to nodes containing useful data)
• High message count (stable, can handle large flow)
• Shortest message queue (long implies saturation)
• More to come …
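The heuristics above amount to ranking neighbors by one locally kept statistic. A minimal sketch, assuming each node keeps a small record per neighbor; the `NeighborStats` fields, the heuristic names, and the sample numbers are all hypothetical:

```python
from dataclasses import dataclass

@dataclass
class NeighborStats:
    past_results: int     # results returned for recent queries
    avg_hops: float       # average hops taken by those results
    message_count: int    # messages received from this neighbor
    queue_length: int     # current length of its message queue

# Each heuristic maps stats to a sort key; smaller key means better neighbor.
HEURISTICS = {
    "highest_past_results": lambda s: -s.past_results,
    "lowest_avg_hops": lambda s: s.avg_hops,
    "high_message_count": lambda s: -s.message_count,
    "shortest_queue": lambda s: s.queue_length,
}

def pick_neighbors(stats, heuristic, k=1):
    """Return the names of the k best neighbors under the given heuristic."""
    key = HEURISTICS[heuristic]
    return sorted(stats, key=lambda name: key(stats[name]))[:k]

stats = {
    "n1": NeighborStats(past_results=12, avg_hops=3.0, message_count=400, queue_length=5),
    "n2": NeighborStats(past_results=30, avg_hops=4.5, message_count=150, queue_length=2),
    "n3": NeighborStats(past_results=7, avg_hops=2.0, message_count=900, queue_length=9),
}
print(pick_neighbors(stats, "highest_past_results"))
print(pick_neighbors(stats, "high_message_count", k=2))
```

The source would send the query only to the selected neighbors, which then forward it with ordinary BFS.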
Local Indices
• Idea:
– Nodes hold metadata of all nodes within radius r
– Can process the query at fewer nodes, but still get the same number of results
– Aims to balance satisfaction / costs
Local Indices
• Elements:
– Policies P={a,b,c,..} define the depths at which we search
• Example P={1,5,6}
• Nodes at depth 1 process the query
• Nodes at depth 2,3,4 forward without processing
• Policy ends at depth 6
– System-wide radius r (small, ~50K metadata)
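To make the interplay of policy and radius concrete, here is a small sketch. It assumes that only nodes at the policy depths process the query, and that each of them answers from an index covering every node within r hops of itself; the function names, the `files` mapping, and the toy chain are hypothetical, and the index is recomputed on the fly rather than maintained.

```python
from collections import deque

def within_radius(graph, start, radius):
    """Return {node: hops} for every node at most `radius` hops from `start`."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue                      # do not expand past the radius
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def local_index_search(graph, files, source, policy, r, wanted):
    """Return (nodes holding `wanted`, number of nodes that processed the query)."""
    depth_of = within_radius(graph, source, max(policy))
    processors = [n for n, d in depth_of.items() if d in policy]
    found = set()
    for node in processors:
        # The processing node answers on behalf of everything in its index.
        for indexed in within_radius(graph, node, r):
            if wanted in files.get(indexed, ()):
                found.add(indexed)
    return sorted(found), len(processors)

# Toy chain A-B-C-D-E-F (hypothetical); C and F hold the wanted file.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
         "D": ["C", "E"], "E": ["D", "F"], "F": ["E"]}
files = {"B": {"other"}, "C": {"song"}, "F": {"song"}}
print(local_index_search(graph, files, "A", [1, 4], 1, "song"))
```

In this toy run only two nodes process the query, yet both copies are found because each processing node also covers its neighbors.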
Example P={1,4}
[Figure: nodes marked "Process" or "Don’t process"; r = ?]
Local Indices (cont…)
– Notice that now there is an overhead
– On join
• Send join message with TTL = r
• Direct exchange of metadata
– On leave / timeout
• Remove metadata of departed / dead nodes
– On update
• Send update message with TTL = r
Experimental Setup
Metrics
• How to compare methods?
1. Costs
2. Results
3. Time
Metrics
1. Costs
– We do not base cost on a specific query, but rather calculate the average cost over Qrep, a representative set of real queries submitted
– It makes sense to discuss costs in aggregate (i.e., over all the nodes in the network)
– Therefore our two cost metrics are
• Average aggregate bandwidth
• Average aggregate processing cost
Metrics
2. Results quality
– Number of results
– Satisfaction
3. Time to satisfaction
Data Collection
• Data gathered from the Gnutella network
• Directly measured
– Iterative deepening
– Directed BFS
• Performance data & analysis
– Local indices
Data Collection
Collected data:
• Number of hops
• Response time
• Results per message
• Source IP
• Etc …
Data Collection
Extracted data:
Symbol   Description
M(Q,n)   # of response messages received for query Q, from n hops away
R(Q,n)   # of results received for query Q, from n hops away
N(Q,n)   # of nodes n hops away that process Q
C(Q,n)   # of redundant edges n hops away
Calculating Costs
• We’ve seen two types of costs
– Bandwidth (BW) costs
– Processing costs
• Calculations should take into account
– Costs of sending a query
– Costs of sending replies
• An example of calculating BW costs
Calculating Costs
\[
BW_{bfs}(Q) = \sum_{n=1}^{D} \Big( a(Q) \cdot \big( N(Q,n) + C(Q,n) \big) + n \cdot \big( c \cdot R(Q,n) + d \cdot M(Q,n) \big) \Big)
\]
a(Q)  Size of query Q
c     Size of result record
d     Size of response message
D     Max TTL
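The bandwidth formula translates directly into a loop over hop counts. The sketch below assumes the per-hop quantities N, C, R and M are given as dictionaries keyed by n; the function name `bw_bfs` and all numbers are made up for illustration. The factor n on the response term can be read as each response from n hops away being relayed over n links on its way back.

```python
def bw_bfs(a_q, c, d, max_ttl, N, C, R, M):
    """Aggregate bandwidth of a BFS query, following the BW formula.

    a_q: size of the query, c: size of a result record,
    d: size of a response message, max_ttl: D.
    N, C, R, M: dicts mapping hop count n to N(Q,n), C(Q,n), R(Q,n), M(Q,n).
    """
    total = 0
    for n in range(1, max_ttl + 1):
        query_cost = a_q * (N[n] + C[n])            # query copies sent at hop n
        response_cost = n * (c * R[n] + d * M[n])   # replies travel n hops back
        total += query_cost + response_cost
    return total

# Hypothetical measurements for a query with D = 2.
cost = bw_bfs(
    a_q=10, c=5, d=2, max_ttl=2,
    N={1: 2, 2: 3}, C={1: 0, 2: 1},
    R={1: 1, 2: 4}, M={1: 1, 2: 2},
)
print(cost)  # 27 for hop 1 plus 88 for hop 2 = 115
```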
Analysis of Results
Iterative Deepening
Symbols Used
Symbol  Definition
D       Maximum time-to-live of a message, in terms of hops
Z       Number of results needed to satisfy a query
Qrep    Representative set of queries for the Gnutella network
W       Waiting time (in seconds) between iterations
Ng      Number of neighbors of client (source node)
Results – Iterative Deepening
• Recall that iterative deepening policies P={a,b,c,..} define deepening behavior
• In order to have the same level of satisfaction as BFS, a policy must have D as its last depth
• Also note the degenerate case, policy {D}, which is the bounded BFS we presented earlier
Results – Iterative Deepening
• Variables
– Define:
Pd = { d, d+1, …, D }
P = { Pd for d = 1,2,…,D } = { {1,2,…,D}, {2,3,…,D}, …, {D-1,D}, {D} }
– W (waiting time) can take the values 1, 2, 4, 6, 150 (seconds)
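The family of policies defined above is easy to enumerate. A short sketch, where the function name `policy_family` is hypothetical:

```python
def policy_family(max_depth):
    """Return [P_1, P_2, ..., P_D], where P_d = [d, d+1, ..., D]."""
    return [list(range(d, max_depth + 1)) for d in range(1, max_depth + 1)]

for policy in policy_family(4):
    print(policy)
```

The last policy in the list is always the single-element policy {D}, i.e. plain bounded BFS.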
Results – Iterative Deepening
• Fixed values Z = 50, Ng = 8
– Increasing Z
• Lower probability of satisfaction
• Higher costs
• More results
– Decreasing Ng
• Slightly lower probability of satisfaction
• Significantly lower costs
Results – Iterative Deepening
• BW costs are the same for P7 for all values of W
• As d increases, costs increase: the larger d is, the more likely the policy will “overshoot”
• As W decreases, costs increase: with a small W, premature determination of dissatisfaction again leads to overshooting
Results – Iterative Deepening
• Time to satisfaction is inversely proportional to cost
• Choose a policy that balances average waiting time and cost
• For example, P5 with W = 6
Analysis of Results
Directed BFS
Heuristics - Directed BFS
Symbol  Heuristic
RAND    (Random)
>RES    Returned the greatest number of results*
<TIME   Had the shortest average time to satisfaction*
<HOPS   Had the smallest average number of hops taken by results*
>MSG    Sent our client the greatest number of messages (all types)
<QLEN   Had the shortest message queue
<LAT    Had the shortest latency
>DEG    Had the highest degree (number of neighbors)
* in the past 10 queries
Results – Directed BFS
• Costs in directed BFS are unaffected by Z
• Users are more aware of quality of results than of BW costs
– We recommend >RES, <TIME
– Still cheaper than full BFS (~65%)
• Summary so far
– Iterative deepening: lowest costs
– Directed BFS: fastest time to satisfaction
Analysis of Results
Local Indices
Results – Local Indices
• Recall that policies P={a,b,c,..} define the depths at which we search
• We choose policies that minimize the number of nodes that process the query
Results – Local Indices
• We consider the following policies
Results – Local Indices
• Also recall that joins / leaves / updates have a BW overhead
• QJR (QueryJoinRatio) gives us the ratio of queries to joins/leaves in the network
[Figures omitted; surviving labels: "P0 r=0", "21MB", "71 KB"]
Results – Local Indices
• Time to satisfaction
– Because most Query and Response messages have r fewer hops to travel, the time to forward messages to the outermost depth and back to the source will be shorter than for BFS
– However, because nodes have larger indices, processing the query should take more time
Results – Local Indices
• Summary
– Huge savings in costs
– Time to satisfaction comparable to BFS
– Determining r must take QJR into consideration
• For current QJR values (e.g. Gnutella = 10), r = 1 is a good choice
Relative performance
Technique            Time to satisfy  Satisfaction probability  Number of results  Aggregate bandwidth  Aggregate processing
Bounded BFS          100%             100%                      100%               100%                 100%
Iterative deepening  190%             100%                      19%                28%                  47%
Directed BFS         140%             86%                       37%                38%                  28%
Local indices        ≈100%            100%                      100%               39%                  51%
Conclusions
• All 3 methods show significant bandwidth and processing savings
• Methods are simple and easy to implement in current systems
• Methods might be used in conjunction
Bibliography
Yang, Beverly; Garcia-Molina, Hector:
• Improving Search in Peer-to-Peer Systems
http://newdbpubs.stanford.edu:8090/pub/2002-28
• Improving Search in Peer-to-Peer Systems [extended]
http://newdbpubs.stanford.edu:8090/pub/2001-47
• Designing a Super-peer Network
http://newdbpubs.stanford.edu:8090/pub/2003-33
Gnutella website
http://www.gnutella.com/
Thank you