by jonathan drake. the gnutella protocol is simply not scalable this is due to the flooding...

By Jonathan Drake

Slide 2
The Gnutella protocol is simply not scalable. This is due to the flooding approach it currently uses: as the number of nodes increases, flooding causes an exponential increase in bandwidth usage. DHTs are one concept that has been considered, but they perform poorly on multiple-keyword searches.

Slide 3
A DHT indexes specific keywords, which allows queries that search for specific needles in the haystack. The reality is that most P2P searches use multiple keywords. Users tend to look for general results, where multiple files could satisfy their needs. A DHT would be great for finding a specific file or one entry among thousands, but that's not the case here.

Slide 4

Slide 5
Random walking is one solution, but unfortunately it takes a lot of hops and doesn't guarantee that it will find all the results the user wants. A random walk selects a random peer to query and may miss results the user wants unless it runs for a long time, which makes it no better than flooding.

Slide 6
Supernodes work better, but they still take considerable resources and bandwidth because flooding (broadcasting) still takes place within the supernode mesh. This can cause failures among supernodes, and it still doesn't scale well when you consider that a file may exist only on a regular edge node.

Slide 7
Sure! The idea is that random walking costs less than flooding but still only chooses a random node to forward the query to. Nodes are not identical; some have more resources than others, so why not take advantage of this? That's what GIA proposes.
When forwarding a query, it should go to the node that's least overloaded and has the most available bandwidth.

Slide 8
Dynamic topology adaptation: choose neighbors that have high capacity, so we pass queries off to nodes that can handle them.
Active flow control: when a node gets overloaded, it allocates fewer tokens for queries so that it stops being overloaded.
One-hop replication: keep an index of the files on all neighbors to help speed up querying.
Search protocol: direct queries to the node with the highest capacity.

Slide 9
Topology adaptation in GIA chooses neighbors based on their overall capacity and current number of neighbors. When a node gets a connection request from another node, it simply accepts it if it has room. If it doesn't, it still favors the new node and drops an existing neighbor: from the subset of neighbors with lower capacity, it picks the one that has the most neighbors. This is based on the idea that the node that is dropped has the least to lose.

Slide 10
Tokens are assigned to neighbors based on their capacity (rather than uniformly). They are used to issue queries to other nodes. The allocation can start out uniform, but as nodes leave their tokens unused, those tokens can be redistributed to other nodes until the allocation is weighted toward capacity.

Slide 11
Replicate the contents of your neighbors in an index so that when a query comes in, you can respond with their file matches as well. When a neighbor leaves, the node removes that neighbor's information from the index.

Slide 12
Searching is essentially a biased random walk. Each node sends the query to the highest-capacity neighbor it has tokens for (otherwise the query is queued for later). Bookkeeping is done with GUIDs to make sure we don't follow redundant paths. A TTL is used to end the query if it's taking too long. MAX_RESPONSES is the total number of responses that should be retrieved before sending results back.

Slide 13
You want a 90% success rate. You can see that just over 10 is the Collapse Point.
More replication makes things easier.

Slide 14
A higher CP and lower hop counts are preferred. As the replication rate increases, CP increases and hop count decreases. GIA wins!

Slide 15
The authors thought of that and did some comparisons.

Slide 16
Yes, but GIA scales to multiple responses with no issues. They even found a proportional relationship between MAX_RESPONSES and the replication factor!

Slide 17
You can achieve even capacities by allowing nodes to replicate files rather than just indexing the checksum and location (one-hop replication). I'm sure the RIAA and MPAA love this idea.

Slide 18
Satisfaction levels are used to decide when to keep looking for higher-capacity neighbors: I = T x K^-(1-S), where I is the interval between adaptation attempts and S is the satisfaction level.

Slide 19

Slide 20
Yes, that's true. If a node loses a result, the fallback is that the node that issued the request stops receiving keep-alive messages from other nodes, which signals it to reissue the request. In cases involving topology adaptation, a node won't accept new queries after changing neighbors, but it will still forward them along the old path.

Slide 21
Then ask me a question! Seriously, any questions?
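As a rough illustration of the search protocol from Slide 12 combined with the one-hop index from Slide 11, here is a minimal Python sketch. It is not the authors' implementation: the `Node` class, `connect`, `search`, the fixed token count per link, and the capacity numbers are all assumptions made for the example. It also simplifies two things: a query with no usable neighbor simply stops instead of being queued, and tokens are never replenished.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    name: str
    capacity: int                                # relative capacity (hypothetical units)
    files: set = field(default_factory=set)
    neighbors: list = field(default_factory=list)
    tokens: dict = field(default_factory=dict)   # neighbor name -> tokens we may spend
    index: dict = field(default_factory=dict)    # neighbor name -> copy of its file list
    tried: dict = field(default_factory=dict)    # query GUID -> neighbors already used

    def matches(self, keyword):
        """Hits from our own files plus the one-hop index of neighbor files."""
        found = {(self.name, f) for f in self.files if keyword in f}
        for owner, files in self.index.items():
            found |= {(owner, f) for f in files if keyword in f}
        return found

    def pick_next(self, guid):
        """Highest-capacity neighbor we hold a token for and have not yet
        used for this GUID; None if there is no such neighbor."""
        used = self.tried.setdefault(guid, set())
        candidates = [n for n in self.neighbors
                      if self.tokens.get(n.name, 0) > 0 and n.name not in used]
        if not candidates:
            return None
        best = max(candidates, key=lambda n: n.capacity)
        used.add(best.name)
        return best


def connect(a, b, tokens=2):
    """Link two nodes, grant each a few tokens, and swap file indexes."""
    a.neighbors.append(b)
    b.neighbors.append(a)
    a.tokens[b.name] = tokens
    b.tokens[a.name] = tokens
    a.index[b.name] = set(b.files)
    b.index[a.name] = set(a.files)


def search(start, keyword, ttl=8, max_responses=2):
    """Biased random walk: stop on enough hits, TTL expiry, or a dead end."""
    guid = uuid.uuid4().hex
    hits, node, path = set(), start, [start.name]
    while True:
        hits |= node.matches(keyword)
        if len(hits) >= max_responses or ttl == 0:
            break
        nxt = node.pick_next(guid)
        if nxt is None:          # simplification: real GIA would queue the query
            break
        node.tokens[nxt.name] -= 1
        ttl -= 1
        node = nxt
        path.append(node.name)
    return hits, path


edge = Node("edge", 1)
hub = Node("hub", 100)
mid = Node("mid", 10)
leaf = Node("leaf", 1, files={"gia_paper.pdf"})
connect(edge, hub)
connect(edge, mid)
connect(hub, leaf)
hits, path = search(edge, "gia", max_responses=1)
print(path, hits)
```

In this toy topology the query goes from `edge` straight to `hub` (the highest-capacity neighbor), and `hub` answers on behalf of `leaf` from its one-hop index, so the walk never has to reach the edge node that actually holds the file.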
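The neighbor-selection rule from Slide 9 and the satisfaction formula from Slide 18 can be sketched in a few lines of Python as well. This is only a sketch under stated assumptions: `Peer`, `handle_request`, `adaptation_interval`, and the values of `T` and `K` are hypothetical names and numbers. Rejecting the newcomer when no lower-capacity neighbor exists is an assumption (the slides do not say what happens in that case), and the formula is read as I = T x K^-(1-S).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    name: str
    capacity: int   # relative capacity (hypothetical units)
    degree: int     # how many neighbors this peer currently has


def handle_request(neighbors, newcomer, max_neighbors):
    """Return (accepted, dropped_neighbor_or_None) for a connection request."""
    if len(neighbors) < max_neighbors:
        return True, None                      # there is room: just accept
    weaker = [p for p in neighbors if p.capacity < newcomer.capacity]
    if not weaker:
        return False, None                     # assumption: nothing worth dropping
    # Drop the weaker neighbor with the most neighbors: it has the least to lose.
    victim = max(weaker, key=lambda p: p.degree)
    return True, victim


def adaptation_interval(satisfaction, T=10.0, K=256.0):
    """I = T * K^-(1-S): satisfied nodes (S near 1) adapt rarely,
    unsatisfied nodes (S near 0) adapt often."""
    return T * K ** -(1.0 - satisfaction)


current = [Peer("x", 5, 3), Peer("y", 2, 7), Peer("z", 50, 9)]
print(handle_request(current, Peer("n", 10, 0), max_neighbors=3))
print(adaptation_interval(1.0), adaptation_interval(0.0))
```

With the sample peers, the newcomer displaces `y`: both `x` and `y` have lower capacity than the newcomer, and `y` has more neighbors of its own to fall back on.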
