P2P File Sharing Systems
Johnny Wong
Note: Materials in these slides are based on those from the textbook “Computer Networking: A Top-Down Approach Featuring the Internet” by J.F. Kurose and K.W. Ross
P2P file sharing: Example
• Alice runs P2P client application on her notebook computer
• Intermittently connects to Internet; gets new IP address for each connection
• Asks for “Hey Jude”
• Application displays other peers that have copy of Hey Jude.
• Alice chooses one of the peers, Bob.
• File is copied from Bob’s PC to Alice’s notebook: HTTP
• While Alice downloads, other users may be uploading from Alice.
• Alice’s peer is both a Web client and a transient Web server.
All peers are servers = highly scalable!
P2P: centralized directory
original “Napster” design
1) when peer connects, it informs central server:
– IP address
– content
2) Alice queries for “Hey Jude”
3) Alice requests file from Bob
[Diagram: peers connect to the centralized directory server (step 1); Alice queries the server (step 2) and requests the file from Bob (step 3)]
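The three steps above can be sketched as a small in-memory index: content title mapped to the set of peers advertising it. This is a hypothetical Python model for illustration, not Napster's actual wire protocol; the class and peer names are invented.

```python
# Sketch of a Napster-style centralized directory (illustrative names).
# Index maps a content title to the set of (ip, port) peers offering it.

class DirectoryServer:
    def __init__(self):
        self.index = {}                      # title -> {(ip, port), ...}

    def register(self, peer, titles):
        """Step 1: a connecting peer reports its address and content list."""
        for title in titles:
            self.index.setdefault(title, set()).add(peer)

    def query(self, title):
        """Step 2: return every peer currently advertising the title."""
        return sorted(self.index.get(title, set()))

    def unregister(self, peer):
        """Remove a departing peer (Alice connects only intermittently)."""
        for peers in self.index.values():
            peers.discard(peer)

server = DirectoryServer()
server.register(("bob.example", 6699), ["Hey Jude"])
server.register(("carol.example", 6699), ["Hey Jude", "Let It Be"])
print(server.query("Hey Jude"))
```

Step 3, the actual file request, then goes directly to one of the returned peers over HTTP, bypassing the server entirely.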
P2P: problems with centralized directory
• Single point of failure
• Performance bottleneck
• Copyright infringement
file transfer is decentralized, but locating content is highly centralized
P2P: Query flooding
• Gnutella
• no hierarchy
• use bootstrap node to learn about other peers
• join message
• Send query to neighbors
• Neighbors forward query
• If queried peer has object, it sends message back to querying peer
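The join-and-flood behavior above can be modeled in a few lines. This is an illustrative in-process sketch under simplifying assumptions (a single bootstrap node, one link per join, recursive forwarding with a TTL); the `Peer` and `Bootstrap` names are invented for the example.

```python
# Sketch of Gnutella-style join and query flooding (simplified model).

class Bootstrap:
    """Stand-in for the bootstrap node that tells newcomers about peers."""
    def __init__(self):
        self.peers = []
    def add(self, peer):
        self.peers.append(peer)
    def any_peer(self):
        return self.peers[0] if self.peers else None

class Peer:
    def __init__(self, name, files=()):
        self.name = name
        self.files = set(files)
        self.neighbors = []

    def join(self, bootstrap):
        """Learn about an existing peer via the bootstrap node, then link up."""
        other = bootstrap.any_peer()
        if other is not None:
            self.neighbors.append(other)
            other.neighbors.append(self)
        bootstrap.add(self)

    def query(self, title, ttl=3, seen=None):
        """Flood the query to neighbors; return names of peers holding the file."""
        seen = seen if seen is not None else set()
        if self.name in seen:
            return []                        # already visited on this flood
        seen.add(self.name)
        hits = [self.name] if title in self.files else []
        if ttl > 0:
            for neighbor in self.neighbors:
                hits += neighbor.query(title, ttl - 1, seen)
        return hits

boot = Bootstrap()
alice = Peer("alice")
alice.join(boot)
bob = Peer("bob", ["Hey Jude"])
bob.join(boot)
print(alice.query("Hey Jude"))   # → ['bob']
```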
P2P: more on query flooding
Pros:
• peers have similar responsibilities: no group leaders
• highly decentralized
• no peer maintains directory info
Cons:
• excessive query traffic
• query radius: may not find content that is present
• bootstrap node
• maintenance of overlay network
P2P: decentralized directory
• Each peer is either a group leader or assigned to a group leader.
• Group leader tracks the content in all its children.
• Peer queries group leader; group leader may query other group leaders.
[Diagram: overlay network of ordinary peers and group-leader peers; edges show neighboring relationships in the overlay network]
More about decentralized directory

Overlay network:
• peers are nodes
• edges between peers and their group leaders
• edges between some pairs of group leaders
• virtual neighbors

Bootstrap node:
• connecting peer is either assigned to a group leader or designated as a leader

Advantages of approach:
• no centralized directory server
– location service distributed over peers
– more difficult to shut down

Disadvantages of approach:
• bootstrap node needed
• group leaders can get overloaded
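A minimal sketch of the group-leader directory described above: each leader indexes only its own children's content and falls back to asking the other leaders on a miss. Class and peer names are invented for illustration; a real system would add overlay maintenance and leader election.

```python
# Sketch of a decentralized (group-leader) directory, illustrative model.

class GroupLeader:
    def __init__(self):
        self.index = {}                      # title -> set of child peer names
        self.other_leaders = []              # edges to other group leaders

    def add_child(self, peer, titles):
        """Track the content of an ordinary peer assigned to this leader."""
        for title in titles:
            self.index.setdefault(title, set()).add(peer)

    def query(self, title):
        """Answer from the local group first; otherwise ask other leaders."""
        local = self.index.get(title, set())
        if local:
            return sorted(local)
        hits = set()
        for leader in self.other_leaders:
            hits |= leader.index.get(title, set())
        return sorted(hits)

leader_a = GroupLeader()
leader_b = GroupLeader()
leader_a.other_leaders = [leader_b]
leader_b.add_child("carol", ["Hey Jude"])
print(leader_a.query("Hey Jude"))   # → ['carol']
```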
Unstructured P2P File Sharing
• Centralized
– Napster
• Distributed
– Gnutella
– KaZaA
Napster: how does it work?
• Application-level, client-server protocol over point-to-point TCP
• Centralized directory server
• Steps:
– connect to Napster server
– upload your list of files to server
– give server keywords to search the full list with
– select “best” of matching answers
• Ping the candidate servers and pick the one with the smallest RTT
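The final “select best” step can be sketched as picking the candidate with the lowest measured round-trip time. The `measure_rtt` callable is a stand-in for a real ping; here canned measurements are injected so the sketch stays self-contained.

```python
# Sketch of RTT-based peer selection (the "select best" Napster step).

def best_peer(candidates, measure_rtt):
    """Return the candidate with the lowest measured RTT."""
    return min(candidates, key=measure_rtt)

# Canned RTT measurements (milliseconds), standing in for real pings:
rtts = {"bob.example": 40.0, "carol.example": 12.5, "dave.example": 90.0}
print(best_peer(rtts, rtts.get))   # → carol.example
```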
Gnutella
• Open-source
• Links are TCP-connections
• A peer sends a query to each of its neighbors
• Each receiving peer forwards the query to its own neighbors
• Once the file is found, the response follows the search path back to the querier
• File transfer itself is point-to-point
Gnutella (cont’d)
– decentralized searching for files
– central directory server no longer the bottleneck
• more difficult to “pull the plug”
– each application instance serves to:
• store selected files
• route queries from and to its neighboring peers
• respond to queries if file stored locally
• serve files
Gnutella history
– 3/14/00: released by AOL, almost immediately withdrawn
– became open source
– many iterations to fix poor initial design (poor design turned many people off)
– issues:
• how much traffic does one query generate?
• how many hosts can it support at once?
• what is the latency associated with querying?
• is there a bottleneck?
Gnutella: limited scope query
• Searching by flooding:
– if you don’t have the file you want, query 7 of your neighbors
– if they don’t have it, they contact 7 of their neighbors, for a maximum hop count of 10
– reverse path forwarding for responses (not files)
• (useful for saving TCP connections)
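The hop-count limit and reverse-path response can be sketched together: the query carries its path as it floods outward, and a hit retraces that path back to the querier. This is a simplified in-process model under assumed names; real Gnutella runs each hop over a separate TCP connection.

```python
# Sketch of limited-scope flooding with reverse-path responses.

MAX_HOPS = 10   # maximum hop count from the slide

def flood(peer, title, graph, files, hops=0, path=(), seen=None):
    """Return (owner, reverse_path) for the first hit, else None.

    graph maps peer -> list of neighbors; files maps peer -> set of titles.
    """
    seen = seen if seen is not None else set()
    if peer in seen or hops > MAX_HOPS:
        return None
    seen.add(peer)
    path = path + (peer,)
    if title in files.get(peer, ()):
        # The response retraces the query's path back to the querier.
        return peer, tuple(reversed(path))
    for neighbor in graph.get(peer, ()):
        hit = flood(neighbor, title, graph, files, hops + 1, path, seen)
        if hit:
            return hit
    return None

graph = {"alice": ["bob"], "bob": ["alice", "carol"], "carol": ["bob"]}
files = {"carol": {"Hey Jude"}}
print(flood("alice", "Hey Jude", graph, files))
# → ('carol', ('carol', 'bob', 'alice'))
```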
Gnutella in practice
• Gnutella traffic << KaZaA traffic
• Anecdotal:
– Couldn’t find anything
– Downloads wouldn’t complete
• Fixes: do things KaZaA is doing: hierarchy, queue management, parallel download,…
KaZaA: Technology
Software:
• Proprietary
• control data encrypted (including queries/responses)
– KaZaA Web site gives a few hints
– Some studies described on the Web
• Everything in HTTP request and response messages

Architecture:
• hierarchical
• cross between Napster and Gnutella
KaZaA: The service (2)
• User can configure max number of simultaneous uploads and max number of simultaneous downloads
– queue management at server and client
• Frequent uploaders can get priority in server queue
• Keyword search
– User can configure “up to x” responses to keywords
– Responses to keyword queries come in waves; stops when x responses are found
• From user’s perspective, service resembles Google, but provides links to MP3s and videos rather than Web pages
KaZaA: Architecture
• Each peer is either a supernode or an ordinary node (assigned to one supernode)
– Each supernode connected to many other supernodes (supernode overlay)
• Nodes that have more connection bandwidth and are more available are designated as supernodes
KaZaA Supernode
• Each supernode acts as a mini-Napster hub, tracking the content and IP addresses of its descendants
• Guess: supernode has (on average) 200-500 descendants; roughly 10,000 supernodes
KaZaA: Overlay maintenance
• List of potential supernodes included with software download
– New peer goes through list until it finds an operational supernode
– Connects, obtains more up-to-date list
• Node then pings nodes on list and connects with the one with smallest RTT
• If supernode goes down, node obtains updated list and chooses new supernode
KaZaA Queries
– Node first sends query (keyword) to supernode
– Supernode responds with matches
• If x matches found, done.
– Otherwise, supernode forwards query to subset of supernodes
• If total of x matches found, done.
– Otherwise, query further forwarded
– Probably by original supernode rather than recursively?
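The query waves above can be sketched as follows. Since the real protocol is proprietary and encrypted, everything here is an assumption for illustration: the `Supernode` class, the `fanout` parameter, and the way matches accumulate across waves.

```python
# Sketch of KaZaA-style query expansion: answer locally, then widen to
# a subset of other supernodes until x matches are found (illustrative).

class Supernode:
    def __init__(self, index, others=()):
        self.index = index                   # title -> list of peer names
        self.other_supernodes = list(others)

def kazaa_query(supernode, title, x, fanout=2):
    """Return up to x matching peers, widening the search in waves."""
    matches = list(supernode.index.get(title, ()))
    if len(matches) >= x:
        return matches[:x]                   # enough local hits: done
    # Not enough: forward to a subset of other supernodes.
    for other in supernode.other_supernodes[:fanout]:
        matches += other.index.get(title, ())
        if len(matches) >= x:
            break
    return matches[:x]

sn2 = Supernode({"Hey Jude": ["carol"]})
sn1 = Supernode({"Hey Jude": ["bob"]}, others=[sn2])
print(kazaa_query(sn1, "Hey Jude", x=2))   # → ['bob', 'carol']
```

Capping the result at x and stopping the wave early keeps query traffic bounded, in contrast to Gnutella's blind flooding.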