P2P 1
Peer-to-Peer Applications
Course on Computer Communication and Networks, CTH/GU
The slides are adapted from slides by the authors of the course's main textbook, by the authors listed in the bibliography section, and by Jeff Pang.
P2P 2
P2P file sharing
Example:
Alice runs a P2P client application on her notebook computer.
She connects to the Internet intermittently; gets a new IP address for each connection.
She asks for "Hey Jude"; the application displays other peers that have a copy of "Hey Jude".
Alice chooses one of the peers, Bob.
The file is copied from Bob's PC to Alice's notebook: HTTP.
While Alice downloads, other users upload from Alice.
Alice's peer is both a Web client and a transient Web server.
All peers are servers = highly scalable!
P2P 3
Intro
Quickly grown in popularity: dozens or hundreds of file-sharing applications; many millions of people worldwide use P2P networks; audio/video transfer now dominates traffic on the Internet.
But what is P2P? Searching or location? Computers "peering"?
Take advantage of resources at the edges of the network: end-host resources have increased dramatically; broadband connectivity is now common.
P2P 4
The Lookup Problem
(Figure: nodes N1..N6 connected via the Internet; a publisher holds key="title", value=MP3 data; a client issues Lookup("title"). Which node has the data?)
P2P 5
The Lookup Problem (2)
Common primitives:
Join: how do I begin participating?
Publish: how do I advertise my file?
Search: how do I find a file?
Fetch: how do I retrieve a file?
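
These four primitives recur in every system covered in this lecture. A minimal Python sketch of the shared interface (the class and method names are illustrative, not taken from any real client):

from abc import ABC, abstractmethod

class PeerToPeerNode(ABC):
    """Common interface behind Napster, Gnutella, KaZaA, BitTorrent, Freenet, DHTs."""

    @abstractmethod
    def join(self, bootstrap_peer):
        """Begin participating, starting from some already-known peer."""

    @abstractmethod
    def publish(self, file_name, file_data):
        """Advertise (or place) a file so that others can find it."""

    @abstractmethod
    def search(self, file_name):
        """Return addresses of peers that hold the file."""

    @abstractmethod
    def fetch(self, peer_addr, file_name):
        """Retrieve the file directly from a peer."""

Each system below differs mainly in how search and publish are implemented.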
P2P 6
Overview
Centralized Database Napster
Query Flooding Gnutella
Hierarchical Query Flooding KaZaA
Swarming BitTorrent
Unstructured Overlay Routing Freenet
Structured Overlay Routing Distributed Hash Tables
Other
P2P 7
P2P: centralized directory (original "Napster" design, 1999, S. Fanning)
1) when a peer connects, it informs the central server of its IP address and content
2) Alice queries the directory server for "Hey Jude"
3) Alice requests the file from Bob
(Figure: peers Alice and Bob connect to a centralized directory server; steps 1-3 above are shown.)
Problems? Single point of failure; performance bottleneck; copyright infringement
P2P 8
Napster: Publish
(Figure: the peer at 123.2.21.23 announces "I have X, Y, and Z!" and publishes insert(X, 123.2.21.23), ... to the central server.)
P2P 9
Napster: Search
(Figure: a peer asks the server "Where is file A?"; the query search(A) returns 123.2.0.18, and the peer then fetches the file directly from 123.2.0.18.)
P2P 10
Napster: Discussion
Pros: simple; search scope is O(1); controllable (pro or con?)
Cons: server maintains O(N) state; server does all processing; single point of failure
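
A minimal Python sketch of the centralized-directory idea (illustrative names; this is not Napster's actual wire protocol): the server keeps a map from file name to the peers advertising it, so publish and search are single-message, O(1) operations.

class CentralDirectory:
    """Centralized index in the style of Napster (illustrative only)."""

    def __init__(self):
        self.index = {}                       # file name -> set of peer addresses

    def publish(self, peer_addr, file_names):
        # Called when a peer connects: register its address and content.
        for name in file_names:
            self.index.setdefault(name, set()).add(peer_addr)

    def search(self, file_name):
        # O(1) lookup; returns the peers claiming to hold the file.
        return self.index.get(file_name, set())

    def unpublish(self, peer_addr):
        # Remove a departing peer from every entry; state is O(N) at the server.
        for peers in self.index.values():
            peers.discard(peer_addr)

directory = CentralDirectory()
directory.publish("123.2.21.23", ["X", "Y", "Z"])
print(directory.search("X"))                  # {'123.2.21.23'}; the fetch itself is peer-to-peer over HTTP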
P2P 11
Overview
Centralized Database Napster
Query Flooding Gnutella
Hierarchical Query Flooding KaZaA
Swarming BitTorrent
Unstructured Overlay Routing Freenet
Structured Overlay Routing Distributed Hash Tables
Other
P2P 12
Gnutella: History
In 2000, J. Frankel and T. Pepper from Nullsoft released Gnutella
Soon many other clients: Bearshare, Morpheus, LimeWire, etc.
In 2001, many protocol enhancements including “ultrapeers”
P2P 13
Gnutella: Overview
Query flooding:
Join: on startup, the client contacts a few other nodes; these become its "neighbors"
Publish: no need
Search: ask neighbors, who ask their neighbors, and so on... when/if found, reply to the sender
Fetch: get the file directly from the peer
P2P 14
Gnutella: Search
(Figure: a peer floods the query "Where is file A?" to its neighbors; peers that have file A send replies back along the reverse path.)
P2P 15
Gnutella: protocol
(Figure: Query messages are flooded across the overlay; QueryHit messages travel back; the file transfer itself uses HTTP.)
Query message sent over existing TCP connections; peers forward the Query message; QueryHit sent over the reverse path.
Scalability: limited-scope flooding
P2P 16
Gnutella: More on peer joining
1. Joining peer X must find some other peer in the Gnutella network (e.g., from gnutellahosts.com): use a list of candidate peers
2. X sequentially attempts to open a TCP connection to peers on the list until a connection is set up with some peer Y
3. X sends a Ping message to Y; Y forwards (floods) the Ping message
4. All peers receiving the Ping message respond with a Pong message
5. X receives many Pong messages; it can then (later on, if needed) set up additional TCP connections
P2P 17
Query flooding: Gnutella
Fully distributed: no central server
Public-domain protocol
Many Gnutella clients implement the protocol
Overlay network: there is an edge between peers X and Y if there is a TCP connection between them
All active peers and edges form the overlay network
An edge is not a physical link
A given peer will typically be connected to < 10 overlay neighbors
What is routing in P2P networks?
P2P 18
Gnutella: Discussion
Pros: fully decentralized; search cost is distributed
Cons: search scope is O(N); search time is O(???); nodes leave often, so the network is unstable
Improvements: limit the depth of the search; use random walks instead of flooding
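
A minimal Python sketch of TTL-limited query flooding (assumed names; real Gnutella exchanges descriptors over TCP, which is omitted here): each peer remembers query IDs it has already seen, forwards the query to all neighbors while the TTL lasts, and reports a hit when it holds the file.

import uuid

class GnutellaPeer:
    def __init__(self, name):
        self.name = name
        self.neighbors = []                   # other GnutellaPeer objects (overlay edges)
        self.files = set()                    # names of locally stored files
        self.seen = set()                     # query IDs already handled (loop prevention)

    def search(self, file_name, ttl=4):
        hits = []
        self.handle_query(uuid.uuid4().hex, file_name, ttl, hits)
        return hits                           # list of (peer name, file name) "QueryHits"

    def handle_query(self, qid, file_name, ttl, hits):
        if qid in self.seen:                  # already saw this query: drop it
            return
        self.seen.add(qid)
        if file_name in self.files:
            hits.append((self.name, file_name))
        if ttl <= 1:                          # limited-scope flooding: stop when TTL expires
            return
        for neighbor in self.neighbors:
            neighbor.handle_query(qid, file_name, ttl - 1, hits)

Synchronous recursion stands in for real messages here; in the protocol, QueryHits travel back hop by hop along the reverse path.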
P2P 19
Aside: Search Time?
P2P 20
Aside: All Peers Equal?
(Figure: peers connect with very different access links: 56 kbps modems, 1.5 Mbps DSL, 10 Mbps LAN.)
P2P 21
Overview
Centralized Database Napster
Query Flooding Gnutella
Hierarchical Query Flooding KaZaA
Swarming BitTorrent
Unstructured Overlay Routing Freenet
Structured Overlay Routing Distributed Hash Tables
Other
P2P 22
KaZaA: History
In 2001, KaZaA was created by the Dutch company Kazaa BV
A single network called FastTrack, used by other clients as well: Morpheus, giFT, etc.
Eventually the protocol changed so other clients could no longer talk to it
A popular file-sharing network with >10 million users (number varies)
P2P 23
KaZaA: Overview
"Smart" query flooding:
Join: on startup, the client contacts a "supernode" ... it may at some point become one itself
Publish: send the list of files to the supernode
Search: send the query to the supernode; supernodes flood the query among themselves
Fetch: get the file directly from the peer(s); can fetch simultaneously from multiple peers
P2P 24
KaZaA: Network Design
“Super Nodes”
P2P 25
KaZaA: File Insert
(Figure: the peer at 123.2.21.23 announces "I have X!" and publishes insert(X, 123.2.21.23), ... to its supernode.)
P2P 26
KaZaA: File Search
(Figure: a peer asks its supernode "Where is file A?"; the query search(A) is flooded among supernodes, and replies return 123.2.0.18 and 123.2.22.50.)
P2P 27
KaZaA: Discussion
Pros: tries to take node heterogeneity into account: bandwidth, host computational resources, host availability (?)
Rumored to take network locality into account
Cons: mechanisms are easy to circumvent; still no real guarantees on search scope or search time
P2P architecture used by Skype
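
A minimal two-tier Python sketch of the supernode idea (illustrative names, not the FastTrack protocol): ordinary peers publish their file lists to one supernode, and a search is answered locally and then forwarded only among supernodes.

class SuperNode:
    def __init__(self):
        self.index = {}                       # file name -> set of child-peer addresses
        self.other_supernodes = []

    def publish(self, peer_addr, file_names):
        # An ordinary peer uploads its file list when it attaches to this supernode.
        for name in file_names:
            self.index.setdefault(name, set()).add(peer_addr)

    def search(self, file_name, forward=True):
        # Answer from the local child index, then ask neighboring supernodes (one hop here).
        results = set(self.index.get(file_name, set()))
        if forward:
            for sn in self.other_supernodes:
                results |= sn.search(file_name, forward=False)
        return results

sn1, sn2 = SuperNode(), SuperNode()
sn1.other_supernodes.append(sn2)
sn2.publish("123.2.22.50", ["A"])
print(sn1.search("A"))                        # {'123.2.22.50'}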
P2P 28
Overview
Centralized Database Napster
Query Flooding Gnutella
Hierarchical Query Flooding KaZaA
Swarming BitTorrent
Unstructured Overlay Routing Freenet
Structured Overlay Routing Distributed Hash Tables
Other
P2P 29
BitTorrent: History
In 2002, B. Cohen debuted BitTorrent
Key motivation: popularity exhibits temporal locality (flash crowds), e.g., the Slashdot effect, CNN on 9/11, a new movie/game release
Focused on efficient fetching, not searching: distribute the same file to all peers; single publisher, multiple downloaders
Has some "real" publishers: Blizzard Entertainment uses it to distribute games
P2P 30
BitTorrent: Overview
Swarming:
Join: contact a centralized "tracker" server, get a list of peers
Publish: run a tracker server
Search: out-of-band, e.g., use Google to find a tracker for the file you want
Fetch: download chunks of the file from your peers; upload chunks you have to them
2: Application Layer 31
File distribution: BitTorrent
tracker: tracks peers participating in the torrent
torrent: group of peers exchanging chunks of a file
(Figure: P2P file distribution: a new peer obtains a list of peers from the tracker, then starts trading chunks with peers in the torrent.)
2: Application Layer 32
BitTorrent (1)
File divided into 256 KB chunks
A peer joining the torrent: has no chunks, but will accumulate them over time; registers with the tracker to get a list of peers, connects to a subset of peers ("neighbors")
While downloading, a peer uploads chunks to other peers
Peers may come and go
Once a peer has the entire file, it may (selfishly) leave or (altruistically) remain
2: Application Layer 33
BitTorrent (2)
Pulling chunks:
At any given time, different peers have different subsets of file chunks
Periodically, a peer (Alice) asks each neighbor for the list of chunks they have
Alice sends requests for her missing chunks, rarest first
Sending chunks: tit-for-tat
Alice sends chunks to the four neighbors currently sending her chunks at the highest rate
Re-evaluate the top 4 every 10 secs
Every 30 secs: randomly select another peer and start sending it chunks; the newly chosen peer may join the top 4 ("optimistic unchoke")
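
A minimal Python sketch of the two policies on this slide (illustrative names, not the actual BitTorrent client code): rarest-first picks which missing chunk to request next, and tit-for-tat unchokes the four fastest uploaders plus one random optimistic unchoke.

from collections import Counter
import random

def rarest_first(my_chunks, neighbor_chunks):
    """Pick a missing chunk index held by the fewest neighbors.

    my_chunks: set of chunk indices Alice already has
    neighbor_chunks: dict peer -> set of chunk indices that peer holds
    """
    counts = Counter()
    for chunks in neighbor_chunks.values():
        counts.update(chunks - my_chunks)          # count only chunks Alice is still missing
    return min(counts, key=counts.get) if counts else None

def choose_unchoked(upload_rate_from, top_k=4):
    """Tit-for-tat: keep the top_k fastest senders, plus one random optimistic unchoke."""
    ranked = sorted(upload_rate_from, key=upload_rate_from.get, reverse=True)
    unchoked = set(ranked[:top_k])                 # re-evaluated every 10 seconds
    others = [p for p in upload_rate_from if p not in unchoked]
    if others:
        unchoked.add(random.choice(others))        # optimistic unchoke, re-picked every 30 seconds
    return unchoked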
2: Application Layer 34
BitTorrent: Tit-for-tat
(1) Alice "optimistically unchokes" Bob
(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates
(3) Bob becomes one of Alice’s top-four providers
With a higher upload rate, a peer can find better trading partners & get the file faster!
P2P 35
BitTorrent: Publish/Join
(Figure: peers contact the tracker to join the swarm and publish their participation.)
P2P 36
BitTorrent: Fetch
P2P 37
BitTorrent: Sharing Strategy
"Tit-for-tat" sharing strategy: "I'll share with you if you share with me"
Be optimistic: occasionally let freeloaders download
Otherwise no one would ever start!
Also allows you to discover better peers to download from when they reciprocate
Approximates Pareto efficiency (game theory): "no change can make anyone better off without making others worse off"
P2P 38
BitTorrent: Summary
Pros: works reasonably well in practice; gives peers an incentive to share resources; avoids freeloaders
Cons: a central tracker server is needed to bootstrap the swarm
2: Application Layer 39
BTW: File Distribution: Server-Client vs P2P
Question: how much time does it take to distribute a file of size F from one server to N peers?
(Figure: a server with upload bandwidth u_s connected, through a network with abundant bandwidth, to peers 1..N.)
u_s: server upload bandwidth
u_i: peer i upload bandwidth
d_i: peer i download bandwidth
2: Application Layer 40
BTW: File distribution time: server-client
(Figure: same server-and-peers setup as above.)
Server sequentially sends N copies: takes N·F/u_s time
Client i takes F/d_i time to download
Time to distribute F to N clients using the client/server approach:
d_cs = max { N·F/u_s , F/min_i(d_i) }
This increases linearly in N (for large N).
2: Application Layer 41
BTW: File distribution time: P2P
(Figure: same server-and-peers setup as above.)
Server must send one copy: takes F/u_s time
Client i takes F/d_i time to download
In aggregate, N·F bits must be downloaded
Fastest possible aggregate upload rate: u_s + Σ_i u_i
d_P2P = max { F/u_s , F/min_i(d_i) , N·F/(u_s + Σ_i u_i) }
2: Application Layer 42
Server-client vs. P2P: example
Client upload rate = u, F/u = 1 hour, u_s = 10u, d_min ≥ u_s
(Figure: minimum distribution time vs. N for client-server and P2P; the client-server time grows linearly with N, while the P2P time stays below 1 hour.)
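
The curves in the figure follow directly from the two formulas above. A minimal Python sketch of that calculation, under the example's assumptions (client upload rate u, F/u = 1 hour, u_s = 10u, downloads never the bottleneck since d_min ≥ u_s):

def d_client_server(n, f=1.0, us=10.0):
    # Time units chosen so that F/u = 1 hour; downloads are never the bottleneck here.
    return n * f / us

def d_p2p(n, f=1.0, u=1.0, us=10.0):
    return max(f / us, n * f / (us + n * u))

for n in (1, 5, 10, 35):
    print(n, round(d_client_server(n), 2), round(d_p2p(n), 2))
# Client-server time grows linearly: 0.1, 0.5, 1.0, 3.5 hours;
# P2P time stays below one hour:     0.1, 0.33, 0.5, 0.78 hours.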
P2P 43
Overview
Centralized Database Napster
Query Flooding Gnutella
Hierarchical Query Flooding KaZaA
Swarming BitTorrent
Unstructured Overlay Routing Freenet
Structured Overlay Routing Distributed Hash Tables
Other
P2P 44
Freenet: History
In 1999, I. Clarke started the Freenet project
Basic idea: employ shortest-path-like routing on the overlay network to publish and locate files
Additional goals: provide anonymity and security; make censorship difficult
P2P 45
Freenet: Overview
Routed queries:
Join: on startup, the client contacts a few other nodes it knows about; gets a unique node id
Publish: route the file contents toward the file id; the file is stored at the node with the id closest to the file id
Search: route a query for the file id toward the closest node id (DFS + caching in Freenet vs. BFS + no caching in Gnutella)
Fetch: when the query reaches a node containing the file id, it returns the file to the sender
P2P 46
Freenet: Routing Tables
Each routing table entry has: id (file identifier, e.g., a hash of the file), next_hop (another node that stores that file id), file (the file identified by id, stored on the local node).
Forwarding a query for a file id:
If the file id is stored locally, stop and forward the data back to the upstream requestor
If not, search for the "closest" id in the table and forward the message to the corresponding next_hop
If the data is not found, failure is reported back; the requestor then tries the next closest match in its routing table
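
A minimal Python sketch of the closest-id forwarding rule described above (illustrative only; real Freenet also caches files along the reply path and bounds the hops-to-live):

def freenet_query(node, file_id, visited=None):
    """Route a query toward the table entry whose id is closest to file_id.

    node is a dict: {"store": {id: file_bytes}, "table": [(id, next_hop_node), ...]}
    """
    visited = visited if visited is not None else set()
    if file_id in node["store"]:
        return node["store"][file_id]              # found locally: return data upstream
    visited.add(id(node))
    # Try table entries in order of closeness to the requested id (depth-first search).
    for entry_id, next_hop in sorted(node["table"], key=lambda e: abs(e[0] - file_id)):
        if id(next_hop) in visited:
            continue                               # skip loops / branches that already failed
        result = freenet_query(next_hop, file_id, visited)
        if result is not None:
            return result                          # (real Freenet would cache the file here)
    return None                                    # failure reported back to the requestor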
P2P 47
Freenet: Routing
(Figure: example of routing query(10) through nodes n1..n5; each node's routing table holds (id, next_hop, file) entries, and the query is forwarded step by step toward the entry whose id is closest to 10, backtracking on failure.)
DFS + caching in Freenet vs. BFS + no caching in Gnutella
P2P 48
Freenet: Routing Properties
"Close" file ids tend to be stored on the same node. Why? Publications of similar file ids route toward the same place
The network tends to be a "small world": a small number of nodes have a large number of neighbors (i.e., ~ "six degrees of separation")
Consequence: most queries traverse only a small number of hops to find the file (cf. also the 80-20 rule)
P2P 49
Freenet: Discussion
Pros: intelligent routing makes queries relatively short; search scope is small (only nodes along the search path are involved); no flooding
Cons: still no provable guarantees!
+/- anonymity: anonymity properties may give you "plausible deniability"; anonymity features make it hard to measure and debug
P2P 50
Overview
Centralized Database Napster
Query Flooding Gnutella
Hierarchical Query Flooding KaZaA
Swarming BitTorrent
Unstructured Overlay Routing Freenet
Structured Overlay Routing Distributed Hash Tables
Other
P2P 51
DHT: History
In 2000-2001, academic researchers said “we want to play too!”
Motivation: frustrated by the popularity of all these "half-baked" P2P apps :) We can do better! (so we said)
Guaranteed lookup success for files in the system; provable bounds on search time; provable scalability to millions of nodes
A hot topic in networking ever since
P2P 52
Motivation
How to find data in a distributed file sharing system?
Lookup is a key problem
(Figure: nodes N1..N5 on the Internet; a publisher holds key="LetItBe", value=MP3 data; a client issues Lookup("LetItBe").)
P2P 53
Centralized Solution
Central server (Napster)
Requires O(M) state; single point of failure
(Figure: same scenario as before, but the client's Lookup("LetItBe") goes to a central database server.)
P2P 54
Distributed Solution (1)
Flooding (Gnutella, Morpheus, etc.)
Worst case O(N) messages per lookup
(Figure: the client's Lookup("LetItBe") is flooded from node to node until it reaches the publisher.)
P2P 55
Distributed Solution (2)
Routed messages (Freenet, Tapestry, Chord, CAN, etc.)
(Figure: the client's Lookup("LetItBe") is routed along a path of nodes toward the publisher.)
Only exact matches
P2P 56
Routing Challenges
Define a useful key nearness metric
Keep the hop count small
Keep the routing tables the "right size"
Stay robust despite rapid changes in membership
P2P 57
DHT: Overview
Abstraction: a distributed “hash-table” (DHT) data structure:
put(id, item); item = get(id);
Implementation: nodes in the system form a distributed data structure
Can be Ring, Tree, Hypercube, Skip List, Butterfly Network, ...
P2P 58
DHT: Overview (2)
Structured overlay routing:
Join: on startup, contact a "bootstrap" node and integrate yourself into the distributed data structure; get a node id
Publish: route the publication for a file id toward a close node id along the data structure
Search: route a query for the file id toward a close node id; the data structure guarantees that the query will meet the publication
Fetch: two options: the publication contains the actual file => fetch from where the query stops; or the publication says "I have file X" => the query tells you 128.2.1.3 has X, use IP routing to get X from 128.2.1.3
P2P 59
Chord Overview
Provides a peer-to-peer hash lookup service:
Lookup(key) → IP address
How does Chord locate a node?
How does Chord maintain routing tables?
How does Chord cope with changes in membership?
P2P 60
Chord IDs
m-bit identifier space for both keys and nodes
Key identifier = SHA-1(key), e.g., SHA-1("LetItBe") = 60
Node identifier = SHA-1(IP address), e.g., SHA-1("198.10.10.1") = 123
Both are uniformly distributed
How to map key IDs to node IDs?
P2P 61
Consistent Hashing [Karger 97]
A key is stored at its successor: the node with the next higher ID
(Figure: a circular 7-bit ID space with nodes N32, N90, and N123 (IP 198.10.10.1); keys K5 and K20 are stored at N32, K60 ("LetItBe") at N90, and K101 at N123.)
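
A minimal Python sketch of consistent hashing as described above (illustrative; real Chord keeps the full 160-bit SHA-1 output, truncated here to an m-bit ring, so the example IDs 60 and 123 on the slide will not be reproduced exactly):

import hashlib
from bisect import bisect_right

M = 7                                              # bits in the identifier space

def chord_id(name, m=M):
    # SHA-1 of the key or IP address, truncated to an m-bit identifier.
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** m)

def successor(key_id, node_ids):
    """A key is stored at the first node id >= key_id, wrapping around the ring."""
    node_ids = sorted(node_ids)
    i = bisect_right(node_ids, key_id - 1)
    return node_ids[i % len(node_ids)]

nodes = [chord_id(ip) for ip in ("198.10.10.1", "198.10.10.2", "198.10.10.3")]
print(successor(chord_id("LetItBe"), nodes))       # the node responsible for key "LetItBe"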
P2P 62
Consistent Hashing
Every node knows of every other node: requires global information
Routing tables are large: O(N)
Lookups are fast: O(1)
(Figure: nodes N10, N32, N55, N90, N123 on the ring; Hash("LetItBe") = K60; the question "Where is LetItBe?" is answered in one step: "N90 has K60".)
P2P 63
Chord: Basic Lookup
Every node knows only its successor in the ring
(Figure: the query "Where is LetItBe?" (Hash("LetItBe") = K60) is passed around the ring, successor by successor, until N90 answers "N90 has K60".)
Requires O(N) time
P2P 64
"Finger Tables"
Every node knows m other nodes in the ring
Distances increase exponentially
(Figure: node N80's fingers point at distances 80 + 2^0, 80 + 2^1, ..., 80 + 2^6 around the ring, covering nodes such as N96, N112, and N16.)
P2P 65
"Finger Tables"
Finger i points to the successor of n + 2^i
(Figure: for node N80, the finger targets 80 + 2^0 ... 80 + 2^6 are mapped to their successors on the ring, e.g., N96, N112, and N16.)
P2P 66
Lookups are Faster
Lookups take O(log N) hops
Resembles binary search
(Figure: Lookup(K19) hops across the ring via finger pointers, roughly halving the remaining distance each hop, until it reaches N20, the successor of K19.)
P2P 67
Chord properties
Efficient: O(log N) messages per lookup
N is the total number of servers
Scalable: O(log N) state per node
Robust: survives massive changes in membership
Proofs are in paper / tech report
Assuming no malicious participants
P2P 68
DHT: Chord Summary
Routing table size? log N fingers
Routing time? Each hop is expected to halve the distance to the desired id => expect O(log N) hops.
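
A minimal Python sketch of the finger-table lookup rule just summarized (illustrative, not the full Chord pseudocode). Each node is a dict with an "id", its "successor", and a "fingers" list where fingers[0] is the successor and finger i points to the successor of n + 2^i:

M = 7                                              # bits in the identifier space

def in_interval(x, a, b):
    """True if x lies in the half-open ring interval (a, b]."""
    a, b, x = a % 2**M, b % 2**M, x % 2**M
    return (a < x <= b) if a < b else (x > a or x <= b)

def closest_preceding_finger(node, key_id):
    # Scan fingers from farthest to nearest for the last node before key_id.
    for f in reversed(node["fingers"]):
        if in_interval(f["id"], node["id"], key_id - 1):
            return f
    return node

def find_successor(node, key_id):
    # Each hop roughly halves the remaining ring distance => O(log N) hops.
    while not in_interval(key_id, node["id"], node["successor"]["id"]):
        node = closest_preceding_finger(node, key_id)
    return node["successor"]                       # the node responsible for key_id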
P2P 69
DHT: Discussion
Pros: guaranteed lookup; O(log N) per-node state and search scope
Cons: no one uses them? (only one file-sharing app); supporting non-exact-match search is hard
P2P 70
Overview
Centralized Database Napster
Query Flooding Gnutella
Hierarchical Query Flooding KaZaA
Swarming BitTorrent
Unstructured Overlay Routing Freenet
Structured Overlay Routing Distributed Hash Tables
Other
P2P 71
Other P2P networks
Content-Addressable Network (CAN): topological routing (k-dimensional grid)
Tapestry, Pastry, DKS: ring organization as in Chord, a different measure of key closeness, k-ary search instead of the binary-search paradigm
Viceroy: butterfly-type overlay network ....
SETI@home: for sharing computational resources...
Skype ....
2: Application Layer 72
P2P Case study: Skype
inherently P2P: pairs of users communicate.
proprietary application-layer protocol (inferred via reverse engineering)
hierarchical overlay with SNs
Index maps usernames to IP addresses; distributed over SNs
(Figure: Skype clients (SC) attach to supernodes (SN); a Skype login server sits alongside the overlay.)
2: Application Layer 73
Peers as relays
Problem: when both Alice and Bob are behind "NATs", the NAT prevents an outside peer from initiating a call to an inside peer
Solution: using Alice's and Bob's SNs, a relay is chosen; each peer initiates a session with the relay; the peers can then communicate through the NATs via the relay
P2P 74
P2P: Summary
Many different styles of organization (data, routing); pros and cons of each
Lessons learned: single points of failure are bad; flooding messages to everyone is bad; the underlying network topology is important; not all nodes are equal; incentives are needed to discourage freeloading; privacy and security are important; structure can provide bounds and guarantees
Observe: Application-layer networking (sublayers...)
P2P 75
Bibliography
Ion Stoica, Robert Morris, David Karger, Frans Kaashoek, Hari Balakrishnan. Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications. Proceedings of the ACM SIGCOMM, 2001.
Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker. A Scalable Content-Addressable Network. Proceedings of the ACM SIGCOMM, 2001.
M.A. Jovanovic, F.S. Annexstein, and K.A.Berman. Scalability Issues in Large Peer-to-Peer Networks - A Case Study of Gnutella. University of Cincinnati, Laboratory for Networks and Applied Graph Theory, 2001.
Frank Dabek, Emma Brunskill, M. Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, Hari Balakrishnan. Building Peer-to-Peer Systems With Chord, a Distributed Lookup Service. Proceedings of the 8th Workshop on Hot Topics in Operating Systems (HotOS-VIII), 2001.
Ian Clarke, Oskar Sandberg, Brandon Wiley, and Theodore W. Hong. Freenet: A Distributed Anonymous Information Storage and Retrieval System. Designing Privacy Enhancing Technologies: International Workshop on Design Issues in Anonymity and Unobservability. LNCS 2009, Springer Verlag, 2001.
Some extra notes
P2P 76
P2P 77
Chord: Joining the Ring
Three-step process:
Initialize all fingers of new node
Update fingers of existing nodes
Transfer keys from successor to new node
Less aggressive mechanism (lazy finger update):
Initialize only the finger to the successor node
Periodically verify the immediate successor and predecessor
Periodically refresh finger table entries
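
A minimal Python sketch of the lazy maintenance loop described above (illustrative names; the real join/stabilize protocol in the Chord paper cited in the bibliography also handles failures and finger refresh). It reuses the in_interval helper from the lookup sketch above:

def stabilize(node):
    """Periodically verify the immediate successor and adopt a closer one if it appeared."""
    x = node["successor"].get("predecessor")
    if x is not None and in_interval(x["id"], node["id"], node["successor"]["id"] - 1):
        node["successor"] = x                      # a new node joined just ahead of us
    notify(node["successor"], node)

def notify(successor, candidate):
    """Tell the successor about ourselves; it adopts us as predecessor if we are closer."""
    pred = successor.get("predecessor")
    if pred is None or in_interval(candidate["id"], pred["id"], successor["id"] - 1):
        successor["predecessor"] = candidate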
P2P 78
Joining the Ring - Step 1
Initialize the new node's finger table:
Locate any node p in the ring
Ask node p to look up the fingers of new node N36
Return the results to the new node
(Figure: new node N36 asks for Lookup(37,38,40,…,100,164) against the existing ring N5, N20, N40, N60, N80, N99.)
P2P 79
Joining the Ring - Step 2
Update the fingers of existing nodes:
The new node calls an update function on existing nodes
Existing nodes can recursively update the fingers of other nodes
(Figure: existing nodes N5, N20, N40, N60, N80, N99 update fingers to point at the new node N36.)
P2P 80
Joining the Ring - Step 3
Transfer keys from the successor node to the new node:
Only keys in the new node's range are transferred
(Figure: copy keys 21..36 from N40 to N36; K30 moves to N36 while K38 stays at N40.)
P2P 81
Chord: Handling Failures
Use a successor list:
Each node knows its r immediate successors
After a failure, a node will still know the first live successor
Correct successors guarantee correct lookups
The guarantee holds with some probability
r can be chosen to make the probability of lookup failure arbitrarily small
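
A short supporting calculation (a standard argument, assuming each of the r successors fails independently with probability p within one maintenance interval): a lookup can only be cut off if the entire successor list is dead at once, so

Pr[all r successors fail] = p^r, e.g., p = 1/2 and r = log2(N) gives p^r = 1/N.

This is why a successor list of length O(log N) keeps the lookup-failure probability negligible even under massive membership changes.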
P2P 82
Chord: Evaluation Overview
Quick lookup in large systems
Low variation in lookup costs
Robust despite massive failure
Experiments confirm theoretical results