P2P 1
Peer-to-Peer Applications
Course on Computer Communication and Networks, CTH/GU
The slides are adapted from slides by the authors of the course's main textbook, by the authors listed in the bibliography section, and by Jeff Pang.
P2P 2
P2P file sharing
Example:
Alice runs a P2P client application on her notebook computer.
She connects to the Internet intermittently; gets a new IP address for each connection.
She asks for "Hey Jude"; the application displays other peers that have a copy of "Hey Jude".
Alice chooses one of the peers, Bob.
The file is copied from Bob's PC to Alice's notebook: HTTP.
While Alice downloads, other users upload from Alice.
Alice's peer is both a Web client and a transient Web server.
All peers are servers = highly scalable!
P2P 3
Intro
Quickly grown in popularity: dozens or hundreds of file-sharing applications; many millions of people worldwide use P2P networks; audio/video transfer now dominates traffic on the Internet.
But what is P2P? Searching or location? Computers "peering"?
Take advantage of resources at the edges of the network: end-host resources have increased dramatically; broadband connectivity is now common.
P2P 4
The Lookup Problem
(Figure: nodes N1..N6 connected via the Internet; a publisher holds key="title", value=MP3 data; a client issues Lookup("title"). Which node has the data?)
P2P 5
The Lookup Problem (2)
Common primitives:
Join: how do I begin participating?
Publish: how do I advertise my file?
Search: how do I find a file?
Fetch: how do I retrieve a file?
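
These four primitives recur in every system covered in this lecture. A minimal Python sketch of the shared interface (the class and method names are illustrative, not taken from any real client):

from abc import ABC, abstractmethod

class PeerToPeerNode(ABC):
    """Common interface behind Napster, Gnutella, KaZaA, BitTorrent, Freenet, DHTs."""

    @abstractmethod
    def join(self, bootstrap_peer):
        """Begin participating, starting from some already-known peer."""

    @abstractmethod
    def publish(self, file_name, file_data):
        """Advertise (or place) a file so that others can find it."""

    @abstractmethod
    def search(self, file_name):
        """Return addresses of peers that hold the file."""

    @abstractmethod
    def fetch(self, peer_addr, file_name):
        """Retrieve the file directly from a peer."""

Each system below differs mainly in how search and publish are implemented.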
P2P 6
Overview
Centralized Database Napster
Query Flooding Gnutella
Hierarchical Query Flooding KaZaA
Swarming BitTorrent
Unstructured Overlay Routing Freenet
Structured Overlay Routing Distributed Hash Tables
Other
P2P 7
P2P: centralized directory (original "Napster" design, 1999, S. Fanning)
1) when a peer connects, it informs the central server of its IP address and content
2) Alice queries the directory server for "Hey Jude"
3) Alice requests the file from Bob
(Figure: peers Alice and Bob connect to a centralized directory server; steps 1-3 above are shown.)
Problems? Single point of failure; performance bottleneck; copyright infringement
P2P 8
Napster: Publish
(Figure: the peer at 123.2.21.23 announces "I have X, Y, and Z!" and publishes insert(X, 123.2.21.23), ... to the central server.)
P2P 9
Napster: Search
(Figure: a peer asks the server "Where is file A?"; the query search(A) returns 123.2.0.18, and the peer then fetches the file directly from 123.2.0.18.)
P2P 10
Napster: Discussion
Pros: simple; search scope is O(1); controllable (pro or con?)
Cons: server maintains O(N) state; server does all processing; single point of failure
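
A minimal Python sketch of the centralized-directory idea (illustrative names; this is not Napster's actual wire protocol): the server keeps a map from file name to the peers advertising it, so publish and search are single-message, O(1) operations.

class CentralDirectory:
    """Centralized index in the style of Napster (illustrative only)."""

    def __init__(self):
        self.index = {}                       # file name -> set of peer addresses

    def publish(self, peer_addr, file_names):
        # Called when a peer connects: register its address and content.
        for name in file_names:
            self.index.setdefault(name, set()).add(peer_addr)

    def search(self, file_name):
        # O(1) lookup; returns the peers claiming to hold the file.
        return self.index.get(file_name, set())

    def unpublish(self, peer_addr):
        # Remove a departing peer from every entry; state is O(N) at the server.
        for peers in self.index.values():
            peers.discard(peer_addr)

directory = CentralDirectory()
directory.publish("123.2.21.23", ["X", "Y", "Z"])
print(directory.search("X"))                  # {'123.2.21.23'}; the fetch itself is peer-to-peer over HTTP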
P2P 11
Overview
Centralized Database Napster
Query Flooding Gnutella
Hierarchical Query Flooding KaZaA
Swarming BitTorrent
Unstructured Overlay Routing Freenet
Structured Overlay Routing Distributed Hash Tables
Other
P2P 12
Gnutella: History
In 2000, J. Frankel and T. Pepper from Nullsoft released Gnutella
Soon many other clients: Bearshare, Morpheus, LimeWire, etc.
In 2001, many protocol enhancements including “ultrapeers”
P2P 13
Gnutella: Overview
Query flooding:
Join: on startup, the client contacts a few other nodes; these become its "neighbors"
Publish: no need
Search: ask neighbors, who ask their neighbors, and so on... when/if found, reply to the sender
Fetch: get the file directly from the peer
P2P 14
Gnutella: Search
(Figure: a peer floods the query "Where is file A?" to its neighbors; peers that have file A send replies back along the reverse path.)
P2P 15
Gnutella: protocol
(Figure: Query messages are flooded across the overlay; QueryHit messages travel back; the file transfer itself uses HTTP.)
Query message sent over existing TCP connections; peers forward the Query message; QueryHit sent over the reverse path.
Scalability: limited-scope flooding
P2P 16
Gnutella: More on peer joining
1. Joining peer X must find some other peer in the Gnutella network (e.g., from gnutellahosts.com): use a list of candidate peers
2. X sequentially attempts to open a TCP connection to peers on the list until a connection is set up with some peer Y
3. X sends a Ping message to Y; Y forwards (floods) the Ping message
4. All peers receiving the Ping message respond with a Pong message
5. X receives many Pong messages; it can then (later on, if needed) set up additional TCP connections
P2P 17
Query flooding: Gnutella
Fully distributed: no central server
Public-domain protocol
Many Gnutella clients implement the protocol
Overlay network: there is an edge between peers X and Y if there is a TCP connection between them
All active peers and edges form the overlay network
An edge is not a physical link
A given peer will typically be connected to < 10 overlay neighbors
What is routing in P2P networks?
P2P 18
Gnutella: Discussion
Pros: fully decentralized; search cost is distributed
Cons: search scope is O(N); search time is O(???); nodes leave often, so the network is unstable
Improvements: limit the depth of the search; use random walks instead of flooding
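
A minimal Python sketch of TTL-limited query flooding (assumed names; real Gnutella exchanges descriptors over TCP, which is omitted here): each peer remembers query IDs it has already seen, forwards the query to all neighbors while the TTL lasts, and reports a hit when it holds the file.

import uuid

class GnutellaPeer:
    def __init__(self, name):
        self.name = name
        self.neighbors = []                   # other GnutellaPeer objects (overlay edges)
        self.files = set()                    # names of locally stored files
        self.seen = set()                     # query IDs already handled (loop prevention)

    def search(self, file_name, ttl=4):
        hits = []
        self.handle_query(uuid.uuid4().hex, file_name, ttl, hits)
        return hits                           # list of (peer name, file name) "QueryHits"

    def handle_query(self, qid, file_name, ttl, hits):
        if qid in self.seen:                  # already saw this query: drop it
            return
        self.seen.add(qid)
        if file_name in self.files:
            hits.append((self.name, file_name))
        if ttl <= 1:                          # limited-scope flooding: stop when TTL expires
            return
        for neighbor in self.neighbors:
            neighbor.handle_query(qid, file_name, ttl - 1, hits)

Synchronous recursion stands in for real messages here; in the protocol, QueryHits travel back hop by hop along the reverse path.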
P2P 19
Aside: Search Time?
P2P 20
Aside: All Peers Equal?
(Figure: peers connect with very different access links: 56 kbps modems, 1.5 Mbps DSL, 10 Mbps LAN.)
P2P 21
Overview
Centralized Database Napster
Query Flooding Gnutella
Hierarchical Query Flooding KaZaA
Swarming BitTorrent
Unstructured Overlay Routing Freenet
Structured Overlay Routing Distributed Hash Tables
Other
P2P 22
KaZaA: History
In 2001, KaZaA was created by the Dutch company Kazaa BV
A single network called FastTrack, used by other clients as well: Morpheus, giFT, etc.
Eventually the protocol changed so other clients could no longer talk to it
A popular file-sharing network with >10 million users (number varies)
P2P 23
KaZaA: Overview
"Smart" query flooding:
Join: on startup, the client contacts a "supernode" ... it may at some point become one itself
Publish: send the list of files to the supernode
Search: send the query to the supernode; supernodes flood the query among themselves
Fetch: get the file directly from the peer(s); can fetch simultaneously from multiple peers
P2P 24
KaZaA: Network Design
“Super Nodes”
P2P 25
KaZaA: File Insert
(Figure: the peer at 123.2.21.23 announces "I have X!" and publishes insert(X, 123.2.21.23), ... to its supernode.)
P2P 26
KaZaA: File Search
(Figure: a peer asks its supernode "Where is file A?"; the query search(A) is flooded among supernodes, and replies return 123.2.0.18 and 123.2.22.50.)
P2P 27
KaZaA: Discussion
Pros: tries to take node heterogeneity into account: bandwidth, host computational resources, host availability (?)
Rumored to take network locality into account
Cons: mechanisms are easy to circumvent; still no real guarantees on search scope or search time
P2P architecture used by Skype
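
A minimal two-tier Python sketch of the supernode idea (illustrative names, not the FastTrack protocol): ordinary peers publish their file lists to one supernode, and a search is answered locally and then forwarded only among supernodes.

class SuperNode:
    def __init__(self):
        self.index = {}                       # file name -> set of child-peer addresses
        self.other_supernodes = []

    def publish(self, peer_addr, file_names):
        # An ordinary peer uploads its file list when it attaches to this supernode.
        for name in file_names:
            self.index.setdefault(name, set()).add(peer_addr)

    def search(self, file_name, forward=True):
        # Answer from the local child index, then ask neighboring supernodes (one hop here).
        results = set(self.index.get(file_name, set()))
        if forward:
            for sn in self.other_supernodes:
                results |= sn.search(file_name, forward=False)
        return results

sn1, sn2 = SuperNode(), SuperNode()
sn1.other_supernodes.append(sn2)
sn2.publish("123.2.22.50", ["A"])
print(sn1.search("A"))                        # {'123.2.22.50'}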
P2P 28
Overview
Centralized Database Napster
Query Flooding Gnutella
Hierarchical Query Flooding KaZaA
Swarming BitTorrent
Unstructured Overlay Routing Freenet
Structured Overlay Routing Distributed Hash Tables
Other
P2P 29
BitTorrent: History
In 2002, B. Cohen debuted BitTorrent
Key motivation: popularity exhibits temporal locality (flash crowds), e.g., the Slashdot effect, CNN on 9/11, a new movie/game release
Focused on efficient fetching, not searching: distribute the same file to all peers; single publisher, multiple downloaders
Has some "real" publishers: Blizzard Entertainment uses it to distribute games
P2P 30
BitTorrent: Overview
Swarming:
Join: contact a centralized "tracker" server, get a list of peers
Publish: run a tracker server
Search: out-of-band, e.g., use Google to find a tracker for the file you want
Fetch: download chunks of the file from your peers; upload chunks you have to them
2: Application Layer 31
File distribution: BitTorrent
tracker: tracks peers participating in the torrent
torrent: group of peers exchanging chunks of a file
(Figure: P2P file distribution: a new peer obtains a list of peers from the tracker, then starts trading chunks with peers in the torrent.)
2: Application Layer 32
BitTorrent (1)
File divided into 256 KB chunks
A peer joining the torrent: has no chunks, but will accumulate them over time; registers with the tracker to get a list of peers, connects to a subset of peers ("neighbors")
While downloading, a peer uploads chunks to other peers
Peers may come and go
Once a peer has the entire file, it may (selfishly) leave or (altruistically) remain
2: Application Layer 33
BitTorrent (2)
Pulling chunks:
At any given time, different peers have different subsets of file chunks
Periodically, a peer (Alice) asks each neighbor for the list of chunks they have
Alice sends requests for her missing chunks, rarest first
Sending chunks: tit-for-tat
Alice sends chunks to the four neighbors currently sending her chunks at the highest rate
Re-evaluate the top 4 every 10 secs
Every 30 secs: randomly select another peer and start sending it chunks; the newly chosen peer may join the top 4 ("optimistic unchoke")
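
A minimal Python sketch of the two policies on this slide (illustrative names, not the actual BitTorrent client code): rarest-first picks which missing chunk to request next, and tit-for-tat unchokes the four fastest uploaders plus one random optimistic unchoke.

from collections import Counter
import random

def rarest_first(my_chunks, neighbor_chunks):
    """Pick a missing chunk index held by the fewest neighbors.

    my_chunks: set of chunk indices Alice already has
    neighbor_chunks: dict peer -> set of chunk indices that peer holds
    """
    counts = Counter()
    for chunks in neighbor_chunks.values():
        counts.update(chunks - my_chunks)          # count only chunks Alice is still missing
    return min(counts, key=counts.get) if counts else None

def choose_unchoked(upload_rate_from, top_k=4):
    """Tit-for-tat: keep the top_k fastest senders, plus one random optimistic unchoke."""
    ranked = sorted(upload_rate_from, key=upload_rate_from.get, reverse=True)
    unchoked = set(ranked[:top_k])                 # re-evaluated every 10 seconds
    others = [p for p in upload_rate_from if p not in unchoked]
    if others:
        unchoked.add(random.choice(others))        # optimistic unchoke, re-picked every 30 seconds
    return unchoked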
2: Application Layer 34
BitTorrent: Tit-for-tat
(1) Alice "optimistically unchokes" Bob
(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates
(3) Bob becomes one of Alice’s top-four providers
With a higher upload rate, a peer can find better trading partners & get the file faster!
P2P 35
BitTorrent: Publish/Join
(Figure: peers contact the tracker to join the swarm and publish their participation.)
P2P 36
BitTorrent: Fetch
P2P 37
BitTorrent: Sharing Strategy
"Tit-for-tat" sharing strategy: "I'll share with you if you share with me"
Be optimistic: occasionally let freeloaders download
Otherwise no one would ever start!
Also allows you to discover better peers to download from when they reciprocate
Approximates Pareto efficiency (game theory): "no change can make anyone better off without making others worse off"
P2P 38
BitTorrent: Summary
Pros: works reasonably well in practice; gives peers an incentive to share resources; avoids freeloaders
Cons: a central tracker server is needed to bootstrap the swarm
2: Application Layer 39
BTW: File Distribution: Server-Client vs P2P
Question: how much time does it take to distribute a file of size F from one server to N peers?
(Figure: a server with upload bandwidth u_s connected, through a network with abundant bandwidth, to peers 1..N.)
u_s: server upload bandwidth
u_i: peer i upload bandwidth
d_i: peer i download bandwidth
2: Application Layer 40
BTW: File distribution time: server-client
(Figure: same server-and-peers setup as above.)
Server sequentially sends N copies: takes N·F/u_s time
Client i takes F/d_i time to download
Time to distribute F to N clients using the client/server approach:
d_cs = max { N·F/u_s , F/min_i(d_i) }
This increases linearly in N (for large N).
2: Application Layer 41
BTW: File distribution time: P2P
(Figure: same server-and-peers setup as above.)
Server must send one copy: takes F/u_s time
Client i takes F/d_i time to download
In aggregate, N·F bits must be downloaded
Fastest possible aggregate upload rate: u_s + Σ_i u_i
d_P2P = max { F/u_s , F/min_i(d_i) , N·F/(u_s + Σ_i u_i) }
2: Application Layer 42
Server-client vs. P2P: example
Client upload rate = u, F/u = 1 hour, u_s = 10u, d_min ≥ u_s
(Figure: minimum distribution time vs. N for client-server and P2P; the client-server time grows linearly with N, while the P2P time stays below 1 hour.)
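
The curves in the figure follow directly from the two formulas above. A minimal Python sketch of that calculation, under the example's assumptions (client upload rate u, F/u = 1 hour, u_s = 10u, downloads never the bottleneck since d_min ≥ u_s):

def d_client_server(n, f=1.0, us=10.0):
    # Time units chosen so that F/u = 1 hour; downloads are never the bottleneck here.
    return n * f / us

def d_p2p(n, f=1.0, u=1.0, us=10.0):
    return max(f / us, n * f / (us + n * u))

for n in (1, 5, 10, 35):
    print(n, round(d_client_server(n), 2), round(d_p2p(n), 2))
# Client-server time grows linearly: 0.1, 0.5, 1.0, 3.5 hours;
# P2P time stays below one hour:     0.1, 0.33, 0.5, 0.78 hours.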
P2P 43
Overview
Centralized Database Napster
Query Flooding Gnutella
Hierarchical Query Flooding KaZaA
Swarming BitTorrent
Unstructured Overlay Routing Freenet
Structured Overlay Routing Distributed Hash Tables
Other
P2P 44
Freenet: History
In 1999, I. Clarke started the Freenet project
Basic idea: employ shortest-path-like routing on the overlay network to publish and locate files
Additional goals: provide anonymity and security; make censorship difficult
P2P 45
Freenet: Overview
Routed queries:
Join: on startup, the client contacts a few other nodes it knows about; gets a unique node id
Publish: route the file contents toward the file id; the file is stored at the node with the id closest to the file id
Search: route a query for the file id toward the closest node id (DFS + caching in Freenet vs. BFS + no caching in Gnutella)
Fetch: when the query reaches a node containing the file id, it returns the file to the sender
P2P 46
Freenet: Routing Tables
Each routing table entry has: id (file identifier, e.g., a hash of the file), next_hop (another node that stores that file id), file (the file identified by id, stored on the local node).
Forwarding a query for a file id:
If the file id is stored locally, stop and forward the data back to the upstream requestor
If not, search for the "closest" id in the table and forward the message to the corresponding next_hop
If the data is not found, failure is reported back; the requestor then tries the next closest match in its routing table
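
A minimal Python sketch of the closest-id forwarding rule described above (illustrative only; real Freenet also caches files along the reply path and bounds the hops-to-live):

def freenet_query(node, file_id, visited=None):
    """Route a query toward the table entry whose id is closest to file_id.

    node is a dict: {"store": {id: file_bytes}, "table": [(id, next_hop_node), ...]}
    """
    visited = visited if visited is not None else set()
    if file_id in node["store"]:
        return node["store"][file_id]              # found locally: return data upstream
    visited.add(id(node))
    # Try table entries in order of closeness to the requested id (depth-first search).
    for entry_id, next_hop in sorted(node["table"], key=lambda e: abs(e[0] - file_id)):
        if id(next_hop) in visited:
            continue                               # skip loops / branches that already failed
        result = freenet_query(next_hop, file_id, visited)
        if result is not None:
            return result                          # (real Freenet would cache the file here)
    return None                                    # failure reported back to the requestor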
P2P 47
Freenet: Routing
(Figure: example of routing query(10) through nodes n1..n5; each node's routing table holds (id, next_hop, file) entries, and the query is forwarded step by step toward the entry whose id is closest to 10, backtracking on failure.)
DFS + caching in Freenet vs. BFS + no caching in Gnutella
P2P 48
Freenet: Routing Properties
"Close" file ids tend to be stored on the same node. Why? Publications of similar file ids route toward the same place
The network tends to be a "small world": a small number of nodes have a large number of neighbors (i.e., ~ "six degrees of separation")
Consequence: most queries traverse only a small number of hops to find the file (cf. also the 80-20 rule)
P2P 49
Freenet: Discussion
Pros: intelligent routing makes queries relatively short; search scope is small (only nodes along the search path are involved); no flooding
Cons: still no provable guarantees!
+/- anonymity: anonymity properties may give you "plausible deniability"; anonymity features make it hard to measure and debug
P2P 50
Overview
Centralized Database Napster
Query Flooding Gnutella
Hierarchical Query Flooding KaZaA
Swarming BitTorrent
Unstructured Overlay Routing Freenet
Structured Overlay Routing Distributed Hash Tables
Other
P2P 51
DHT: History
In 2000-2001, academic researchers said “we want to play too!”
Motivation: frustrated by the popularity of all these "half-baked" P2P apps :) We can do better! (so we said)
Guaranteed lookup success for files in the system; provable bounds on search time; provable scalability to millions of nodes
A hot topic in networking ever since
P2P 52
Motivation
How to find data in a distributed file sharing system?
Lookup is a key problem
(Figure: nodes N1..N5 on the Internet; a publisher holds key="LetItBe", value=MP3 data; a client issues Lookup("LetItBe").)
P2P 53
Centralized Solution
Central server (Napster)
Requires O(M) state; single point of failure
(Figure: same scenario as before, but the client's Lookup("LetItBe") goes to a central database server.)
P2P 54
Distributed Solution (1)
Flooding (Gnutella, Morpheus, etc.)
Worst case O(N) messages per lookup
(Figure: the client's Lookup("LetItBe") is flooded from node to node until it reaches the publisher.)
P2P 55
Distributed Solution (2)
Routed messages (Freenet, Tapestry, Chord, CAN, etc.)
(Figure: the client's Lookup("LetItBe") is routed along a path of nodes toward the publisher.)
Only exact matches
P2P 56
Routing Challenges
Define a useful key nearness metric
Keep the hop count small
Keep the routing tables the "right size"
Stay robust despite rapid changes in membership
P2P 57
DHT: Overview
Abstraction: a distributed “hash-table” (DHT) data structure:
put(id, item); item = get(id);
Implementation: nodes in the system form a distributed data structure
Can be Ring, Tree, Hypercube, Skip List, Butterfly Network, ...
P2P 58
DHT: Overview (2)
Structured overlay routing:
Join: on startup, contact a "bootstrap" node and integrate yourself into the distributed data structure; get a node id
Publish: route the publication for a file id toward a close node id along the data structure
Search: route a query for the file id toward a close node id; the data structure guarantees that the query will meet the publication
Fetch: two options: the publication contains the actual file => fetch from where the query stops; or the publication says "I have file X" => the query tells you 128.2.1.3 has X, use IP routing to get X from 128.2.1.3
P2P 59
Chord Overview
Provides a peer-to-peer hash lookup service:
Lookup(key) → IP address
How does Chord locate a node?
How does Chord maintain routing tables?
How does Chord cope with changes in membership?
P2P 60
Chord IDs
m-bit identifier space for both keys and nodes
Key identifier = SHA-1(key), e.g., SHA-1("LetItBe") = 60
Node identifier = SHA-1(IP address), e.g., SHA-1("198.10.10.1") = 123
Both are uniformly distributed
How to map key IDs to node IDs?
P2P 61
Consistent Hashing [Karger 97]
A key is stored at its successor: the node with the next higher ID
(Figure: a circular 7-bit ID space with nodes N32, N90, and N123 (IP 198.10.10.1); keys K5 and K20 are stored at N32, K60 ("LetItBe") at N90, and K101 at N123.)
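
A minimal Python sketch of consistent hashing as described above (illustrative; real Chord keeps the full 160-bit SHA-1 output, truncated here to an m-bit ring, so the example IDs 60 and 123 on the slide will not be reproduced exactly):

import hashlib
from bisect import bisect_right

M = 7                                              # bits in the identifier space

def chord_id(name, m=M):
    # SHA-1 of the key or IP address, truncated to an m-bit identifier.
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** m)

def successor(key_id, node_ids):
    """A key is stored at the first node id >= key_id, wrapping around the ring."""
    node_ids = sorted(node_ids)
    i = bisect_right(node_ids, key_id - 1)
    return node_ids[i % len(node_ids)]

nodes = [chord_id(ip) for ip in ("198.10.10.1", "198.10.10.2", "198.10.10.3")]
print(successor(chord_id("LetItBe"), nodes))       # the node responsible for key "LetItBe"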
P2P 62
Consistent Hashing
Every node knows of every other node: requires global information
Routing tables are large: O(N)
Lookups are fast: O(1)
(Figure: nodes N10, N32, N55, N90, N123 on the ring; Hash("LetItBe") = K60; the question "Where is LetItBe?" is answered in one step: "N90 has K60".)
P2P 63
Chord: Basic Lookup
Every node knows only its successor in the ring
(Figure: the query "Where is LetItBe?" (Hash("LetItBe") = K60) is passed around the ring, successor by successor, until N90 answers "N90 has K60".)
Requires O(N) time
P2P 64
"Finger Tables"
Every node knows m other nodes in the ring
Distances increase exponentially
(Figure: node N80's fingers point at distances 80 + 2^0, 80 + 2^1, ..., 80 + 2^6 around the ring, covering nodes such as N96, N112, and N16.)
P2P 65
"Finger Tables"
Finger i points to the successor of n + 2^i
(Figure: for node N80, the finger targets 80 + 2^0 ... 80 + 2^6 are mapped to their successors on the ring, e.g., N96, N112, and N16.)
P2P 66
Lookups are Faster
Lookups take O(log N) hops
Resembles binary search
(Figure: Lookup(K19) hops across the ring via finger pointers, roughly halving the remaining distance each hop, until it reaches N20, the successor of K19.)
P2P 67
Chord properties
Efficient: O(log N) messages per lookup
N is the total number of servers
Scalable: O(log N) state per node
Robust: survives massive changes in membership
Proofs are in paper / tech report
Assuming no malicious participants
P2P 68
DHT: Chord Summary
Routing table size? log N fingers
Routing time? Each hop is expected to halve the distance to the desired id => expect O(log N) hops.
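
A minimal Python sketch of the finger-table lookup rule just summarized (illustrative, not the full Chord pseudocode). Each node is a dict with an "id", its "successor", and a "fingers" list where fingers[0] is the successor and finger i points to the successor of n + 2^i:

M = 7                                              # bits in the identifier space

def in_interval(x, a, b):
    """True if x lies in the half-open ring interval (a, b]."""
    a, b, x = a % 2**M, b % 2**M, x % 2**M
    return (a < x <= b) if a < b else (x > a or x <= b)

def closest_preceding_finger(node, key_id):
    # Scan fingers from farthest to nearest for the last node before key_id.
    for f in reversed(node["fingers"]):
        if in_interval(f["id"], node["id"], key_id - 1):
            return f
    return node

def find_successor(node, key_id):
    # Each hop roughly halves the remaining ring distance => O(log N) hops.
    while not in_interval(key_id, node["id"], node["successor"]["id"]):
        node = closest_preceding_finger(node, key_id)
    return node["successor"]                       # the node responsible for key_id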
P2P 69
DHT: Discussion
Pros: guaranteed lookup; O(log N) per-node state and search scope
Cons: no one uses them? (only one file-sharing app); supporting non-exact-match search is hard
P2P 70
Overview
Centralized Database Napster
Query Flooding Gnutella
Hierarchical Query Flooding KaZaA
Swarming BitTorrent
Unstructured Overlay Routing Freenet
Structured Overlay Routing Distributed Hash Tables
Other
P2P 71
Other P2P networks
Content-Addressable Network (CAN): topological routing (k-dimensional grid)
Tapestry, Pastry, DKS: ring organization as in Chord, a different measure of key closeness, k-ary search instead of the binary-search paradigm
Viceroy: butterfly-type overlay network ....
SETI@home: for sharing computational resources...
Skype ....
2: Application Layer 72
P2P Case study: Skype
inherently P2P: pairs of users communicate.
proprietary application-layer protocol (inferred via reverse engineering)
hierarchical overlay with SNs
Index maps usernames to IP addresses; distributed over SNs
(Figure: Skype clients (SC) attach to supernodes (SN); a Skype login server sits alongside the overlay.)
2: Application Layer 73
Peers as relays
Problem: when both Alice and Bob are behind "NATs", the NAT prevents an outside peer from initiating a call to an inside peer
Solution: using Alice's and Bob's SNs, a relay is chosen; each peer initiates a session with the relay; the peers can then communicate through the NATs via the relay
P2P 74
P2P: Summary
Many different styles of organization (data, routing); pros and cons of each
Lessons learned: single points of failure are bad; flooding messages to everyone is bad; the underlying network topology is important; not all nodes are equal; incentives are needed to discourage freeloading; privacy and security are important; structure can provide bounds and guarantees
Observe: Application-layer networking (sublayers...)
P2P 75
Bibliography
Ion Stoica, Robert Morris, David Karger, Frans Kaashoek, Hari Balakrishnan. Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications. Proceedings of the ACM SIGCOMM, 2001.
Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker. A Scalable Content-Addressable Network. Proceedings of the ACM SIGCOMM, 2001.
M.A. Jovanovic, F.S. Annexstein, and K.A.Berman. Scalability Issues in Large Peer-to-Peer Networks - A Case Study of Gnutella. University of Cincinnati, Laboratory for Networks and Applied Graph Theory, 2001.
Frank Dabek, Emma Brunskill, M. Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, Hari Balakrishnan. Building Peer-to-Peer Systems With Chord, a Distributed Lookup Service. Proceedings of the 8th Workshop on Hot Topics in Operating Systems (HotOS-VIII), 2001.
Ian Clarke, Oskar Sandberg, Brandon Wiley, and Theodore W. Hong. Freenet: A Distributed Anonymous Information Storage and Retrieval System. Designing Privacy Enhancing Technologies: International Workshop on Design Issues in Anonymity and Unobservability. LNCS 2009, Springer Verlag, 2001.
Some extra notes
P2P 76
P2P 77
Chord: Joining the Ring
Three-step process:
Initialize all fingers of new node
Update fingers of existing nodes
Transfer keys from successor to new node
Less aggressive mechanism (lazy finger update):
Initialize only the finger to the successor node
Periodically verify the immediate successor and predecessor
Periodically refresh finger table entries
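
A minimal Python sketch of the lazy maintenance loop described above (illustrative names; the real join/stabilize protocol in the Chord paper cited in the bibliography also handles failures and finger refresh). It reuses the in_interval helper from the lookup sketch above:

def stabilize(node):
    """Periodically verify the immediate successor and adopt a closer one if it appeared."""
    x = node["successor"].get("predecessor")
    if x is not None and in_interval(x["id"], node["id"], node["successor"]["id"] - 1):
        node["successor"] = x                      # a new node joined just ahead of us
    notify(node["successor"], node)

def notify(successor, candidate):
    """Tell the successor about ourselves; it adopts us as predecessor if we are closer."""
    pred = successor.get("predecessor")
    if pred is None or in_interval(candidate["id"], pred["id"], successor["id"] - 1):
        successor["predecessor"] = candidate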
P2P 78
Joining the Ring - Step 1
Initialize the new node's finger table:
Locate any node p in the ring
Ask node p to look up the fingers of new node N36
Return the results to the new node
(Figure: new node N36 asks for Lookup(37,38,40,…,100,164) against the existing ring N5, N20, N40, N60, N80, N99.)
P2P 79
Joining the Ring - Step 2
Update the fingers of existing nodes:
The new node calls an update function on existing nodes
Existing nodes can recursively update the fingers of other nodes
(Figure: existing nodes N5, N20, N40, N60, N80, N99 update fingers to point at the new node N36.)
P2P 80
Joining the Ring - Step 3
Transfer keys from the successor node to the new node:
Only keys in the new node's range are transferred
(Figure: copy keys 21..36 from N40 to N36; K30 moves to N36 while K38 stays at N40.)
P2P 81
Chord: Handling Failures
Use a successor list:
Each node knows its r immediate successors
After a failure, a node will still know the first live successor
Correct successors guarantee correct lookups
The guarantee holds with some probability
r can be chosen to make the probability of lookup failure arbitrarily small
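
A short supporting calculation (a standard argument, assuming each of the r successors fails independently with probability p within one maintenance interval): a lookup can only be cut off if the entire successor list is dead at once, so

Pr[all r successors fail] = p^r, e.g., p = 1/2 and r = log2(N) gives p^r = 1/N.

This is why a successor list of length O(log N) keeps the lookup-failure probability negligible even under massive membership changes.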
P2P 82
Chord: Evaluation Overview
Quick lookup in large systems
Low variation in lookup costs
Robust despite massive failure
Experiments confirm theoretical results