1 peer-peer and application-level networking presented by jon crowcroft based strongly on material...

342
1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class of 2001 for the Umass Comp Sci 791N course..

Upload: marcus-stevenson

Post on 25-Dec-2015

230 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

1

Peer-peer and Application-level Networking

Presented by Jon CrowcroftBased strongly on material byJim Kurose,Brian Levine,Don Towsley, and the class of 2001 for the Umass

Comp Sci 791N course..

Page 2: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

2

0.Introduction

Background Motivation outline of the tutorial

Page 3: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

3

Peer-peer networking

Page 4: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

4

Peer-peer networking Focus at the application level

Page 5: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

5

Peer-peer networking

Peer-peer applications Napster, Gnutella, Freenet: file sharing ad hoc networks multicast overlays (e.g., video distribution)

Page 6: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

6

Peer-peer networking

Q: What are the new technical challenges? Q: What new services/applications enabled? Q: Is it just “networking at the application-level”?

“There is nothing new under the Sun” (William Shakespeare)

Page 7: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

7

Tutorial Contents

Introduction Client-Server v. P2P Case Studies

Napster Gnutella RON Freenet Publius

Middleware Chord CAN Tapestry JXTA ESM Overcast

Applications Storage Conferencing

Page 8: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

8

Client Server v. Peer to Peer(1) RPC/RMI Synchronous Assymmetric Emphasis on language integration and binding

models (stub IDL/XDR compilers etc) Kerberos style security – access control, crypto

Messages Asynchronous Symmetric Emphasis on service location, content

addressing, application layer routing. Anonymity, high availability, integrity. Harder to get right

Page 9: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

9

Client Server v. Peer to Peer(2) RPCCli_call(args)

Srv_main_loop() { while(true) {

deque(call)switch(call.procid)case 0:

call.ret=proc1(call.args)case 1: call.ret=proc2(call.args)

… … …default:call.ret = exception

}}

Page 10: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

10

Client Server v. Peer to Peer(3)P2PPeer_main_loop(){

while(true) {await(event)switch(event.type) {case timer_expire: do some p2p work()

randomize timersbreak;

case inbound message: handle itrespondbreak;

default: do some book keepingbreak;

}}

Page 11: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

11

Peer to peer systems actually old IP routers are peer to peer. Routers discover topology, and maintain

it Routers are neither client nor server Routers continually chatter to each

other Routers are fault tolerant, inherently Routers are autonomous

Page 12: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

12

Peer to peer systems

Have no distinguished role So no single point of bottleneck or

failure. However, this means they need

distributed algorithms for Service discovery (name, address, route,

metric, etc) Neighbour status tracking Application layer routing (based possibly on

content, interest, etc) Resilience, handing link and node failures Etc etc etc

Page 13: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

13

Ad hoc networks and peer2peer Wireless ad hoc networks have many

similarities to peer to peer systems No a priori knowledge No given infrastructure Have to construct it from “thin air”! Note for later – wireless

Page 14: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

14

Overlays and peer 2 peer systems P2p technology is often used to create

overlays which offer services that could be offered in the IP level

Useful deployment strategy Often economically a way around other

barriers to deployment IP Itself was an overlay (on telephone

core infrastructure) Evolutionary path!!!

Page 15: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

15

Next we look at some case studies Piracy^H^H^H^H^H^content sharing

Napster Gnutella Freenet Publius etc

Page 16: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

16

1. NAPSTER

The most (in)famous Not the first (c.f. probably Eternity, from

Ross Anderson in Cambridge) But instructive for what it gets right,

and Also wrong… Also has a political message…and

economic and legal…etc etc etc

Page 17: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

17

Napster program for sharing files over the Internet a “disruptive” application/technology? history:

5/99: Shawn Fanning (freshman, Northeasten U.) founds Napster Online music service

12/99: first lawsuit 3/00: 25% UWisc traffic Napster 2000: est. 60M users 2/01: US Circuit Court of

Appeals: Napster knew users violating copyright laws

7/01: # simultaneous online users:Napster 160K, Gnutella: 40K, Morpheus: 300K

Page 18: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

18

Napster: how does it work

Application-level, client-server protocol over point-to-point TCP

Four steps: Connect to Napster server Upload your list of files (push) to server. Give server keywords to search the full list

with. Select “best” of correct answers. (pings)

Page 19: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

19

Napster

napster.com

users

File list is uploaded

1.

Page 20: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

20

Napster

napster.com

user

Requestand

results

User requests search at server.

2.

Page 21: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

21

Napster

napster.com

user

pings pings

User pings hosts that apparently have data.

Looks for best transfer rate.

3.

Page 22: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

22

Napster

napster.com

user

Retrievesfile

User retrieves file

4.

Page 23: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

23

Napster messagesGeneral Packet Format

[chunksize] [chunkinfo] [data...]

CHUNKSIZE: Intel-endian 16-bit integer size of [data...] in bytes

CHUNKINFO: (hex) Intel-endian 16-bit integer.

00 - login rejected 02 - login requested 03 - login accepted 0D - challenge? (nuprin1715) 2D - added to hotlist 2E - browse error (user isn't online!) 2F - user offline

5B - whois query 5C - whois result 5D - whois: user is offline! 69 - list all channels 6A - channel info 90 - join channel 91 - leave channel …..

Page 24: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

24

Napster: requesting a file SENT to server (after logging in to server) 2A 00 CB 00 username "C:\MP3\REM - Everybody Hurts.mp3" RECEIVED 5D 00 CC 00 username 2965119704 (IP-address backward-form = A.B.C.D) 6699 (port) "C:\MP3\REM - Everybody Hurts.mp3" (song) (32-byte checksum) (line speed) [connect to A.B.C.D:6699] RECEIVED from client 31 00 00 00 00 00 SENT to client GET RECEIVED from client 00 00 00 00 00 00

SENT to client Myusername "C:\MP3\REM - Everybody Hurts.mp3" 0 (port to connect to) RECEIVED from client (size in bytes) SENT to server 00 00 DD 00 (give go-ahead thru server) RECEIVED from client [DATA]

From: http://david.weekly.org/code/napster.php3

Page 25: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

25

Napster: architecture notes

centralized server: single logical point of failure can load balance among servers using DNS

rotation potential for congestion Napster “in control” (freedom is an illusion)

no security: passwords in plain text no authentication no anonymity

Page 26: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

26

2 Gnutella

Napster fixed Open Source Distributed Still very political…

Page 27: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

27

Gnutella

peer-to-peer networking: applications connect to peer applications

focus: decentralized method of searching for files

each application instance serves to: store selected files route queries (file searches) from and to its

neighboring peers respond to queries (serve file) if file stored locally

Gnutella history: 3/14/00: release by AOL, almost immediately

withdrawn too late: 23K users on Gnutella at 8 am this AM many iterations to fix poor initial design (poor design

turned many people off)

What we care about: How much traffic does one query generate? how many hosts can it support at once? What is the latency associated with querying? Is there a bottleneck?

Page 28: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

28

Gnutella: how it worksSearching by flooding: If you don’t have the file you want, query 7 of

your partners. If they don’t have it, they contact 7 of their

partners, for a maximum hop count of 10. Requests are flooded, but there is no tree

structure. No looping but packets may be received twice. Reverse path forwarding(?)

Note: Play gnutella animation at: http://www.limewire.com/index.jsp/p2p

Page 29: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

29

Flooding in Gnutella: loop prevention

Seen already list: “A”

Page 30: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

30

Gnutella message format

Message ID: 16 bytes (yes bytes) FunctionID: 1 byte indicating

00 ping: used to probe gnutella network for hosts 01 pong: used to reply to ping, return # files shared 80 query: search string, and desired minimum bandwidth 81: query hit: indicating matches to 80:query, my IP

address/port, available bandwidth RemainingTTL: decremented at each peer to

prevent TTL-scoped flooding HopsTaken: number of peer visited so far by

this message DataLength: length of data field

Page 31: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

31

Gnutella: initial problems and fixes

Freeloading: WWW sites offering search/retrieval from Gnutella network without providing file sharing or query routing. Block file-serving to browser-based non-file-sharing

users Prematurely terminated downloads:

long download times over modems modem users run gnutella peer only briefly (Napster

problem also!) or any users becomes overloaded fix: peer can reply “I have it, but I am busy. Try again

later” late 2000: only 10% of downloads succeed 2001: more than 25% downloads successful (is this

success or failure?)

www.limewire.com/index.jsp/net_improvements

Page 32: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

32

Gnutella: initial problems and fixes (more)

2000: avg size of reachable network ony 400-800 hosts. Why so smalll? modem users: not enough bandwidth to provide search

routing capabilities: routing black holes Fix: create peer hierarchy based on capabilities

previously: all peers identical, most modem blackholes connection preferencing:

• favors routing to well-connected peers• favors reply to clients that themselves serve large number

of files: prevent freeloading Limewire gateway functions as Napster-like central

server on behalf of other peers (for searching purposes)

www.limewire.com/index.jsp/net_improvements

Page 33: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

33

Anonymous?

Not anymore than it’s scalable. The person you are getting the file from

knows who you are. That’s not anonymous.

Other protocols exist where the owner of the files doesn’t know the requester.

Peer-to-peer anonymity exists. See Eternity and, next, Freenet!

Page 34: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

34

Gnutella Discussion:

Architectural lessons learned? Do Gnutella’s goals seem familiar? Does

it work better than say squid or summary cache? Or multicast with carousel?

anonymity and security? Other? Good source for technical info/open

questions:http://www.limewire.com/index.jsp/

tech_papers

Page 35: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

35

3. Overlays

Next, we need to look at overlays in general, and more specically, at

Routing… RON is a good example…

Page 36: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

36

Resilient Overlay Networks

Overlay network: applications, running at various sites create “logical” links (e.g., TCP or UDP

connections) pairwise between each other

each logical link: multiple physical links, routing defined by native Internet routing

let’s look at an example, taken from: D. Anderson, H. Balakrishnan, F. Kaashoek, R. Morris, "The

case for reslient overlay networks," Proc. HotOS VIII, May 2001, http://nms.lcs.mit.edu/papers/ron-hotos2001.html.

Page 37: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

37

Internet Routing BGP defines routes between stub networks

UCLA Noho.net

Berkeley.netUMass.net

Internet 2

Mediaone

C&W

Page 38: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

38

Internet Routing BGP defines routes between stub networks

UCLA Noho.net

Berkeley.netUMass.net

Internet 2

Mediaone

C&W

Noho-to-UMass

Page 39: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

39

Internet Routing BGP defines routes between stub networks

UCLA Noho.net

Berkeley.netUMass.net

Internet 2

Mediaone

C&W

Noho-to-Berkeley

Page 40: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

40

Internet Routing

Congestion or failure: Noho to Berkely BGP-determined route may not change (or will change slowly)

UCLA Noho.net

Berkeley.net UMass.net

Internet 2

Mediaone

C&W

Noho-to-Berkeley

Page 41: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

41

Internet Routing

Congestion or failure: Noho to Berkely BGP-determined route may not change (or will change slowly)

UCLA Noho.net

Berkeley.net UMass.net

Internet 2

Mediaone

C&W

Noho-to-Berkeley

Noho to UMass to Berkeley

route not visible or available via BGP!

MediaOne can’t route to Berkeley via Internet2

Page 42: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

42

RON: Resilient Overlay Networks

Premise: by building application overlay network, can increase performance, reliability of routing

Two-hop (application-level) noho-to-Berkeley route

application-layer router

Page 43: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

43

RON Experiments

Measure loss, latency, and throughput with and without RON

13 hosts in the US and Europe 3 days of measurements from data

collected in March 2001 30-minute average loss rates

A 30 minute outage is very serious! Note: Experiments done with “No-

Internet2-for-commercial-use” policy

Page 44: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

44

An order-of-magnitude fewer failures

0010100%

001480%

002050%

003230%

15412720%

475747910%

RON Worse

No Change

RON Better

Loss Rate

30-minute average loss rates

6,825 “path hours” represented here12 “path hours” of essentially complete outage76 “path hours” of TCP outage

RON routed around all of these!One indirection hop provides almost all the benefit!

6,825 “path hours” represented here12 “path hours” of essentially complete outage76 “path hours” of TCP outage

RON routed around all of these!One indirection hop provides almost all the benefit!

Page 45: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

45

RON Research Issues

• How to design overlay networks?• Measurement and self-configuration• Understanding performance of underlying

net.• Fast fail-over.• Sophisticated metrics.• application-sensitive (e.g., delay versus

throughput) path selection.• Effect of RON on underlying network

• If everyone does RON, are we better off?

Page 46: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

46

4. Freenet

Page 47: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

47

Centralized model e.g. Napster global index held by central authority direct contact between requestors and

providers Decentralized model

e.g. Freenet, Gnutella, Chord no global index – local knowledge only

(approximate answers) contact mediated by chain of intermediaries

P2P Application

Page 48: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

48

Freenet history

Final Year project Ian Clarke , Edinburgh University, Scotland, June, 1999

Sourceforge Project, most active V.0.1 (released March 2000) Latest version(Sept, 2001): 0.4

Page 49: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

49

What is Freenet and Why?

Distributed, Peer to Peer, file sharing system

Completely anonymous, for producers or consumers of information

Resistance to attempts by third parties to deny access to information

Page 50: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

50

Freenet: How it works

Data structure Key Management Problems

How can one node know about others How can it get data from remote nodes How to add new nodes to Freenet How does freenet mangage its data

Protocol DetailsHeader information

Page 51: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

51

Data structure

Routing Table Pair: node address: ip, tcp; corresponding key value

Data Store Requirement:

• rapidly find the document given a certain key

• rapidly find the closest key to a given key• keep track the popularity of documents

and know which document to delete when under pressure

Page 52: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

52

Key Management(1)

A way to locate a document anywhere Keys are used to form a URI Two similar keys don’t mean the

subjects of the file are similar! Keyword-signed Key(KSK)

Based on a short descriptive string, usually a set of keywords that can describe the document

Example: University/umass/cs/hzhang Uniquely identify a document Potential problem – global namespace

Page 53: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

53

Key Management (2)

Signed-subspace Key(SSK) Add sender information to avoid namespace

conflict Private key to sign/ public key to varify

Content-hash Key(CHK) Message digest algorithm, Basically a hash

of the document

Page 54: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

Darling, Tell me

the truth!

Believe me, I don’t have it.

Freenet: Routing Algorithm: search or insert

Page 55: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

But I know Joe may have it

since I borrowed

similar stuff him last time.

Freenet: Routing Algorithm: search or insert

A

B

C

D

I

Page 56: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

A, Help me!

Sorry, No

Freenet: Routing Algorithm: search or insert

A

B C

D

I

Page 57: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

57

Strength of routing algorithm(1) Replication of Data Clustering (1) (Note: Not subject-clustering but key-

clustering!) Reasonable Redundancy: improve data

availability.

Page 58: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

58

Strength of routing algorithm(2)

New Entry in the Routing Table: the graph will be more and more connected. --- Node discovery

Page 59: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

59

Protocol Details

Header information DataReply

UniqueID=C24300FB7BEA06E3 Depth=a * HopsToLive=2c Source=tcp/127.0.0.1:2386 DataLength=131 Data 'Twas brillig, and the slithy toves

Did gyre and gimble in the wabe: All mimsy were the borogoves And the mome raths outgrabe

Page 60: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

60

Some security and authentication issues How to ensure anonymity:

Nodes can lie randomly about the requests and claim to be the origin or the destination of a request

Hop-To-Live values are fuzzy Then it’s impossible to trace back a

document to its original node Similarly, it’s impossible to discover which

node inserted a given document.

Page 61: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

61

Network convergence

X-axis: time Y-axis: # of pathlength 1000 Nodes, 50 items

datastore, 250 entries routing table

the routing tables were initialized to ring-lattice topology

Pathlength: the number of hops actually taken before finding the data.

Page 62: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

62

Scalability

X-axis: # of nodes Y-axis: # of pathlength The relation between

network size and average pathlenth.

Initially, 20 nodes. Add nodes regularly.

Page 63: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

63

Fault Tolerance

X-axis: # of nodes failing Y-axis: # of pathlength The median pathlength

remains below 20 even when up to 30% nodes fails.

Page 64: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

64

Small world Model

X-axis: # of nodes failing Y-axis: # of pathlength Most of nodes have only few

connections while a small number of news have large set of connections.

The authors claim it follows power law.

Page 65: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

65

So far, it’s really a good model Keep anonymity Distributed model; data available Converge fast Adaptive

Page 66: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

66

Is it Perfect?

How long will it take to search or insert? Trade off between anonymity and searching efforts:

Chord vs Freenet Can we come up a better algorithm? A good try:

“Search in Power-Law Networks” Have no idea about if search fails due to no

such document or just didn’t find it. File lifetime. Freenet doesn’t guarantee a

document you submit today will exist tomorrow!!

Page 67: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

67

Question??

Anonymity? Security? Better search algorithm? Power law? …

Page 68: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

68

5. Publius: A robust, tamper-evident, censorship-resistant web publishing system

Marc Waldman Aviel Rubin Lorrie Faith Cranor

Page 69: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

69

Outline

Design Goals

Kinds of Anonymity

Publius Features

Publius Limitations and Threats

Questions

Page 70: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

70

Design Goals

Censorship resistant Difficult for a third party to modify or delete

content Tamper evident

Unauthorized changes should be detectable Source anonymous

No way to tell who published the content Updateable

Changes to or deletion of content should be possible for publishers

Page 71: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

71

Design Goals

Deniable Involved third parties should be able to

deny knowledge of what is published Fault Tolerant

System remains functional, even if some third parties are faulty or malicious

Persistent No expiration date on published materials

Page 72: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

72

Web Anonymity

Connection Based Hides the identity of the individual

requesting a page Examples:• Anonymizing proxies, such as The Anonymizer or

Proxymate• Proxies utilizing Onion Routing, such as Freedom• Crowds, where users in the Crowd probabilistically

route or retrieve for other users in the Crowd

Page 73: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

73

Web Anonymity

Author Based Hides the location or author of a particular

document Examples:• Rewebber, which proxies requests for encrypted

URLs• The Eternity Service, which for a fee inserts a

document into a random subset of servers, and guarantees its future existence

• Freenet Publius provides this sort of anonymity

Page 74: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

74

Publius System Overview

Publishers Post Publius content to the web

Servers A static set which host random-looking

content

Retrievers Browse Publius content on web

Page 75: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

75

Publius System Overview

Publish A publisher posts content across multiple

servers in a source anonymous fashion Retrieve

A retriever gets content from multiple servers

Delete The original publisher of a document

removes it from the Publius servers Update

The original publisher modifies a document

Page 76: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

76

Publius Publishing

Alice generates a random symmetric key K

She encrypts message M with key K, producing {M}K

She splits K into n shares, using Shamir secret sharing, such that any k can reproduce K

Each share is uniquely named:namei = wrap(H(M . sharei))

Page 77: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

77

Publius Publishing

A set of locations is chosen:locationi = (namei MOD m) + 1

Each locationi indexes into the list of m servers

If d >= k unique values are not obtained, start over

Alice publishes {M}K and sharei into a directory namei on the server at locationi

A URL containing at least the d namei values is produced

Page 78: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

78

Publius Publishing

...name = de26fe4fc8c6

name = 620a8a3d63b

name = 1e0995d66981

2

n

...

135.207.8.15

121.113.8.5

1

2

m 206.35.113.9

105.3.14.1

...

...

...

3

4

7

12

201.18.24.5

210.183.28.4

209.185.143.19

location = 7

location = 12

loction = 41

2

n

Publisher

Servers

/publius/1e0995d6698/{M}K

Server 3

Server 8

/publius/de26fe4fc8c6/{M}K/publius/620a8a3d63b/{M}K

Server 4

Server 12Server 7

201.18.24.5

209.185.143.19210.183.28.4

Server TableAvailable

Page 79: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

79

Publius Retrieval

Bob parses out each namei from URL, and for each, computes:

locationi = (namei MOD m) + 1 Bob chooses k of these, and retrieves the

encrypted file {M}K and sharei at each server Bob combines the shares to get K, and

decrypts the file Bob verifies that each name value is correct:

namei = wrap(H(M . sharei))

Page 80: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

80

Publius Delete

Alice generates a password PW when publishing a file

Alice includes H(server_domain_name . PW) in server directory when publishing Note that each server has its own hash, to

prevent a malicious server operator from deleting content on all servers

Alice deletes by sending H(server_domain_name . PW) and namei to each of the n servers hosting content

Page 81: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

81

Publius Update

Idea: change content without changing original URL, as links to that URL may exist

In addition to the file, the share, and the password, there may be an update file in the namei directory

This update file will not exist if Alice has not updated the content

Page 82: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

82

Publius Update

To update, Alice specifies a new file, the original URL, the original password PW, and a new password

First, the new content is published, and a new URL is generated

Then, each of the n old files is deleted, and an update file, containing the new URL, is placed in each namei directory

Page 83: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

83

Publius Update

When Bob retrieves updated content, the server returns the update file instead

Bob checks that all of the URLs are identical, then retrieves the content at the new URL

Page 84: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

84

Linking Documents

Simple case: file A links to file B Solution: Publish B first, then rewrite URLs in

A

Harder: files C and D link to each other Cannot use simple solution above Alice publishes C and D in any order She then rewrites the URLs in each file, and

uses the Publius Update procedure on the new files

Page 85: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

85

Other Features

Entire directories can be published by exploiting the updateability of Publius

Mechanism exists to encode MIME type into Publius content

Publius URLs include option fields and other flags, the value of k, and other relevant values Older broswers preclude URLs of length

>255 characters Once this limitation is removed, URLs can

include server list, making this list non-static

Page 86: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

86

Limitations and Threats

Share deletion or corruption If all n copies of a file, or n-k+1 copies of

the shares, are deleted, then the file is unreadable

Increasing n, or decreasing k, makes this attack harder

Page 87: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

87

Limitations and Threats

Update file deletion or corruption 1 If there is no update file, malicious server

operator Mallory could create one, pointing to bad content

This requires the assistance of at least k other server operator, and motivates a higher value of k

The Publius URL has several fields, among them a no_update flag, which will prevent this sort of attack

Page 88: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

88

Limitations and Threats

Update file deletion or corruption 2 If Publius content has already been updated,

Mallory must corrupt update files on n-k+1 servers

Of course, if Mallory can do this, she can censor any document

Larger n and smaller k make this more difficult

Deciding upon good values for n and k is difficult No suggestions from Waldman et al.

Page 89: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

89

Limitations and Threats

Publius, like all internet services, is subject to DoS attacks Flooding is less effective, as n-k+1 servers

must be attacked A malicious user could attempt to fill disk

space on servers• Some mechanisms in place to prevent this

Page 90: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

90

Limitations and Threats

If the Publius content contains any identifying information, anonymity will be lost

Publius does not provide any connection based anonymity If you act as a publisher, you must

anonymize your connections with the Publius servers

Page 91: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

91

Questions

How do you publish Publius URLs anonymously? Freenet keys can be guessed at, but Publius

URLs are entirely machine generated The first person to publish a Publius URL

must have some connection with the publisher of the content

If you have somewhere secure and anonymous to publish the Publius URLs, why do you need Publius?• One possible answer: censorship resistance• But server operators are then potentially liable

Page 92: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

92

Questions

How deniable is Publius? Publius URLs are public With minimal effort, a Publius server

operator could determine the content being served

Page 93: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

93

Questions

How does Publius compare to Freenet? Both provide publisher anonymity,

deniability, and censorship resistance Freenet provides anonymity for retrievers

and servers, as well• Cost is high: data must be cached at many nodes

Publius provides persistence of data• Freenet does not• Can any p2p system provide persistence?

Page 94: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

94

Questions

Could Publius be made into a p2p service?

Would it be appropriate to do so?

Page 95: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

95

Ion Stoica, Robert Morris, David Karger,M. Frans Kaashoek, Hari Balakrishnan

MIT and Berkeley

Now we see some CS in strength – Hash and Content based….for more scaleble (distributed) directory lookup

6. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications

6. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications

presentation based on slides by Robert Morris (SIGCOMM’01)

Page 96: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

96

OutlineOutline

Motivation and background

Consistency caching

Chord

Performance evaluation

Conclusion and discussion

Page 97: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

97

MotivationMotivation

How to find data in a distributed file sharing system?

Lookup is the key problem

Internet

PublisherKey=“LetItBe”

Value=MP3 data

Lookup(“LetItBe”)

N1

N2 N3

N5N4Client ?

Page 98: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

98

Centralized SolutionCentralized Solution

Requires O(M) state Single point of failure

Internet

PublisherKey=“LetItBe”

Value=MP3 data

Lookup(“LetItBe”)

N1

N2 N3

N5N4Client

DB

Central server (Napster)

Page 99: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

99

Distributed Solution (1)Distributed Solution (1)

Worst case O(N) messages per lookup

Internet

PublisherKey=“LetItBe”

Value=MP3 data

Lookup(“LetItBe”)

N1

N2 N3

N5N4Client

Flooding (Gnutella, Morpheus, etc.)

Page 100: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

100

Distributed Solution (2)Distributed Solution (2) Routed messages (Freenet, Tapestry, Chord, CAN,

etc.)

Internet

PublisherKey=“LetItBe”

Value=MP3 data

Lookup(“LetItBe”)

N1

N2 N3

N5N4Client

Only exact matches

Page 101: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

101

Routing ChallengesRouting Challenges

Define a useful key nearness metric

Keep the hop count small

Keep the routing tables “right size”

Stay robust despite rapid changes in membership

Authors claim: Chord: emphasizes efficiency and

simplicity

Page 102: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

102

Chord OverviewChord Overview

Provides peer-to-peer hash lookup service: Lookup(key) IP address

Chord does not store the data

How does Chord locate a node?

How does Chord maintain routing tables?

How does Chord cope with changes in membership?

Page 103: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

103

Chord propertiesChord properties

Efficient: O(Log N) messages per lookup

N is the total number of servers

Scalable: O(Log N) state per node

Robust: survives massive changes in membership

Proofs are in paper / tech report

Assuming no malicious participants

Page 104: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

104

Chord IDsChord IDs

m bit identifier space for both keys and nodes

Key identifier = SHA-1(key)

Key=“LetItBe” ID=60SHA-1

IP=“198.10.10.1” ID=123SHA-1

Node identifier = SHA-1(IP address)

Both are uniformly distributed

How to map key IDs to node IDs?

Page 105: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

105

Consistent Hashing [Karger 97]Consistent Hashing [Karger 97]

A key is stored at its successor: node with next higher ID

N32

N90

N123 K20

K5

Circular 7-bitID space

0IP=“198.10.10.1”

K101

K60Key=“LetItBe”

Page 106: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

106

Consistent HashingConsistent Hashing Every node knows of every other node

requires global information Routing tables are large O(N) Lookups are fast O(1)

N32

N90

N123

0

Hash(“LetItBe”) = K60

N10

N55

Where is “LetItBe”?

“N90 has K60”

K60

Page 107: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

107

Chord: Basic LookupChord: Basic Lookup

N32

N90

N123

0

Hash(“LetItBe”) = K60

N10

N55

Where is “LetItBe”?

“N90 has K60”

K60

Every node knows its successor in the ring

requires O(N) time

Page 108: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

108

“Finger Tables”“Finger Tables”

Every node knows m other nodes in the ring

Increase distance exponentially

N8080 + 20

N112

N96

N16

80 + 21

80 + 22

80 + 23

80 + 24

80 + 25 80 + 26

Page 109: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

109

“Finger Tables”“Finger Tables”

Finger i points to successor of n+2i

N120

N8080 + 20

N112

N96

N16

80 + 21

80 + 22

80 + 23

80 + 24

80 + 25 80 + 26

Page 110: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

110

Lookups are FasterLookups are Faster

Lookups take O(Log N) hops

N32

N10

N5

N20

N110

N99

N80

N60

Lookup(K19)

K19

Page 111: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

111

Joining the RingJoining the Ring

Three step process: Initialize all fingers of new node

Update fingers of existing nodes

Transfer keys from successor to new node

Less aggressive mechanism (lazy finger update): Initialize only the finger to successor node

Periodically verify immediate successor, predecessor

Periodically refresh finger table entries

Page 112: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

112

Joining the Ring - Step 1Joining the Ring - Step 1

Initialize the new node finger table

Locate any node p in the ring

Ask node p to lookup fingers of new node N36

Return results to new node

N36

1. Lookup(37,38,40,…,100,164)

N60

N40

N5

N20N99

N80

Page 113: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

113

Joining the Ring - Step 2Joining the Ring - Step 2

Updating fingers of existing nodes new node calls update function on existing nodes

existing nodes can recursively update fingers of other nodes

N36

N60

N40

N5

N20N99

N80

Page 114: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

114

Joining the Ring - Step 3Joining the Ring - Step 3

Transfer keys from successor node to new node only keys in the range are transferred

Copy keys 21..36from N40 to N36

K30K38

N36

N60

N40

N5

N20N99

N80

K30

K38

Page 115: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

115

Handing FailuresHanding Failures

Failure of nodes might cause incorrect lookup

N120

N113

N102

N80

N85

N10

Lookup(90)

N80 doesn’t know correct successor, so lookup fails

Successor fingers are enough for correctness

Page 116: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

116

Handling FailuresHandling Failures

Use successor list Each node knows r immediate successors

After failure, will know first live successor

Correct successors guarantee correct lookups

Guarantee is with some probability

Can choose r to make probability of lookup failure arbitrarily small

Page 117: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

117

Evaluation OverviewEvaluation Overview

Quick lookup in large systems

Low variation in lookup costs

Robust despite massive failure

Experiments confirm theoretical results

Page 118: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

118

Cost of lookupCost of lookup

Cost is O(Log N) as predicted by theory constant is 1/2

Number of Nodes

Avera

ge M

ess

ag

es

per

Looku

p

Page 119: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

119

RobustnessRobustness

Simulation results: static scenario

Failed lookup means original node with key failed (no replica of keys)

Result implies good balance of keys among nodes!

Page 120: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

120

RobustnessRobustness

Simulation results: dynamic scenario

Failed lookup means finger path has a failed node

500 nodes initially

average stabilize( ) call 30s

1 lookup per second (Poisson)

x join/fail per second (Poisson)

Page 121: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

121

Current implementationCurrent implementation

Chord library: 3,000 lines of C++

Deployed in small Internet testbed

Includes:

Correct concurrent join/fail

Proximity-based routing for low delay (?)

Load control for heterogeneous nodes (?)

Resistance to spoofed node IDs (?)

Page 122: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

122

StrengthsStrengths

Based on theoretical work (consistent hashing)

Proven performance in many different aspects “with high probability” proofs

Robust (Is it?)

Page 123: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

123

WeaknessWeakness

NOT that simple (compared to CAN)

Member joining is complicated aggressive mechanisms requires too many messages and

updates

no analysis of convergence in lazy finger mechanism

Key management mechanism mixed between layers upper layer does insertion and handle node failures

Chord transfer keys when node joins (no leave mechanism!)

Routing table grows with # of members in group

Worst case lookup can be slow

Page 124: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

124

DiscussionsDiscussions

Network proximity (consider latency?)

Protocol security Malicious data insertion

Malicious Chord table information

Keyword search and indexing

...

Page 125: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

125

7. Tapestry: Decentralized Routing and Location

Ben Y. ZhaoCS Division, U. C. Berkeley

Page 126: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

126

Outline

Problems facing wide-area applications

Tapestry Overview

Mechanisms and protocols

Preliminary Evaluation

Related and future work

Page 127: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

127

Motivation

Shared Storage systems need an data location/routing mechanism Finding the peer in a scalable way is a difficult

problem Efficient insertion and retrieval of content in a large

distributed storage infrastructure Existing solutions

Centralized: expensive to scale, less fault tolerant,vulnerable to DoS attacks (e.g. Napster,DNS,SDS)

Flooding: not scalable (e.g. Gnutella)

Page 128: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

128

Key: Location and Routing

Hard problem: Locating and messaging to resources and data

Approach: wide-area overlay infrastructure: Scalable, Dynamic, Fault-tolerant, Load

balancing

Page 129: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

129

Decentralized Hierarchies

Centralized hierarchies Each higher level node responsible for

locating objects in a greater domain Decentralize: Create a tree for object O

(really!) Object O has its

own root andsubtree

Server on each levelkeeps pointer tonearest object in domain

Queries search up inhierarchy

Root ID = O

Directory servers tracking 2 replicas

Page 130: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

130

What is Tapestry?

A prototype of a decentralized, scalable, fault-tolerant, adaptive location and routing infrastructure(Zhao, Kubiatowicz, Joseph et al. U.C. Berkeley)

Network layer of OceanStore global storage systemSuffix-based hypercube routing Core system inspired by Plaxton Algorithm

(Plaxton, Rajamaran, Richa (SPAA97)) Core API:

publishObject(ObjectID, [serverID]) sendmsgToObject(ObjectID) sendmsgToNode(NodeID)

Page 131: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

131

Incremental Suffix Routing

Namespace (nodes and objects) large enough to avoid collisions (~2160?)

(size N in Log2(N) bits)

Insert Object: Hash Object into namespace to get ObjectID For (i=0, i<Log2(N), i+j) { //Define hierarchy

• j is base of digit size used, (j = 4 hex digits) • Insert entry into nearest node that matches on

last i bits• When no matches found, then pick node matching

(i – n) bits with highest ID value, terminate

Page 132: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

132

Routing to Object

Lookup object Traverse same relative nodes as insert, except

searching for entry at each node For (i=0, i<Log2(N), i+n)

Search for entry in nearest node matching on last i bits

Each object maps to hierarchy defined by single root f (ObjectID) = RootID

Publish / search both route incrementally to root

Root node = f (O), is responsible for “knowing” object’s location

Page 133: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

133

Object LocationRandomization and Locality

Page 134: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

134

4

2

3

3

3

2

2

1

2

4

1

2

3

3

1

34

1

1

4 3

2

4

Tapestry MeshIncremental suffix-based routing

NodeID0x43FE

NodeID0x13FENodeID

0xABFE

NodeID0x1290

NodeID0x239E

NodeID0x73FE

NodeID0x423E

NodeID0x79FE

NodeID0x23FE

NodeID0x73FF

NodeID0x555E

NodeID0x035E

NodeID0x44FE

NodeID0x9990

NodeID0xF990

NodeID0x993E

NodeID0x04FE

NodeID0x43FE

Page 135: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

135

Contribution of this work

Plaxtor Algorithm Limitations

Global knowledge algorithms

Root node vulnerability

Lack of adaptability

Tapestry Distributed

algorithms• Dynamic node

insertion• Dynamic root

mapping Redundancy in

location and routing Fault-tolerance

protocols Self-configuring /

adaptive Support for mobile

objects Application

Infrastructure

Page 136: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

136

Dynamic Insertion Example

NodeID0x243FE

NodeID0x913FENodeID

0x0ABFE

NodeID0x71290

NodeID0x5239E

NodeID0x973FE

NEW0x143FE

NodeID0x779FE

NodeID0xA23FE

Gateway0xD73FF

NodeID0xB555E

NodeID0xC035E

NodeID0x244FE

NodeID0x09990

NodeID0x4F990

NodeID0x6993E

NodeID0x704FE

4

2

3

3

3

2

1

2

4

1

2

3

3

1

34

1

1

4 3

2

4

NodeID0x243FE

Page 137: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

137

Fault-tolerant Location

Minimized soft-state vs. explicit fault-recovery Multiple roots

Objects hashed w/ small salts multiple names/roots Queries and publishing utilize all roots in parallel P(finding Reference w/ partition) = 1 – (1/2)n

where n = # of roots Soft-state periodic republish

50 million files/node, daily republish, b = 16, N = 2160 , 40B/msg, worst case update traffic: 156 kb/s,

expected traffic w/ 240 real nodes: 39 kb/s

Page 138: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

138

Fault-tolerant Routing

Detection: Periodic probe packets between neighbors

Handling: Each entry in routing map has 2 alternate

nodes Second chance algorithm for intermittent

failures Long term failures alternates found via

routing tables Protocols:

First Reachable Link Selection Proactive Duplicate Packet Routing

Page 139: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

139

Simulation Environment

Implemented Tapestry routing as packet-level simulator

Delay is measured in terms of network hops

Do not model the effects of cross traffic or queuing delays

Four topologies: AS, MBone, GT-ITM, TIERS

Page 140: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

140

Results: Location Locality

Measuring effectiveness of locality pointers (TIERS 5000)

RDP vs Object Distance (TI5000)

0

2

4

6

8

10

12

14

16

18

20

3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Object Distance

RD

P

Locality Pointers No Pointers

Page 141: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

141

Results: Stability via Redundancy

Parallel queries on multiple roots. Aggregate bandwidth measures b/w used for soft-state republish 1/day and b/w used by requests at rate of 1/s.

Retrieving Objects with Multiple Roots

0

10

20

30

40

50

60

70

80

90

1 2 3 4 5

# of Roots Utilized

La

ten

cy

(H

op

Un

its

)

0

10

20

30

40

50

60

70

Ag

gre

ga

te B

an

dw

idth

(k

b/s

)

Average Latency Aggregate Bandwidth per Object

Page 142: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

142

Related Work

Content Addressable Networks Ratnasamy et al.,

(ACIRI / UCB) Chord

Stoica, Morris, Karger, Kaashoek, Balakrishnan (MIT / UCB)

Pastry Druschel and Rowstron

(Rice / Microsoft Research)

Page 143: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

143

Strong Points

Designed system based on Theoretically proven idea (Plaxton Algorithm)

Fully decentralized and scalable solution for deterministic location and routing problem

Page 144: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

144

Weaknesses/Improvements

Substantially complicated Esp, dynamic node insertion algorithm is

non-trivial, and each insertion will take a non-negligible amount of time.

Attempts to insert a lot of nodes at the same time

Where to put “root” node for a given object Needs universal hashing function Possible to put “root” to near expected

clients dynamically?

Page 145: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

145

Routing to NodesExample: Octal digits, 218 namespace, 005712 627510

005712

340880 943210

387510

834510

727510

627510

Neighbor MapFor “5712” (Octal)

Routing Levels1234

xxx1

5712

xxx0

xxx3

xxx4

xxx5

xxx6

xxx7

xx02

5712

xx22

xx32

xx42

xx52

xx62

xx72

x012

x112

x212

x312

x412

x512

x612

5712

0712

1712

2712

3712

4712

5712

6712

7712

005712 0 1 2 3 4 5 6 7

340880 0 1 2 3 4 5 6 7

943210 0 1 2 3 4 5 6 7

834510 0 1 2 3 4 5 6 7

387510 0 1 2 3 4 5 6 7

727510 0 1 2 3 4 5 6 7

627510 0 1 2 3 4 5 6 7

Page 146: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

146

Dynamic Insertion

Operations necessary for N to become fully integrated:

Step 1: Build up N’s routing maps Send messages to each hop along path from

gateway to current node N’ that best approximates N The ith hop along the path sends its ith level route

table to N N optimizes those tables where necessary

Step 2: Send notify message via acked multicast to nodes with null entries for N’s ID, setup forwarding ptrs

Step 3: Each notified node issues republish message for relevant objects

Step 4: Remove forward ptrs after one republish period

Step 5: Notify local neighbors to modify paths to route through N where appropriate

Page 147: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

147

Dynamic Root Mapping

Problem: choosing a root node for every object Deterministic over network changes Globally consistent

Assumptions All nodes with same matching suffix

contains same null/non-null pattern in next level of routing map

Requires: consistent knowledge of nodes across network

Page 148: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

148

Plaxton Solution

Given desired ID N, Find set S of nodes in existing network

nodes n matching most # of suffix digits with N

Choose Si = node in S with highest valued ID

Issues: Mapping must be generated statically using

global knowledge Must be kept as hard state in order to

operate in changing environment Mapping is not well distributed, many nodes

in n get no mappings

Page 149: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

149

Sylvia Ratnasamy, Paul Francis, Mark Handley,

Richard Karp, Scott Shenker

8. A Scalable, Content-Addressable Network

ACIRI U.C.Berkeley Tahoe Networks

1 2 3

1,2 3 1

1,2 1

Page 150: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

150

Outline

Introduction Design Evaluation Strengths & Weaknesses Ongoing Work

Page 151: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

151

Internet-scale hash tables

Hash tables essential building block in software systems

Internet-scale distributed hash tables equally valuable to large-scale distributed systems?

Page 152: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

152

Hash tables essential building block in software systems

Internet-scale distributed hash tables equally valuable to large-scale distributed systems?

• peer-to-peer systems– Napster, Gnutella,, FreeNet, MojoNation…

• large-scale storage management systems– Publius, OceanStore,, CFS ...

• mirroring on the Web

Internet-scale hash tables

Page 153: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

153

Content-Addressable Network(CAN)

CAN: Internet-scale hash table

Interface insert(key,value) value = retrieve(key)

Page 154: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

154

Content-Addressable Network(CAN)

CAN: Internet-scale hash table

Interface insert(key,value) value = retrieve(key)

Properties scalable operationally simple good performance (w/ improvement)

Page 155: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

155

Content-Addressable Network(CAN)

CAN: Internet-scale hash table

Interface insert(key,value) value = retrieve(key)

Properties scalable operationally simple good performance

Related systems: Chord/Pastry/Tapestry/Buzz/Plaxton ...

Page 156: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

156

Problem Scope

Design a system that provides the interface scalability robustness performance security

Application-specific, higher level primitives keyword searching mutable content anonymity

Page 157: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

157

Outline

Introduction Design Evaluation Strengths & Weaknesses Ongoing Work

Page 158: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

158

K V

CAN: basic idea

K V

K V

K V

K V

K V

K V

K V

K V

K V

K V

Page 159: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

159

CAN: basic idea

insert(K1,V1)

K V

K V

K V

K V

K V

K V

K V

K V

K V

K V

K V

Page 160: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

160

CAN: basic idea

insert(K1,V1)

K V

K V

K V

K V

K V

K V

K V

K V

K V

K V

K V

Page 161: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

161

CAN: basic idea

(K1,V1)

K V

K V

K V

K V

K V

K V

K V

K V

K V

K V

K V

Page 162: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

162

CAN: basic idea

retrieve (K1)

K V

K V

K V

K V

K V

K V

K V

K V

K V

K V

K V

Page 163: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

163

CAN: solution

virtual Cartesian coordinate space

entire space is partitioned amongst all the nodes every node “owns” a zone in the overall space

abstraction can store data at “points” in the space can route from one “point” to another

point = node that owns the enclosing zone

Page 164: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

164

CAN: simple example

1

Page 165: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

165

CAN: simple example

1 2

Page 166: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

166

CAN: simple example

1

2

3

Page 167: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

167

CAN: simple example

1

2

3

4

Page 168: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

168

CAN: simple example

Page 169: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

169

CAN: simple example

I

Page 170: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

170

CAN: simple example

node I::insert(K,V)

I

Page 171: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

171

(1) a = hx(K)

CAN: simple example

x = a

node I::insert(K,V)

I

Page 172: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

172

(1) a = hx(K)

b = hy(K)

CAN: simple example

x = a

y = b

node I::insert(K,V)

I

Page 173: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

173

(1) a = hx(K)

b = hy(K)

CAN: simple example

(2) route(K,V) -> (a,b)

node I::insert(K,V)

I

Page 174: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

174

CAN: simple example

(2) route(K,V) -> (a,b)

(3) (a,b) stores (K,V)

(K,V)

node I::insert(K,V)

I(1) a = hx(K)

b = hy(K)

Page 175: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

175

CAN: simple example

(2) route “retrieve(K)” to (a,b)

(K,V)

(1) a = hx(K)

b = hy(K)

node J::retrieve(K)

J

Page 176: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

176

Data stored in the CAN is addressed by name (i.e. key), not location (i.e. IP address)

CAN

Page 177: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

177

CAN: routing table

Page 178: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

178

CAN: routing

(a,b)

(x,y)

Page 179: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

179

A node only maintains state for its immediate neighboring nodes

CAN: routing

Page 180: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

181

CAN: node insertion

I

new node1) discover some node “I” already in CAN

Page 181: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

182

CAN: node insertion

2) pick random point in space

I

(p,q)

new node

Page 182: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

183

CAN: node insertion

(p,q)

3) I routes to (p,q), discovers node J

I

J

new node

Page 183: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

184

CAN: node insertion

newJ

4) split J’s zone in half… new owns one half

Page 184: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

185

Inserting a new node affects only a single other node and its immediate neighbors

CAN: node insertion

Page 185: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

186

CAN: node failures

Need to repair the space

recover database (weak point)• soft-state updates• use replication, rebuild database from replicas

repair routing • takeover algorithm

Page 186: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

187

CAN: takeover algorithm

Simple failures know your neighbor’s neighbors when a node fails, one of its neighbors takes over its

zone

More complex failure modes simultaneous failure of multiple adjacent nodes scoped flooding to discover neighbors hopefully, a rare event

Page 187: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

188

Only the failed node’s immediate neighbors are required for recovery

CAN: node failures

Page 188: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

189

Design recap

Basic CAN completely distributed self-organizing nodes only maintain state for their immediate

neighbors

Additional design features multiple, independent spaces (realities) background load balancing algorithm simple heuristics to improve performance

Page 189: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

190

Outline

Introduction Design Evaluation Strengths & Weaknesses Ongoing Work

Page 190: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

191

Evaluation

Scalability

Low-latency

Load balancing

Robustness

Page 191: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

192

CAN: scalability

For a uniformly partitioned space with n nodes and d dimensions per node, number of neighbors is 2d average routing path is (dn1/d)/4 hops simulations show that the above results hold in practice

Can scale the network without increasing per-node state

Chord/Plaxton/Tapestry/Buzz log(n) nbrs with log(n) hops

Page 192: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

193

CAN: low-latency

Problem latency stretch = (CAN routing delay)

(IP routing delay) application-level routing may lead to high

stretch

Solution increase dimensions, realities (reduce the path

length) Heuristics (reduce the per-CAN-hop latency)

• RTT-weighted routing• multiple nodes per zone (peer nodes)• deterministically replicate entries

Page 193: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

194

CAN: low-latency

#nodes

Late

ncy

str

etc

h

0

20

40

60

80

100

120

140

160

180

16K 32K 65K 131K

#dimensions = 2

w/o heuristics

w/ heuristics

Page 194: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

195

0

2

4

6

8

10

CAN: low-latency

#nodes

Late

ncy

str

etc

h

16K 32K 65K 131K

#dimensions = 10

w/o heuristics

w/ heuristics

Page 195: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

196

CAN: load balancing

Two pieces

Dealing with hot-spots• popular (key,value) pairs• nodes cache recently requested entries• overloaded node replicates popular entries at neighbors

Uniform coordinate space partitioning• uniformly spread (key,value) entries• uniformly spread out routing load

Page 196: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

197

Uniform Partitioning

Added check at join time, pick a zone check neighboring zones pick the largest zone and split that one

Page 197: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

198

0

20

40

60

80

100

Uniform Partitioning

V 2V 4V 8V

Volume

Perce

nta

ge

of n

od

es

w/o check

w/ check

V = total volumen

V16

V 8

V 4

V 2

65,000 nodes, 3 dimensions

Page 198: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

199

CAN: Robustness

Completely distributed no single point of failure ( not applicable to pieces

of database when node failure happens)

Not exploring database recovery (in case there are multiple copies of database)

Resilience of routing can route around trouble

Page 199: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

200

Outline

Introduction Design Evaluation Strengths & Weaknesses Ongoing Work

Page 200: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

201

Strengths

More resilient than flooding broadcast networks

Efficient at locating information Fault tolerant routing Node & Data High Availability (w/

improvement) Manageable routing table size &

network traffic

Page 201: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

202

Weaknesses

Impossible to perform a fuzzy search Susceptible to malicious activity Maintain coherence of all the indexed

data (Network overhead, Efficient distribution)

Still relatively higher routing latency Poor performance w/o improvement

Page 202: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

203

Suggestions

Catalog and Meta indexes to perform search function

Extension to handle mutable content efficiently for web-hosting

Security mechanism to defense against attacks

Page 203: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

204

Outline

Introduction Design Evaluation Strengths & Weaknesses Ongoing Work

Page 204: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

205

Ongoing Work

Topologically-sensitive CAN construction distributed binning

Page 205: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

206

Distributed Binning

Goal bin nodes such that co-located nodes land in same bin

Idea well known set of landmark machines each CAN node, measures its RTT to each landmark orders the landmarks in order of increasing RTT

CAN construction place nodes from the same bin close together on the CAN

Page 206: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

207

Distributed Binning

4 Landmarks (placed at 5 hops away from each other) naïve partitioning

number of nodes

256 1K 4K

late

ncy

Str

etc

h

5

10

15

20

256 1K 4K

w/o binning w/ binning

w/o binning w/ binning

#dimensions=2 #dimensions=4

Page 207: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

208

Ongoing Work (cont’d)

Topologically-sensitive CAN construction distributed binning

CAN Security (Petros Maniatis - Stanford) spectrum of attacks appropriate counter-measures

Page 208: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

209

Ongoing Work (cont’d)

CAN Usage

Application-level Multicast (NGC 2001)

Grass-Roots Content Distribution

Distributed Databases using CANs(J.Hellerstein, S.Ratnasamy, S.Shenker, I.Stoica, S.Zhuang)

Page 209: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

210

Summary

CAN an Internet-scale hash table potential building block in Internet applications

Scalability O(d) per-node state

Low-latency routing simple heuristics help a lot

Robust decentralized, can route around trouble

Page 210: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

211

9.Sun’s Project JXTA

Technical Overview

Page 211: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

212

Presentation Outline Introduction - what is JXTA Goal - what JXTA wants to be Technology - what JXTA relies upon Structure - how JXTA is built Protocols - what protocols JXTA has Security - whether JXTA is secure Applications - what JXTA can be used for Collaboration - how JXTA grows

Page 212: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

213

Introduction

“JXTA” - pronounced as “juxta” as in “juxtaposition” started by Sun's Chief Scientist Bill Joy an effort to create a common platform for building

distributed services and applications Napster, Gnutella, and Freenet provide users with

limited ability to share resources and are unable to share data with other, similar applications

Page 213: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

214

Goal/Purpose

enable a wide range of distributed computing applications by developing a common set of general purpose P2P protocols

achieve platform independence - any language, any OS, any hardware

overcome the limitations found in many today's P2P applications

enable new applications to run on any device that has a digital heartbeat (desktop computers, servers, PDAs, cell phones, and other connected devices)

Page 214: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

215

Technology JXTA technology is based on XML, Java technology, and

key concepts of UNIX operating system Transmitted information is packaged as messages.

Messages define an XML envelop to transfer any kind of data.

The use of Java language is not required - JXTA protocols can be implemented in C, C++, Perl, or any other programming language

Page 215: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

216

JXTA Core

JXTA Services

JXTA Applications

Structure

Any Peer on the Web (Desktop, cell phone, PDA, Server)

Security

Peer Groups Peer Pipes Peer Monitoring

JXTA Community Services SunJXTA Services

JXTA Community Applications Sun Applications

JXTAShell

PeerCommands

Page 216: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

217

Multi-Layered Structure JXTA Core

• Peer Groups - mechanisms to/for create and delete, join, advertise, discover, communication, security, content sharing

• Peer Pipes - transfer of data, content, and code in a protocol-independent manner

• Peer Monitoring - including access control, priority setting, traffic metering and bandwidth balancing

JXTA Services • expand upon the capabilities of the core and facilitate

application development• mechanisms for searching, sharing, indexing, and caching code

and content to enable cross-application bridging and translation of files

JXTA Shell - much like UNIX OS• facilitate access to core-level functions through a command line

JXTA Applications - built using peer services as well as the core layer

Page 217: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

218

Protocols JXTA is a set of six protocols Peer Discovery Protocol - find peers, groups,

advertisements Peer Resolver Protocol - send/receive search queries Peer Information Protocol - learn peers’

status/properties Peer Membership Protocol - sign in, sign out,

authentication Pipe Binding Protocol - pipe advertisement to pipe

endpoint Endpoint Routing Protocol - available routes to

destination

Page 218: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

219

Security Confidentiality, integrity, availability - authentication,

access control, encryption, secure communication, etc.

Developing more concrete and precise security

architecture is an ongoing project JXTA does not mandate certain security polices, encryption

algorithms or particular implementations JXTA 1.0 provides Security Primitives:

• crypto library (MD5, RC4, RSA, etc.)• Pluggable Authentication Module (PAM) • password-based login• transport security mechanism modeled after SSL/TLS

Page 219: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

220

Potential Applications Search the entire web and all its connected devices (not

just servers) for needed information Save files and information to distributed locations on the

network Connect game systems so that multiple people in

multiple locations Participate in auctions among selected groups of

individuals Collaborate on projects from anywhere using any

connected device Share compute services, such as processor cycles or

storage systems, regardless of where the systems or the users are located

Page 220: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

221

JXTA Search Overview Started in June 2000 by Infrasearch as an idea to

distribute queries to network peers best capable of answering them

Now it is the default searching methodology for the JXTA framework in the form of JXTA Search

Communication via an XML protocol called Query Routing Protocol (QRP)

Network components: Providers, Consumers, Hubs Capable of providing both wide and deep search;

deep search shows the most benefits Design goals: Simplicity, Structure, Extensibility,

Scalability

Page 221: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

222

JXTA Search Benefits Speed of update - especially noticeable in

deep search, where large data in databases are accessed directly without a need to create a central index.

Access - in crawling based approach many companies are resilient to grant access to web crawlers. In distributed approach the companies can serve the data as they feel appropriate.

Efficiency - no need to create a centrally placed and maintained index for the whole web.

Page 222: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

223

JXTA Search Architecture

Each JXTA peer can run instances of Provider, Consumer, and Registration services on top of its JXTA core.

Each peer interacts with the JXTA Search Hub Services, which is also running on top of the JXTA core.

Page 223: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

224

JXTA Search Architecture

Each peer within the network interacts with the hub using its appropriate service

Registration

Consumer

Provider

Hub

JXTA Peer

Page 224: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

225

Collaboration

Currently over 25 companies are participating in developing JXTA projects.

Core (7 projects)• security, juxta-c, juxtaperl, pocketjxta

Services (20 projects)• search, juxtaspaces, p2p-email, juxta-grid, payment,

monitoring Applications (12 projects)

• shell, jnushare, dfwbase, brando

Other projects (5) - demos, tutorials, etc.

Page 225: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

226

Future Work, Questions

Future Work• C/C++ implementation

• KVM based implementation (PDAs, cell phones)

• Naming and binding services

• Security services (authentication, access control)

• Solutions for firewalls and NAT gateways

Is this the right structure? Do JXTA protocols dictate too much or too little?

Page 226: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

227

10. Application Layer Anycasting: A Server Selection Architecture and Use in Replicated Web Service

Ellen W. ZeguraMostafa H. Amamr

Zongming FeiNetworking and Telecommunications Group

Georgia Tech, Atlanta, GA

Samrat BattacharjeeDepartment of Computer Science

University of Maryland, College Park, MD

Page 227: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

228

Agenda

Problem Statement The Anycasting Communication

Paradigm Some Related Work Application Layer Anycasting Experimental Results Conclusions

Page 228: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

229

Problem Statement

Efficient service provision in wide area networks

Replicated services Applications want access to the best

server Best may depend on time, performance,

policy

Page 229: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

230

Server Replication

Standard technique to improve scalability of a service

Issues in server replication Location of servers Consistency across servers Server selection

Server selection problem How does a client determine which of the

replicated servers to access ?

Page 230: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

Client

Content Equivlent Servers

Request

?

Page 231: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

232

Server Selection

Alternatives Designated (e.g. nearest) server Round robin assignment (e.g. DNS rotator) Explicit list with user selection Selection architecture (e.g. Cisco

DistributedDirector) Application-Layer Anycasting:

Client requests connection to anycast group Anycast group consists of replicated (equivalent) servers System connects client to any good server

Page 232: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

233

Anycasting Communication Paradigm

Anycast identifier specifies a group of equivalent hosts

Requests are sent to best host in the group

Page 233: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

Client

Content Equivlent Servers

Request

Reply

Anycast mechanism resolves to one of

possible many

Page 234: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

235

Existing Anycast Solutions and Limitations Existing Solutions:

RFC 1546 Host Anycasting Service• Definition of anycasting communication paradigm• Implementation suggestions for the network layer

IETF Server Location Protocol – Still existing ? AKAMAI and other commercial cache/CDNs Cisco DistributedDirector

Limitations Global router support Per diagram destination selection Limited set of metrics No option for user input in server selection Allocation of IP address space for anycast address

Page 235: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

236

Application-Layer Anycasting

Application

Client filters

Anycast group table

Resolver filters

Anycast-aware client Anycast Resolver

Anycast query

(truncated) list of servers

List of serversBest

server

Page 236: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

237

Filters

Content-independent filters E.g. Round-robin

Metric-based filters E.g. Minimum response time

Policy-based filters E.g. Minimum cost

Filter Specification – Meteric Qualified ADN: <Meteric_Service>%<Domain Name> ServerLoad.Daily_News%cc.gatech.edu

Page 237: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

238

Application-layer Anycasting Architecture

Client-Server Communication

Push Daemon

Content Server

Anycast Resolver

Resolver Probe

Anycast aware Client

Client

Server Resolver

Probes

Server Pushes

Anycast Query/ Response

Probe Updates

Performance Updates

Page 238: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

239

Anycast Groups

Anycast groups consist of collection of IP addresses, domain names or aliases

Group members provide equivalent service, e.g., mirrored FTP servers or web search engines

Anycast groups identified by Anycast Domain Names

Group membership an orthogonal issue architecture aliases

Page 239: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

240

Anycast Domain Names

Structure: <Service>%<Domain Name>

Example: Daily-News%cc.gatech.edu

Local anycast resolverAuthoritative anycast resolver for AND X

Anycast client

1. Request ADN X

2. Determine auth. resolver

3. Request members, metrics for ADN X

4. Members and metrics for ADN X5. Cache ADN X

6. Anycast response

Page 240: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

241

Implementation

Implementation using Metric Qualified ADNs

Intercept calls to gethostbyname Transparent access to anycasting

without modifying existing applications

gethostbyname

Anycast Resolver

DNS

Filters

ADN Domain Names

other

IP Address

Page 241: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

242

Response Time Determination for Web Servers Response time:

Measured from time client issues request until receives last byte of file of network

Round trip path delays + server processing delays Overview of technique:

Resolver probes for path-dependent response time Server measures and pushes path-independent

processing time Lighter-weight push more frequent than heavier-

weght probe Probe result used to calibrate pushed value

Page 242: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

243

Performance Metric Determination

Metric collection techniques Server push algorithm Agent probe mechanism Hybrid push/probe technique

Page 243: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

244

Server Push Process

Typical server response cycle:assign process to handle queryparse querylocate requested filerepeat until file is written

read from filewrite to network

Process: Measure and smooth time until first read

(TUFR) Push if significant change

Page 244: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

245

Resolver Process and Hybrid Technique Resolver probe process:

Request dummy file from server Measure response time (RT)

Hybrid push-probe technique Dummy file contains most recent TUFR Each probe: compute scaling factor SF =

RT/TUFR Each push: estimate RT = SF x TUFR

Page 245: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

246

Performance of Hybrid Algorithm

Page 246: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

247

Wide-Area Experiment

Page 247: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

248

Refinement

Problem of oscillation among servers Set of Equivalent Servers (ES)

Subset of the replicated servers whose measured performance is within a threshold of best performance

Page 248: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

249

Performance of Anycast vs Random Selection

Page 249: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

250

Performance of Server Location Schemes

Server Location Algorithm

Average Response Time (sec.)

Standard Deviation (sec.)

Anycasting 0.49 0.69

Nearest Server 1.12 2.47

Random 2.13 6.96

50% improvement using Nearest Server

Another 50% improvement using Anycasting

More predictable service

Page 250: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

251

Performance as More Clients Anycast

Page 251: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

252

Avoid Oscillations Among Servers Basic technique:

Estimate response time for each server Indicate the best server when queried

Identifying one best server can result in oscillations

Use set of equivalent servers Choose randomly among equivalent

servers

Page 252: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

255

Conclusions

Summary Server replication increasingly important – web

services etc. Application layer architecture that is scalable using

replicated resolvers organized in a DNS like hierarchy Web server performance can be tracked with

reasonable relative accuracy. Techniques used can be generalized to other servers

A hybrid push-probe technique provides scalable monitoring. May be useful in other contexts

Application-layer anycasting gives significant improvement over other server selection techniques

Page 253: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

256

Any problems (truth in advertising Discussions

Was the study extensive enough ?• 4 Anycast-aware servers – UCLA (1), WUStL(1), Gatech (2)• Anycast resolvers – UMD, Gatech• 20 Anycast-aware clients – UMD (4), Gatech (16)

Study of anycast vs random selection Experiments done one after another – any performance

difference due to cached content ? Would performance improve if network-support

for path performance metrics included ? Global IP-anycast [SIGCOMM00]

Page 254: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

257

11. A Case For End System Multicast

Yang-hua Chu, Sanjay Rao and Hui Zhang

Carnegie Mellon University

Page 255: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

258

Talk Outline

Definition, potential benefits & problems

Look at Narada approach specifically Performance in simulation Performance in network experiment Related work Discussion

Page 256: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

259

IP Multicast

• No duplicate packets• Highly efficient bandwidth usage

Key Architectural Decision: Add support for multicast in IP layer

Berkeley

Gatech Stanford

CMU

Routers with multicast support

Page 257: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

260

Key Concerns with IP Multicast Scalability with number of groups Routers maintain per-group state Analogous to per-flow state for QoS guarantees Aggregation of multicast addresses is complicated

Supporting higher level functionality is difficult IP Multicast: best-effort multi-point delivery service End systems responsible for handling higher level functionality

Reliability and congestion control for IP Multicast complicated

Inter-domain routing is hard. No management of flat address space. Deployment is difficult and slow

ISP’s reluctant to turn on IP Multicast

Page 258: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

261

End System MulticastStanford

CMU

Stan1

Stan2

Berk2

Overlay TreeGatech

Berk1

Berkeley

Gatech Stan1

Stan2

Berk1

Berk2

CMU

Page 259: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

262

Scalability (number of sessions in the network) Routers do not maintain per-group state End systems do, but they participate in very few

groups Easier to deploy Potentially simplifies support for higher level

functionality Leverage computation and storage of end systems For example, for buffering packets, transcoding, ACK

aggregation Leverage solutions for unicast congestion control

and reliability

Potential Benefits

Page 260: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

263

Performance Concerns

CMU

Gatech Stan1

Stan2

Berk1

Berk2

Duplicate Packets:

Bandwidth Wastage

CMU

Stan1

Stan2

Berk2

Gatech

Berk1

Delay from CMU to

Berk1 increases

Page 261: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

264

What is an efficient overlay tree? The delay between the source and receivers is small

Ideally, The number of redundant packets on any physical

link is low Heuristic we use:

Every member in the tree has a small degree Degree chosen to reflect bandwidth of connection

to Internet

Gatech

“Efficient” overlay

CMU

Berk2

Stan1

Stan2

Berk1Berk1

High degree (unicast)

Berk2

Gatech

Stan2CMU

Stan1

Stan2

High latency

CMU

Berk2

Gatech

Stan1

Berk1

Page 262: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

265

Why is self-organization hard? Dynamic changes in group membership Members join and leave dynamically Members may die

Limited knowledge of network conditions Members do not know delay to each other when

they join Members probe each other to learn network related

information Overlay must self-improve as more information

available Dynamic changes in network conditions

Delay between members may vary over time due to congestion

Page 263: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

266

Berk2 Berk1

CMU

Gatech

Stan1Stan2

Narada Design (1)

Step 1

“Mesh”: Subset of complete graph may have cycles and includes all group members• Members have low degrees• Shortest path delay between any pair of members along mesh is

small

Step 0Maintain a complete overlay graph of all group members• Links correspond to unicast paths• Link costs maintained by polling

Berk2Berk1

CMU

Gatech

Stan1Stan2

Page 264: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

267

Narada Design (2)

CMU

Berk2 GatechBerk1

Stan1Stan2

• Source rooted shortest delay spanning trees of mesh• Constructed using well known routing algorithms

– Members have low degrees– Small delay from source to receivers

Step 2

Page 265: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

268

Narada Components Mesh Management:

Ensures mesh remains connected in face of membership changes

Mesh Optimization: Distributed heuristics for ensuring shortest path

delay between members along the mesh is small Spanning tree construction:

Routing algorithms for constructing data-delivery trees

Distance vector routing, and reverse path forwarding

Page 266: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

269

Optimizing Mesh Quality

Members periodically probe other members at random

New Link added ifUtility Gain of adding link > Add Threshold

Members periodically monitor existing links Existing Link dropped if

Cost of dropping link < Drop Threshold

Berk1

Stan2CMU

Gatech1

Stan1

Gatech2

A poor overlay topology

Page 267: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

270

The terms defined Utility gain of adding a link based on

The number of members to which routing delay improves

How significant the improvement in delay to each member is

Cost of dropping a link based on The number of members to which routing delay

increases, for either neighbor Add/Drop Thresholds are functions of:

Member’s estimation of group size Current and maximum degree of member in the

mesh

Page 268: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

271

Desirable properties of heuristics Stability: A dropped link will not be

immediately readded Partition Avoidance: A partition of the mesh is

unlikely to be caused as a result of any single link being dropped

Delay improves to Stan1, CMU

but marginally.

Do not add link!

Delay improves to CMU, Gatech1

and significantly.

Add link!

Berk1

Stan2CMU

Gatech1

Stan1

Gatech2

Probe

Berk1

Stan2CMU

Gatech1

Stan1

Gatech2Probe

Page 269: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

272

Used by Berk1 to reach only Gatech2 and vice versa.

Drop!!

An improved mesh !!

Gatech1Berk1

Stan2CMU

Stan1

Gatech2

Gatech1Berk1

Stan2CMU

Stan1

Gatech2

Page 270: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

273

Narada Evaluation

Simulation experiments Evaluation of an implementation on the

Internet

Page 271: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

274

Performance Metrics Delay between members using Narada Stress, defined as the number of identical

copies of a packet that traverse a physical link

Berk2

Gatech Stan1Stress = 2

CMU

Stan2

Berk1

Berk2CMU

Stan1

Stan2Gatech

Berk1

Delay from CMU to

Berk1 increases

Page 272: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

275

Factors affecting performance Topology Model Waxman Variant Mapnet: Connectivity modeled after several ISP backbones ASMap: Based on inter-domain Internet connectivity

Topology Size Between 64 and 1024 routers

Group Size Between 16 and 256

Fanout range Number of neighbors each member tries to maintain in the

mesh

Page 273: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

276

Delay in typical run4 x unicast delay 1x unicast delay

Waxman : 1024 routers, 3145 linksGroup Size : 128 Fanout Range : <3-6> for all members

Page 274: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

277

Naive Unicast

Native Multicast

Narada : 14-fold reduction in

worst-case stress !

Stress in typical run

Page 275: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

278

Overhead

Two sources pairwise exchange of routing and control

information polling for mesh maintenance.

Claim: Ratio of non-data to data traffic grows linearly with group size.

Narada is targeted at small groups.

Page 276: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

279

Related Work

Yoid (Paul Francis, ACIRI) More emphasis on architectural aspects, less on

performance Uses a shared tree among participating members

• More susceptible to a central point of failure• Distributed heuristics for managing and

optimizing a tree are more complicated as cycles must be avoided

Scattercast (Chawathe et al, UC Berkeley) Emphasis on infrastructural support and proxy-based

multicast• To us, an end system includes the notion of

proxies Also uses a mesh, but differences in protocol details

Page 277: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

280

Conclusions

Proposed in 1989, IP Multicast is not yet widely deployed Per-group state, control state complexity and scaling

concerns Difficult to support higher layer functionality Difficult to deploy, and get ISP’s to turn on IP

Multicast Is IP the right layer for supporting multicast

functionality? For small-sized groups, an end-system overlay

approach is feasible has a low performance penalty compared to IP

Multicast has the potential to simplify support for higher layer

functionality allows for application-specific customizations

Page 278: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

281

Open Questions

Theoretical bounds on how close an ESM tree can come to IP multicast performance.

Alternate approach: Work with complete graph but modify multicast routing protocol.

Leveraging unicast reliability and congestion contol.

Performance improvements: Reduce polling overhead.

Page 279: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

282

Internet Evaluation 13 hosts, all join the group at about the same

time No further change in group membership Each member tries to maintain 2-4 neighbors

in the mesh Host at CMU designated source

Berkeley

UCSB

UIUC1

UIUC2 CMU1

CMU2

UKY

UMass

GATech

UDelVirginia1

Virginia2

UWisc

8

31

1

10

13

15

14

111

381

10

Page 280: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

283

Narada Delay Vs. Unicast Delay

Internet Routing

can be sub-optimal

(ms)

(ms)

2x unicast delay 1x unicast delay

Page 281: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

284

12. Overcast: Reliable Multicasting with an Overlay Network

Paper authors: Jannotti, Gifford, Johnson, Kaashoek, O’Toole Jr.

Page 282: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

285

Design Goals

Provide application-level multicasting using already existing technology via overcasting

Scalable, efficient, and reliable distribution of high quality video

Compete well against IP Multicasting

Page 283: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

286

Overcasting vs IP Multicast

Overcasting imposes about twice as much network load as IP multicast (for large networks). This difference is in O(n). (Figure 4)

Page 284: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

287

Architecture of an Overcast system The entities which form the architecture

of the Overcast system are called nodes.

Nodes are connected via an organizational scheme distribution tree.

Each group provides content, which is replicated at each of the nodes. A node can conceivably participate in more than one group.

Clients are end-system consumers of content.

Page 285: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

288

Root nodes (1)

Content originates at the root node and is streamed down the distribution tree.

The root is the administrational center of the group.

When clients join the group, they go to a webpage at the root, which redirects them to the “best” overcast node

Is the root a single point of failure?

Page 286: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

289

Root nodes (2)

• Linear roots can alleviate this problem. For the source to fail, all roots must fail.

• Using round robin IP resolution, you can stop your content serving cluster from being ‘slashdotted’ (overloaded by sheer popularity) by client requests.

Page 287: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

290

Distribution tree

The distribution tree is built and maintained using a self-organizing algorithm.

The primary heuristic of this algorithm is to maximize bandwidth from the root to an overcast node.

Backbone nodes are nodes which are located on or close to a network backbone. Overcast performs better when these backbone nodes are located at the top of the tree (ie, they are switched on first)

Page 288: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

291

Tree building protocol (1)

A node initializes by booting up, obtaining its IP, and contacts a “global, well-known registry” (Possible point of failure?) with a unique serial number.

Registry provides a list of groups to join.

This node initially chooses the root as its parent. A series of rounds will begin in which the node decides where on the tree it should be.

Page 289: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

292

Tree building protocol (2)

For each round, evaluate the bandwidth we have to our parent. Also consider the bandwidth to the rest of our parent’s children.

If there is a tie (bandwidth differences between 2 or more nodes within 10%), break it by the number of hops reported by traceroute.

The child selects the best of it’s parents children as its new parent.

Nodes maintain an ancestor list and can rejoin further up the tree when it’s ancestors fail.

Page 290: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

293

Reaching a stable state

Overcast nodes can still receive content from the root, even when the tree is not stabilized. A typical round period is about 1 to 2 seconds.

Page 291: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

294

Tree building protocol (3)

By having consecutive rounds of tree building, the distribution tree can overcome conditions occuring on the underlying network.

The up/down protocol is used to maintain information about the status of nodes.

Each node in the network maintains a table of information about all of it’s descendants.

After the tree stabilizes, nodes will continue to consider relocating after a reevaluation period.

Page 292: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

295

Up/down protocol

A parent that gets a new child gives its parent a birth certificate, which propagates to the root. The node’s sequence number is incremented by 1. Sequence numbers are used to prevent a race condition.

When a parent’s child fails to report its status (called checkin), after a length of time called a lease period, the parent propagates a death certificate for its child up the tree.

A child also presents any certificates or changes it has accumulated from its last checkin.

Page 293: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

296

Problems

A simple approach leads to a simple solution: Not appropriate for software (same as

mirroring a file!) Not appropriate for games (latency is too

high!) What about teleconferencing? Authors suggest

that if a non-root node wishes to send to other nodes, the node should first unicast (send via normal TCP/IP to the root, which will then overcast it as normal). Is this a good solution?

Page 294: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

297

Closing thoughts

Simulated on a virtual network topology (Georgia Tech Internetwork Topology Models). Take results with a grain of salt (or two).

Might work out well commercially. Consider a high demand for high definition video over the net, and corporations/entities willing to deliver it (CNN,Hollywood studios,Olympics). Overcast could be a premium service for ISP subscribers [like newsgroups,www hosting].

Page 295: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

298

13. Enabling Conferencing Applications on the Internet using an Overlay Multicast ArchitectureYanghua Chu et al. (CMU)SIGCOMM’01

most slides comes from authors presentation

Page 296: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

299

internet

Overlay Multicast Architecture

Page 297: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

300

Past Work Self-organizing protocols

Yoid (ACIRI), Narada (CMU), Scattercast (Berkeley), Overcast (CISCO), Bayeux (Berkeley), …

Construct overlay trees in distributed fashion Self-improve with more network information

Performance results showed promise, but… Evaluation conducted in simulation Did not consider impact of network dynamics on

overlay performance

Page 298: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

301

Focus of This Paper

Can End System Multicast support real-world applications on the Internet? Study in context of conferencing applications Show performance acceptable even in a

dynamic and heterogeneous Internet environment

First detailed Internet evaluation to show the feasibility of End System Multicast

Page 299: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

302

Why Conferencing?

Important and well-studied Early goal and use of multicast (vic, vat)

Stringent performance requirements High bandwidth, low latency

Representative of interactive applications E.g., distance learning, on-line games

Page 300: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

303

Supporting Conferencing in ESM

Framework Bandwidth estimation Adapt data rate to bandwidth est. by packet dropping

Objective High bandwidth and low latency to all receivers along the

overlay

D

C

A

B2 Mbps

2 Mbps 0.5 MbpsSource rate

2 MbpsEstimated bandwidth

Transcoding (possible)

(DSL)

Page 301: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

304

Enhancements of Overlay Design Two new issues addressed

Dynamically adapt to changes in network conditions

Optimize overlays for multiple metrics• Latency and bandwidth

Study in the context of the Narada protocol (Sigmetrics 2000) Use Narada to define logical topology Techniques presented apply to all self-organizing

protocols

Page 302: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

305

• Capture the long term performance of a link – Exponential smoothing, Metric discretization

Adapt to Dynamic Metrics Adapt overlay trees to changes in network

condition Monitor bandwidth and latency of overlay links

Link measurements can be noisy Aggressive adaptation may cause overlay

instability Conservative adaptation may endure bad

performance

time

band

wid

th

raw estimatesmoothed estimatediscretized estimate

transient: do not react

persistent:react

Page 303: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

306

Optimize Overlays for Dual Metrics

Prioritize bandwidth over latency Break tie with shorter latency

SourceReceiver X

30ms, 1Mbps

60ms, 2MbpsSource rate

2 Mbps

Page 304: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

307

Example of Protocol BehaviorM

ean

Rec

eive

r B

andw

idth

Reach a stable overlay• Acquire network info• Self-organization

Adapt to network congestion

All members join at time 0 Single sender, CBR traffic

Page 305: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

308

Evaluation Goals

Can ESM provide application level performance comparable to IP Multicast?

What network metrics must be considered while constructing overlays?

What is the network cost and overhead?

Page 306: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

309

Evaluation Overview

Compare performance of our scheme with Benchmark (sequential unicast, mimicing IP

Multicast) Other overlay schemes that consider fewer

network metrics Evaluate schemes in different scenarios

Vary host set, source rate Performance metrics

Application perspective: latency, bandwidth Network perspective: resource usage,

overhead

Page 307: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

310

Benchmark Scheme

IP Multicast not deployed Sequential Unicast: an approximation

Bandwidth and latency of unicast path from source to each receiver

Performance similar to IP Multicast with ubiquitous deployment

C

A B

Source

Page 308: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

311

Overlay Schemes

Overlay Scheme Choice of Metrics

Bandwidth Latency

Bandwidth-Latency

Bandwidth-Only

Latency-Only

Random

Page 309: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

312

Experiment Methodology

Compare different schemes on the Internet Ideally: run different schemes concurrently Interleave experiments of schemes Repeat same experiments at different time of

day Average results over 10 experiments

For each experiment All members join at the same time Single source, CBR traffic Each experiment lasts for 20 minutes

Page 310: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

313

Application Level Metrics

Bandwidth (throughput) observed by each receiver

RTT between source and each receiver along overlay

D

C

A

B

Source Data path

RTT measurement

These measurements include queueing and processing delays at end systems

Page 311: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

314

Performance of Overlay Scheme

“Quality” of overlay tree produced by a scheme

Sort (“rank”) receivers based on performance Take mean and std. dev. on performance of

same rank across multiple experiments Std. dev. shows variability of tree quality

Rank1 2

RTTCMU

MIT

Harvard

CMU

MIT

Harvard

Exp2Exp1

Different runs of the same scheme mayproduce different but “similar quality” trees

32ms

42ms

30ms

40ms

Exp1Exp2

Mean Std. Dev.

Page 312: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

315

Factors Affecting Performance

Heterogeneity of host set Primary Set: 13 university hosts in U.S. and

Canada Extended Set: 20 hosts, which includes hosts

in Primary Set , Europe, `Asia, and behind ADSL

Source rate Fewer Internet paths can sustain higher source

rate

Page 313: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

316

Three Scenarios Considered

Does ESM work in different scenarios? How do different schemes perform

under various scenarios?

Primary Set1.2 Mbps

Primary Set2.4 Mbps

Extended Set2.4 Mbps

(lower) “stress” to overlay schemes (higher)

Primary Set1.2 Mbps

Page 314: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

317

BW, Primary Set, 1.2 Mbps

Naïve scheme performs poorly even in a less “stressful” scenario

RTT results show similar trend

Internet pathology

Page 315: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

318

Scenarios Considered

Does an overlay approach continue to work under a more “stressful” scenario?

Is it sufficient to consider just a single metric? Bandwidth-Only, Latency-Only

Primary Set1.2 Mbps

Primary Set2.4 Mbps

Extended Set2.4 Mbps

(lower) “stress” to overlay schemes (higher)

Page 316: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

319

BW, Extended Set, 2.4 Mbps

no strong correlation betweenlatency and bandwidth

Optimizing only for latency has poor bandwidth performance

Page 317: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

320

RTT, Extended Set, 2.4Mbps

Bandwidth-Only cannot avoidpoor latency links or long path length

Optimizing only for bandwidth has poor latency performance

Page 318: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

321

Summary so far…

For best application performance: adapt dynamically to both latency and bandwidth metrics

Bandwidth-Latency performs comparably to IP Multicast (Sequential-Unicast)

What is the network cost and overhead?

Page 319: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

322

Resource Usage (RU)

Captures consumption of network resource of overlay tree

Overlay link RU = propagation delay Tree RU = sum of link RU

UCSD

CMU

U.Pitt

UCSD

CMU

U. Pitt

40ms

2ms

40ms

40ms

Efficient (RU = 42ms)

Inefficient (RU = 80ms)

IP Multicast 1.0

Bandwidth-Latency 1.49

Random 2.24

Naïve Unicast 2.62

Scenario: Primary Set, 1.2 Mbps(normalized to IP Multicast RU)

Page 320: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

323

Protocol Overhead

Results: Primary Set, 1.2 Mbps Average overhead = 10.8% 92.2% of overhead is due to bandwidnth probe

Current scheme employs active probing for available bandwidth Simple heuristics to eliminate unnecessary

probes Focus of our current research

Protocol overhead = total non-data traffic (in bytes)

total data traffic (in bytes)

Page 321: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

324

Contribution First detailed Internet evaluation to show the

feasibility of End System Multicast architecture Study in context of a/v conferencing Performance comparable to IP Multicast

Impact of metrics on overlay performance For best performance: use both latency and

bandwidth

More info: http://www.cs.cmu.edu/~narada

Page 322: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

325

Discussion (1)

Peer-to-peer versus proxy based architecture

Peer-to-peer Proxy based

Distributed Share network information , history across groups

Scalable to number of groups

Long term connection, more stable

Manage resource allocation among groups

Page 323: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

326

Discussion (2)

Multipath framework where each recipient gets data from the source along multiple paths, with a fraction of the data flowing along any given path Any individual path doesn’t radically affect

overall performance Receive data while monitoring

Page 324: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

327

Discussion (3)

Rigorous change detection algorithm On the Constancy of Internet Path Properties

• www.aciri.org/vern/imw2001-papers/38.ps.gz

Idea of more formally identifying changes • www.variation.com/cpa/tech/changepoint.html

Page 325: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

328

14. Fault-tolerant replication management inlarge-scale distributed storage systems

Richard GoldingStorage Systems Program, Hewlett

Packard Labs

[email protected]

Elizabeth BorowskyComputer Science Dept., Boston

College

[email protected]

Page 326: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

329

Introduction

Palladio - solution for detecting, handling, and recovering from both small- and large-scale failures in a distributed storage system.

Palladio - provides virtualized data storage services to applications via set of virtual stores, which are structured as a logical array of bytes into which applications can write and read data. The store’s layout maps each byte in its address space to an address on one or more devices.

Palladio - storage devices take an active role in the recovery of the stores they are part of. Managers keep track of the virtual stores in the system, coordinating changes to their layout and handling recovery from failure.

Page 327: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

330

• Provide robust read and write access to data in virtual stores.

• Atomic and serialized read and write access.

• Detect and recover from failure.

• Accommodate layout changes.

EntitiesHostsStoresManagersManagement policies

ProtocolsLayout Retrieval protocolData Access protocolReconciliation protocolLayout Control protocol

Page 328: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

331

Access protocol allows hosts to read and write dataon a storage device as long as there are no failures or layoutchanges for the virtual store. It must provide serialized, atomic writes that can span multiple devices.Layout retrieval protocol allows hosts to obtain the current layout of a virtual store — the mapping from the virtual store’s address space onto the devices that store parts of it.Reconciliation protocol runs between pairs of devices to bring them back to consistency after a failure.Layout control protocol runs between managers and devices — maintains consensus about the layout and failure status of the devices, and in doing so coordinates the other three protocols.

Protocols

Page 329: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

332

Layout Control Protocol

The layout control protocol tries to maintain agreementbetween a store’s manager and the storage devices that hold the store.

• The layout of data onto storage devices

• The identity of the store’s active manager.

The notion of epochs• The layout and manager

are fixed during each epoch• Epochs are numbered• Epoch transitions• Device leases acquisition

and renewal• Device leases used to detect

possible failure.

Page 330: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

333

Operation during an epoch

• The manager has quorum and coverage of devices.

• Periodic lease renewal» In case a device fails to report

and try to renew its lease, the manager considers it failed

» In case the manager fails to renew the lease, the device considers the manager failed and starts a manager recovery sequence

• When the manager looses quorum or coverage the epoch ends and a state of epoch transition is entered.

Page 331: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

334

Epoch transition

• Transaction initiation• Reconciliation• Transaction commitment

• Garbage collection

Page 332: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

335

The recovery sequence

• Initiation - querying a recovery manager with the current layout and epoch number

Page 333: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

336

The recovery sequence (continued)

• Contention - managers struggle to obtain quorum and coverage and to become active managers for the store - (recovery leases, acks and rejections)

Page 334: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

337

The recovery sequence (continued)

• Completion - setting correct recovery leases & starting epoch transition

• Failure - failure of devices and managers during recovery

Page 335: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

338

Extensions

• Single manager v.s. Multiple managers• Whole devices v.s. Device parts (chunks)• Reintegrating devices• Synchrony model (future)• Failure suspectors (future)

Page 336: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

339

Conclusions & recap

Palladio - Replication management system featuring

» Modular protocol design» Active device participation» Distributed management function» Coverage and quorum condition

Page 337: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

340

Application example

Manager node

ID=hash<FileID, MGR>

Storage nodes

ID=hash<FileID, STR, n>

Popularity indicator

Very popular content

Page 338: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

341

Application example - benefits

Stable manager node

Stable storage nodes

• Self-manageable storage

• Increased availability

• Popularity is hard to fake

• Less per node load

• Could be appliedrecursively (?)

Page 339: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

342

Wrapup discussion questions (1): What is a peer-peer network (what is not a peer-to-peer

network?). Necessary: every node is designed to (but may not by user

choice) provide some service that helps other nodes in the network get service

no 1-N service providing each node potentially has the same responsibility,

functionality (maybe nodes can be polymorhpic)• corollary: by design, nothing (functionally) prevents

two nodes from communicating directly some applications (e.g., Napster) are a mix of peer-

peer and centralized (lookup is centralized, file service is peer-peer) [recursive def. of peer-peer]

(logical connectivity rather than physical connectivity) routing will depend on service and data

Page 340: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

343

Overlays?

What is the relationship between peer-peer and application overlay networks? Peer-peer and application overlays are different

things. It is possible for an application level overlay to be built using peer-peer (or vice versa) but not always necessary

Overlay: in a wired net: if two nodes can communicate in the overlay using a path that is not the path the network level routing would define for them. Logical network on top of underlying network

• source routing? Wireless ad hoc nets – what commonality is there

REALLY?

Page 341: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

344

Wrapup discussion questions (2): What were the best p2p idea Vote now (and should it be a secret

ballot usign Eternity

Page 342: 1 Peer-peer and Application-level Networking Presented by Jon Crowcroft Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class

345

Wrapup discussion questions (3):

Is ad hoc networking a peer-peer application? Yes (30-1)

Why peer-peer over client-server? A well-deigned p2p provides better “scaability”

Why client-server of peer-peer peer-peer is harder to make reliable availability different from client-server (p2p is more

often at least partially “up”) more trust is required

If all music were free in the future (and organized), would we have peer-peer. Is there another app: ad hoc networking, any copyrighted

data, peer-peer sensor data gathering and retrieval, simulation

Evolution #101 – what can we learn about systems?