Distributed Communities on the Web

Peter Kropf
Department of Computer Science and Operations Research
University of Montreal (Université de Montréal)

ISADS 2002, Guadalajara, Mexico, November 11 - 15

Communities

• Common interests
• Shared context
• Self-organisation
• Autonomous individuals/entities

Communities are groups of objects in a shared context. This allows at least a communication with each other. [DCW 2000: Plaice, Kropf, Unger]

Social aggregates emerging from the Internet when enough people carry on public discussions long enough and with sufficient human feeling to form webs of personal relationships. [Howard Rheingold]

Science Fiction

• Douglas Adams (The Hitchhiker's Guide to the Galaxy): “A computer terminal is not just a clumsy old TV set with a typewriter keyboard in front of it, but an interface to connect mind and body to the universe and move parts of it through space”

• In 1982, William Gibson describes cyberspace in his SF novel: “... as a graphic representation of data abstracted from the banks (DB) of every computer in the human system, ... the matrix”.

Communities: key issues

• How do communities emerge and how can their structures be analyzed and modeled?
• Which effects and laws are known from social communities and how are these communities structured and reorganized?
• How do cooperation and work division emerge? What does it mean to speak about “communities” within the context of the Web (e.g. for data search, network load, structuring ...)?
• How can dynamic changes be considered within a community?
• Which advantages and new services can be developed and used?

Peer-to-Peer systems are deployed and working distributed communities on the Net

Peer-to-Peer (P2P) - What is it?

• P2P is nothing new – see Arpanet (Internet)
• Every participating node acts as a client and a server at the same time («servent»)
• Completely decentralized:
  – No central control/coordination
  – No central database
  – Global behavior emerges from local interactions
  – No peer has a global view of the system
  – All data and services are accessible from any peer
  – Peers are autonomous
  – Peers and interconnections are unreliable
• «Business model»: Every node contributes to the system by providing access to some of its resources: incentive to participate.
• Online communities

Types of P2P systems

• E-commerce systems
  – eBay, B2B market places, …
• File sharing systems
  – Napster, Freenet, Gnutella, Morpheus, KaZaA, AWeb, …
• Distributed databases
  – Mariposa, DNS, …
• Networks, «middleware»
  – Arpanet, WOS (Web Operating System), JXTA, Mobile ad-hoc networks (MANET), Multi-agent systems, …

P2P is an application-level internet on top of the Internet

Overlay network

Cooperation is required

P2P Cooperation Models

• Centralized model
  – global index held by a central authority (single point of failure)
  – direct contact between requestors and providers
  – Example: Napster
• Decentralized model
  – Examples: Freenet, Gnutella
  – no global index, no central coordination, global behavior emerges from local interactions, etc.
  – direct contact between requestors and providers (Gnutella) or mediated by a chain of intermediaries (Freenet)
• Hierarchical model
  – introduction of “super-peers”
  – mix of centralized and decentralized model
  – Example: DNS

Data Management in P2P Systems

• Problem
  – Peers in a P2P system need to share information
  – A central database would contradict the P2P paradigm
  – Can a distributed database be supported by peers without central control?
• Example
  – Directory of all files in a file-sharing system
• Basic operations in a database
  – Searching information (efficiently)
  – Updating information (consistently)

Questions

• Can a set of peers without central coordination provide
  – efficient search on a distributed database
  – while the storage space at each peer is small compared to the whole database?
• Efficient search
  – searchtime(query) ≈ log(size(database))
• Small storage space
  – storagespace(agent) ≈ log(size(database))

P2P Data Access Structures

• Every peer maintains a small fragment of the database and a routing table
• The peers implement some routing strategy
• Replication can be used to increase robustness and performance

Related Approaches

Related distributed information system approaches:

  – Event-based systems (publish/subscribe)
  – Push systems (broadcast model)
  – Mobile agents (code/data moves in network; cooperates with other agents and can “learn” where to go next)
  – Distributed databases (often rely on central coordination)

[Figure: event-based systems, push systems, mobile agents and peer-to-peer systems placed on an axis from passive to active]

Peer-to-Peer vs. C/S and Web-based Systems

|                   | Client-Server (session-based) | Web-based        | Peer-to-Peer     |
|-------------------|-------------------------------|------------------|------------------|
| Coupling          | tight                         | loose            | very loose       |
| Comm. style       | asymmetric                    | asymmetric       | symmetric        |
| Number of clients | moderate (1000)               | high (1,000,000) | high (1,000,000) |
| Number of servers | few (10)                      | many (100,000)   | none (0)         |

(Aberer 02)

Case Study: Napster

Napster: A brief history

• May 1999: Napster Inc. file share service founded by Shawn Fanning and Sean Parker (1st year students at Northeastern University)
• Dec 7, 1999: Recording Industry Association of America (RIAA) sues Napster for copyright infringement ($100,000 per song copied)
• April 13, 2000: Heavy metal rock group Metallica sues Napster for copyright infringement
• April 27, 2000: Rapper Dr. Dre sues Napster
• May 3, 2000: Metallica's attorney claims 317,377 Internet users illegally share Metallica's songs via Napster
• May 5, 2000: A judge rules that the “Safe Harbour Provision” exception of the DMCA (Digital Millennium Copyright Act) does not apply to Napster (art. 512)
• July 26, 2000: Court orders Napster to shut down on July 29 (applied July 28)
• Oct 31, 2000: Bertelsmann becomes a partner and drops lawsuit
• Feb 12, 2001: Court orders Napster to cease trading copyrighted songs and to prevent subscribers from gaining access to content on its search index that could potentially infringe copyrights
• Feb 20, 2001: Napster offers $1 billion to record companies (rejected)
• March 2, 2001: Napster installs software to satisfy the order

DMCA

Digital Millennium Copyright Act, Safe Harbour Provision:

512(k)(1)(A): « the term `service provider' means an entity offering the transmission, routing, or providing of connections for digital online communications, between or among points specified by a user, of material of the user's choosing, without modification to the content of the material as sent or received. »

Napster services

• Browser
• Search engine (on virtual central database)
• Transfer of files
• Chatrooms
• Audio player to play MP3 files from inside the browser
• Hotlist features for favorite songs
• Instant messaging, mailing
• Easy access
• No need to give real name when registering
• Fast

Fast adoption of Napster

2001 (before shutdown): more than 60 million users

Napster: Architecture

• Central (virtual) database which holds a catalog of offered files (MP3/WMA) (C/S)
• Clients connect to the server, identify themselves and send a list of the files they offer and are willing to share (C/S)
• Other clients can search the catalog and learn from which clients they can retrieve the desired files (P2P)
• Direct file transfer between clients (P2P)
• First-time users must register (account)
• Combination of client/server and P2P approaches

Napster: communication model

[Figure: clients A and B and the Napster server]
  – A to Napster server: register (user, files)
  – B to Napster server: “where is xx.mp3?”
  – Napster server to B: “A has xx.mp3”
  – B to A: download xx.mp3
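The register / search / download exchange can be sketched with a toy in-memory catalog. This is only an illustration of the centralized model, not Napster's server code; the class `CentralIndex` and its method names are hypothetical.

```python
from collections import defaultdict


class CentralIndex:
    """Toy stand-in for the central catalog: file name -> users offering it.
    (Hypothetical class; the real server also tracked addresses, speeds, etc.)"""

    def __init__(self) -> None:
        self._providers = defaultdict(set)

    def register(self, user: str, files: list[str]) -> None:
        """A client announces the files it is willing to share (C/S step)."""
        for name in files:
            self._providers[name].add(user)

    def unregister(self, user: str) -> None:
        """Remove a departing client from every catalog entry."""
        for users in self._providers.values():
            users.discard(user)

    def search(self, name: str) -> list[str]:
        """Return the users that offer `name`; the download itself is P2P."""
        return sorted(self._providers.get(name, ()))


index = CentralIndex()
index.register("A", ["xx.mp3", "yy.mp3"])
print(index.search("xx.mp3"))  # B asks the server, then contacts A directly
```

The catalog is the only shared state, which makes search simple and fast but also makes the server a single point of failure.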

Napster: The Protocol [Drscholl01]

• The protocol was never published openly and is rather complex and inconsistent
• OpenNap reverse-engineered the protocol and published their findings
• TCP is used for C/S communication
• Messages to/from the server have the following format:

| Byte offset | 0-1    | 2-3  | 4 ... n |
|-------------|--------|------|---------|
| Field       | length | type | data    |

  – length specifies the length of the data portion
  – type defines the message type
  – data: the transferred data
    • plain ASCII, in many cases enclosed in double quotes (e.g., filenames such as “bamba.mp3” or client IDs such as “nap v0.8”)
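The length/type/data framing can be sketched as follows. Two details are assumptions not stated on the slide: that `length` and `type` are 16-bit little-endian integers (as described in the OpenNap notes), and the helper names `pack_message` and `unpack_message`, which are made up for this example.

```python
import struct

# Assumed layout: 2-byte length, 2-byte type, both little-endian, then data.
_PREFIX = struct.Struct("<HH")


def pack_message(msg_type: int, data: str) -> bytes:
    """Frame one message: length of the data portion, type, then ASCII data."""
    payload = data.encode("ascii")
    return _PREFIX.pack(len(payload), msg_type) + payload


def unpack_message(buf: bytes) -> tuple[int, str, bytes]:
    """Split the first message off `buf`; returns (type, data, remaining bytes)."""
    length, msg_type = _PREFIX.unpack_from(buf, 0)
    end = _PREFIX.size + length
    if len(buf) < end:
        raise ValueError("incomplete message")
    return msg_type, buf[_PREFIX.size:end].decode("ascii"), buf[end:]


# Type 2 is "Login"; the field values here are invented for the example.
frame = pack_message(2, 'nick pwd 6699 "nap v0.8" 3')
print(unpack_message(frame))
```

Because TCP delivers a byte stream, a reader has to use the length prefix to find message boundaries, which is why `unpack_message` also returns the unconsumed remainder.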

Sample Messages - 1

| Type | C/S | Description | Format |
|------|-----|-------------|--------|
| 0 | S | Error message | <message> |
| 2 | C | Login | <nick> <pwd> <port> <client info> <link type> |
| 3 | S | Login ack | <user's email> |
| 5 | S | Auto-upgrade | <new version> <http-hostname:filename> |
| 6 | C | New user login | <nick> <pwd> <port> <client info> <speed> <email address> |
| 100 | C | Client notification of shared file | “<filename>” <md5> <size> <bitrate> <frequency> <time> |
| 200 | C | Search request | [FILENAME CONTAINS “artist name”] MAX_RESULTS <max> [FILENAME CONTAINS “song”] [LINESPEED <comp> <link type>] [BITRATE <comp> “bit rate”] [FREQ <comp> “freq”] [WMA-FILE] [LOCAL_ONLY] |
| 201 | S | Search response | “<filename>” <md5> <size> <bit rate> <frequency> <length> <nick> <ip address> |
| 202 | S | End of search response | (empty) |

Sample Messages - 2

| Type | C/S | Description | Format |
|------|-----|-------------|--------|
| 203 | C | Download request | <nick> “<filename>” |
| 204 | S | Download ack | <nick> <ip> <port> “<filename>” <md5> <linespeed> |
| 206 | S | Peer to download not available | <nick> “<filename>” |
| 209 | S | Hotlist user signed on | <user> <speed> |
| 211 | C | Browse a user's files | <nick> |
| 212 | S | Browse response | <nick> “<filename>” <md5> <size> <bit rate> <frequency> <time> |
| 213 | S | End of browse list | <nick> [<ip address>] |
| 500 | C | Push file to me (firewall problem) | <nick> “<filename>” |
| 501 | S | Push ack (to other client) | <nick> <ip address> <port> “<filename>” <md5> <speed> |

Napster: Summary

• Centralized system with direct user interaction
  – Single access point: sensitive to failure
  – Limited scalability (central database)
• The protocol is complex, incoherent and proprietary
• The search for files is fast thanks to powerful (multiple) servers (fast database)
• The topology is known => copyright, …
• The participants' reputation is not considered (are the MP3 files of good quality? Do they contain music?)

Case study: Gnutella

Gnutella: a brief history

• Developed as a 14-day “quick hack” by Nullsoft (Winamp)
• Objective: exchange of cooking recipes
• Chronology:
  – Published under the «GNU General Public License» on the Nullsoft Web server
  – Taken down a couple of hours later by Nullsoft's owner AOL-Time Warner
  – This was enough to “infect” the Internet
  – The Gnutella protocol was reverse-engineered from versions downloaded from the Nullsoft site
  – New Gnutella servents were quickly developed and started to spread rapidly (even more so after the shutdown of Napster)

The Gnutella network

• No central server
• Gnutella program = server and client (“servent”)
• Flooding: requests are sent to neighbors (often 4 or 5), which pass them on to their neighbors, and so on; lifetime limited by a TTL (typically 7).
• For each request, servents search for local files corresponding to the request
• File transfer out of band with HTTP
• Joining the system: at least one Gnutella host must be known (e.g. gnutellahosts.com:6346)
• Meeting: TCP messages:
    GNUTELLA CONNECT/x.x\n\n
    GNUTELLA OK\n\n
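TTL-limited flooding can be illustrated with a small simulation. This is a sketch on an in-memory graph, not servent code: the function name `flood`, the adjacency-dict representation and the example topology are all assumptions, and duplicate suppression is modeled with a simple "already reached" set rather than by message ID.

```python
from collections import deque


def flood(graph: dict[str, list[str]], source: str, ttl: int) -> dict[str, int]:
    """Return {node: hops from source} for every node a flooded request reaches.

    `ttl` is the maximum number of hops the request may travel.
    """
    reached = {source: 0}
    queue = deque([(source, ttl)])
    while queue:
        node, remaining = queue.popleft()
        if remaining == 0:
            continue  # TTL exhausted: this node does not forward further
        for neighbor in graph[node]:
            if neighbor not in reached:  # drop requests already seen
                reached[neighbor] = reached[node] + 1
                queue.append((neighbor, remaining - 1))
    return reached


# Hypothetical chain topology S - A - B - C - D.
chain = {"S": ["A"], "A": ["S", "B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(flood(chain, "S", ttl=2))
```

With 4 or 5 neighbors per servent and a TTL of 7, the number of servents reached can grow very quickly with each hop, which is both the strength of flooding (no index needed) and the source of its traffic overhead.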

Protocol: Relative anonymity

• Requests do not contain identity
• Each servent only reliably knows about the servents it is directly connected to
• The direct HTTP file download reveals the identity

[Figure: network graph in which the identities of remote servents are unknown]

November 11 - 15ISADS 2002, Guadalajara, Mexico 29

InformationDescriptionType

servent ID, index ofrequested file, host:portaddress

File download request forservents behind firewalls

Push

host:port address, hostbandwidth, number andtotal size of local files

reply to a PingPong

nonea request for host addressesPing

file description,host:port address, hostbandwidth, servent ID

answer from hosts havingmatching files

Reply(QueryHit)

speed, keywordsuser search requestQuery

Message typesMessage types

Packet header

| Byte offset | 0-15       | 16          | 17  | 18   | 19-22      |
|-------------|------------|-------------|-----|------|------------|
| Field       | Message ID | Function ID | TTL | Hops | DataLength |

• MessageID: 16-byte unique ID
• FunctionID: packet type
• RemainingTTL: number of times the packet will still be forwarded
• HopsTaken: number of times the packet has been forwarded so far, so that TTL(0) = TTL(i) + Hops(i)
• DataLength: the length of the remaining data of this packet
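The 23-byte header maps directly onto a fixed binary layout. The sketch below packs and forwards such a header; the little-endian encoding of DataLength and the helper names (`pack_header`, `forward`) are assumptions for illustration, while the field sizes and function IDs follow the slides.

```python
import struct

# 16-byte Message ID, 1-byte Function ID, 1-byte TTL, 1-byte Hops,
# 4-byte DataLength (little-endian is assumed here).
HEADER = struct.Struct("<16sBBBI")

PING, PONG, PUSH, QUERY, QUERY_HIT = 0x00, 0x01, 0x40, 0x80, 0x81


def pack_header(message_id: bytes, function_id: int, ttl: int,
                hops: int, data_length: int) -> bytes:
    """Serialize one packet header (message_id must be 16 bytes)."""
    if len(message_id) != 16:
        raise ValueError("message_id must be exactly 16 bytes")
    return HEADER.pack(message_id, function_id, ttl, hops, data_length)


def forward(header: bytes):
    """Return the header to send to neighbors, or None if the TTL is used up.

    Each hop decrements TTL and increments Hops, so TTL + Hops stays constant.
    """
    message_id, function_id, ttl, hops, data_length = HEADER.unpack(header)
    if ttl <= 1:
        return None
    return HEADER.pack(message_id, function_id, ttl - 1, hops + 1, data_length)


hdr = pack_header(bytes(16), QUERY, ttl=7, hops=0, data_length=10)
print(len(hdr), HEADER.unpack(forward(hdr))[2:4])
```

The invariant TTL(0) = TTL(i) + Hops(i) lets a servent detect packets that were injected with an unreasonably large initial TTL.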

Ping/Pong mechanism

• A host receiving a Ping should:
  – answer with a Pong containing its own address
  – forward it to its neighbors
• In practice:
  – some hosts do not respond: reduce overhead
  – some hosts do not propagate: reply with a small number of addresses
  – well-known Gnutella sites: serve as directories, not as active hosts (e.g. gnutellahosts.com)
  – Recently: introduction of “ultra-peers”

Peer meeting with Ping/Pong

[Figure: servent S sends a Ping into a network of peers A to F. Pongs come back from A, B, C, D and E; F does not respond. New peers known to S: A, B, C, D, E.]

Ping/Pong messages

• Ping (0x00): packet header with payload 0x00
• Pong (0x01): packet header with payload:

| Byte offset | 0-1  | 2-5        | 6-9           | 10-13         |
|-------------|------|------------|---------------|---------------|
| Field       | Port | IP address | Nb shrd files | KB shrd files |

  • Port where the responding host accepts connections
  • IP address of the responding host
  • Number of shared files
  • Number of Kbytes shared
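A Pong payload with this layout is 14 bytes long and can be packed as shown below. The byte order is an assumption (little-endian integers, IP address in network order), and the function names are made up for the example.

```python
import socket
import struct

# Port (2 bytes), IP address (4 bytes), shared files (4), shared KB (4).
PONG = struct.Struct("<H4sII")


def pack_pong(port: int, ip: str, n_files: int, n_kbytes: int) -> bytes:
    """Build the Pong payload that follows the packet header."""
    return PONG.pack(port, socket.inet_aton(ip), n_files, n_kbytes)


def unpack_pong(payload: bytes) -> tuple[int, str, int, int]:
    """Decode a Pong payload into (port, dotted IP, files, kilobytes)."""
    port, raw_ip, n_files, n_kbytes = PONG.unpack(payload)
    return port, socket.inet_ntoa(raw_ip), n_files, n_kbytes


# Example values are invented; 6346 is the port used in the slides' example.
payload = pack_pong(6346, "192.0.2.10", 695, 2_400_000)
print(len(payload), unpack_pong(payload))
```

A servent that collects such Pongs learns both new addresses to connect to and a rough picture of how much each host shares.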

Requests: Query/QueryHit/GET

[Figure: servent S floods “Query xx.mp3” through a network of peers A to F. Query hits come back from B, C and D. S then fetches xx.mp3 directly from a responding peer with an HTTP GET.]

Query

• Query (0x80):

| Byte offset | 0-1           | 2 ...           |
|-------------|---------------|-----------------|
| Field       | Minimum speed | Search criteria |

  – Minimum speed: minimal bandwidth (in kbps) of responding servents
  – Search criteria: character sequence ending with zero (0x00); maximal length is limited by the “Payload length” (DataLength) in the header
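The Query payload is a 2-byte speed followed by a zero-terminated string. A minimal sketch, assuming a little-endian speed field and ASCII search criteria (both assumptions; the helper names are hypothetical):

```python
import struct


def pack_query(min_speed_kbps: int, criteria: str) -> bytes:
    """Minimum speed (2 bytes) followed by the zero-terminated search string."""
    return struct.pack("<H", min_speed_kbps) + criteria.encode("ascii") + b"\x00"


def unpack_query(payload: bytes) -> tuple[int, str]:
    """Decode a Query payload into (minimum speed, search criteria)."""
    (min_speed,) = struct.unpack_from("<H", payload, 0)
    end = payload.index(b"\x00", 2)  # terminator of the search criteria
    return min_speed, payload[2:end].decode("ascii")


query = pack_query(56, "xx mp3")
print(len(query), unpack_query(query))
```

The length of this payload is what the sender writes into the DataLength field of the packet header.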

QueryHit

QueryHit (0x81)

| Byte offset | 0              | 1-2  | 3-6        | 7-10  | 11 ... n   | n ... n+16         |
|-------------|----------------|------|------------|-------|------------|--------------------|
| Field       | Number of hits | Port | IP address | Speed | Result set | Servent identifier |

Each entry of the result set:

| Byte offset | 0-3        | 4-7       | 8 ...     |
|-------------|------------|-----------|-----------|
| Field       | File index | File size | File name |

  – Number of hits: number of entries in the result set
  – Port: where the responding node accepts a connection
  – IP address: address of the responding node
  – Speed: bandwidth of responding node
  – Servent identifier: 16-byte word identifying the responding node (unique)
  – Result set: results (number of hits entries)
    • File index: a unique number assigned to a file by the responding node
    • File size: file size (in bytes)
    • File name: file name ending with two zeros (0x0000)

File Download

• «Out of band» with simplified HTTP
• Connection to the IP address given by QueryHit
• Example: result entry received in the QueryHit:

| File index | File size | File name        |
|------------|-----------|------------------|
| 2369       | 5326983   | xx.mp3\0x00\0x00 |

Request:

    GET /get/2369/xx.mp3/ HTTP/1.0\r\n
    Connection: Keep-Alive\r\n
    Range: bytes=0-\r\n
    User-Agent: Gnutella\r\n
    \r\n

Response:

    HTTP 200 OK\r\n
    Server: Gnutella\r\n
    Content-type: application/binary\r\n
    Content-length: 5326983\r\n
    \r\n
    <data> ...
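The download request can be assembled from a QueryHit result entry as sketched below. The function name is hypothetical, and the range is written in the open-ended form `bytes=<start>-`, which is an assumption about the intended header value.

```python
def build_download_request(file_index: int, file_name: str, start: int = 0) -> bytes:
    """Build the simplified HTTP GET sent to the servent named in the QueryHit."""
    lines = [
        f"GET /get/{file_index}/{file_name}/ HTTP/1.0",
        "Connection: Keep-Alive",
        f"Range: bytes={start}-",  # open-ended range starting at `start`
        "User-Agent: Gnutella",
        "",
        "",  # the two empty items produce the terminating blank line
    ]
    return "\r\n".join(lines).encode("ascii")


print(build_download_request(2369, "xx.mp3").decode("ascii"))
```

Passing a non-zero `start` would let a client resume an interrupted transfer, provided the serving servent honors the Range header.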

Handling Firewalls: The Push Descriptor

• If a host cannot be contacted directly (firewall)
• The servent receiving a Push descriptor (0x40) initiates the file transfer (with outgoing connection)

| Byte offset | 0-15               | 16-19      | 20-23      | 24-25 |
|-------------|--------------------|------------|------------|-------|
| Field       | Servent identifier | File index | IP address | Port  |

  – Servent identifier: 16-byte string uniquely identifying the servent which is requested to push the file
  – File index: unique identifier of the file to be pushed
  – IP address: of the host to which the file should be pushed
  – Port: to which the file should be pushed

Gnutella Push: servent behind firewall

• Servent A receives a QueryHit from servent B, which is behind a firewall and accepts connections only on the Gnutella port
• A sends Push (0x40) to B:

    Servent identifier | File index | IP address | Port

• B opens a connection to the IP/port given in the Push message and sends:

    GIV <File index>:<Servent identifier>/<File name>\n\n

• When A receives GIV, it starts a regular download on the connection:

    GET /get/<File index>/<File name>/ HTTP/1.0\r\n
    Connection: Keep-Alive\r\n
    Range: bytes=0-\r\n
    User-Agent: Gnutella\r\n
    \r\n

• Does not work if both parties are behind firewalls
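Both sides of the Push exchange can be sketched in a few lines: A builds the 26-byte Push payload, and B answers on a new outgoing connection with a GIV line. The byte order (little-endian integers, network-order IP) and the hexadecimal rendering of the servent identifier in the GIV line are assumptions; the function names are made up.

```python
import socket
import struct

# Servent identifier (16), file index (4), IP address (4), port (2).
PUSH = struct.Struct("<16sI4sH")


def pack_push(servent_id: bytes, file_index: int, ip: str, port: int) -> bytes:
    """Payload A sends to ask firewalled servent B to connect back to ip:port."""
    return PUSH.pack(servent_id, file_index, socket.inet_aton(ip), port)


def giv_line(file_index: int, servent_id: bytes, file_name: str) -> bytes:
    """First line B sends after opening the outgoing connection to A.
    Rendering the identifier as hex is an assumption made for this sketch."""
    return f"GIV {file_index}:{servent_id.hex()}/{file_name}\n\n".encode("ascii")


b_id = bytes(range(16))  # invented identifier for servent B
push = pack_push(b_id, 2369, "192.0.2.20", 6346)
print(len(push), giv_line(2369, b_id, "xx.mp3"))
```

Since the connection is opened from inside B's firewall, A can then issue its GET over it as in a regular download.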

Free-riding: downloads [Adar, Huberman]

Many servents share files nobody downloads. Of 11,585 sharing hosts:

  – Top 1% of sites provide nearly 47% of all answers
  – Top 25% of sites provide 98% of all answers
  – 7,349 (63%) never provide a query response

Free-riding: number of shared files

Most Gnutella users are free riders [Adar00]

Free-riding

Of 33,335 hosts:

  – 22,084 (66%) of the peers share no files
  – 24,347 (73%) share ten or fewer files

| The top           | Files shared | As % of the whole |
|-------------------|--------------|-------------------|
| 333 hosts (1%)    | 1,142,645    | 37%               |
| 1,667 hosts (5%)  | 2,182,087    | 70%               |
| 3,334 hosts (10%) | 2,692,082    | 87%               |
| 5,000 hosts (15%) | 2,928,905    | 94%               |
| 6,667 hosts (20%) | 3,037,232    | 98%               |
| 8,333 hosts (25%) | 3,082,572    | 99%               |

Statistics

• The top 1 percent of those queries accounted for 37% of the total queries on the Gnutella network. The top 25 percent account for over 75% of the total queries.
• The top responding host only hosted 695 files, but responded to 3,436 queries. The next most responsive peer hosted 956 files and responded to 1,474 queries.
• There is only low correlation between quantity and quality of files shared

Popularity of Queries [Sripanidkulchai01]

• Very popular documents are approximately equally popular
• Less popular documents follow a Zipf-like distribution (i.e., the probability of seeing a query for the i-th most popular query is proportional to 1/i^α)
• Access frequency of Web documents also follows Zipf-like distributions ⇒ caching might also work for Gnutella
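To see why a Zipf-like popularity distribution makes caching attractive, the sketch below normalizes the weights 1/i^α into probabilities and sums the share of the most popular items. The catalog size of 1000, α = 1.0 and the function name are illustrative assumptions, not measurements from the cited study.

```python
def zipf_probabilities(n: int, alpha: float) -> list[float]:
    """Probability of the i-th most popular item, proportional to 1 / i**alpha."""
    weights = [1.0 / (i ** alpha) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


probs = zipf_probabilities(1000, alpha=1.0)
top_10_share = sum(probs[:10])
print(f"top 10 of 1000 items: {top_10_share:.0%} of all requests")
```

Under these assumptions a cache holding only a small fraction of the items would answer a disproportionately large share of the requests.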

Online communities – Peer-to-Peer systems (P2P)

The Gnutella network as an experimentation platform for evaluating whether such P2P systems are appropriate for online communities.

Ref.: J. Vaucher, G. Babin, P. Kropf, Th. Jouve: Experimenting with Gnutella Communities, Distributed Communities on the Web, Sydney, April 2002, LNCS 2468, Springer Berlin, pp. 85-99

Why Gnutella?

• Working large-scale P2P data-sharing system
• Highly dynamic, peers are autonomous, self-organized
• Completely decentralized: no global index, no central coordination. Global behavior emerges from local interactions
• Simple communication mechanism: based on “flooding”, direct contact between requesting and providing nodes
• Simple, robust and sufficiently scalable
• Much open-source software

Gnutella Networks

• Small world property: find everything close by [Milgram 67] [Watts 99]
• High clustering and short path lengths observed
• Network topology unknown
• Much overhead traffic for maintaining connectivity
• Broadcast strategy not appropriate for small world networks

Which are the “good” peers and how can they be determined?

[Jovanovic01]

Experimentation Goals

• Quality of network:
  – good connections
  – stability
  – routing properties
• No consideration of file sharing and downloads: free riding on Gnutella is well known [Adar00]:
  – 70% of hosts share no files
  – 50% of responses by 1% of sharing nodes

Experimental platform

• Based on JTella:
  – author: Ken McCrary
  – written in Java
  – Java API to access the Gnutella network
  – 40 classes, 7000 lines of code (LOC)
  – easy to use: a simple application to monitor traffic requires 150 LOC

Experimental platform: architecture

Implementation based on JTella (K. McCrary)

Page 50: Distributed Communities on the Web de Montréalkropf/articles/tutorial-p2p.pdf · Distributed Communities on the Web Peter Kropf Department of Computer Science and Operations Research

November 11 - 15ISADS 2002, Guadalajara, Mexico 51

Experimentation: measuring performance

! HostCache:
– contains only valid Pong addresses
– limited to 200 addresses

! Statistics on connections:
– connection type, start time, set-up time, duration, number of messages received, termination code

! The performance observed depends heavily on chance

! Two exploration experiments show
– the difficulty of maintaining connectivity
– the difficulty of quantifying the behavior


Measurements (1)

! Exploration experiment 1:
– passive node: routes messages, collects statistics and prints its status every 15 sec.


Measurements (2)

! Exploration experiment 2:
– two servents run in parallel
– keep 4 connections open
– monitored parameters:
• number of messages received
• the horizon, a measure of the network size: a Ping is broadcast every minute and we count the answering Pongs over the next minute
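
The horizon measurement can be sketched in a few lines. This is only an illustration, assuming Ping and Pong events are available as timestamps in seconds; the function name and the input format are hypothetical and are not taken from JTella or from the experiment's code.

```python
def horizon_per_ping(ping_times, pong_times, window=60.0):
    """For each Ping sent at time t, count the Pongs received in [t, t + window).

    ping_times, pong_times: timestamps in seconds (assumed input format).
    Returns one horizon value per Ping.
    """
    return [
        sum(1 for p in pong_times if t <= p < t + window)
        for t in ping_times
    ]


# Two Pings one minute apart, six Pongs observed in total.
print(horizon_per_ping([0, 60], [1, 5, 59.9, 60, 100, 130]))
```

A real servent would match Pongs to the Ping that triggered them rather than rely on arrival time alone; the time window is used here only because it is what the slide describes.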


Measurements

! 2 sessions in parallel for 45 min.:

Average message rate:
- A: 120 msg/sec
- B: 180 msg/sec

[Figures: input rate over time; horizon over time]

Stochastic behavior: it is difficult to estimate the network size.


Validity of Pong information (1)

! Validity of addresses received with Pong messages:
– many duplicate addresses:
• around 75% of addresses received are identical to addresses already in the cache (size: 200)
• around 25% of addresses are identical to the ones received just before
– many “special” (invalid) addresses:
• 0.0.0.0, 127.0.0.1, 255.255.255.255
• multicast addresses, private addresses (NAT)


Validity of Pong information (2)

! Analysis of received addresses:
– 16% to 28% of addresses retained
– limited size of cache: not all duplicates can be detected
– unique valid addresses received: 9 to 20% (obtained from post-mortem analysis)

! Modification of cache algorithm: filtering invalid and repeated addresses
– experiment yielded around 30% valid addresses received
– this is still roughly 4 times more addresses than are used to maintain connectivity
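
A filtering host cache of the kind described above might look as follows. This is a minimal sketch, not the modified JTella cache: the class and function names are hypothetical, the oldest-first eviction policy is an assumption, and Python's standard `ipaddress` module stands in for whatever checks the experiment used.

```python
import ipaddress
from collections import OrderedDict

BROADCAST = ipaddress.IPv4Address("255.255.255.255")


def is_usable(addr):
    """Reject the 'special' addresses listed on the slide."""
    try:
        ip = ipaddress.IPv4Address(addr)
    except ipaddress.AddressValueError:
        return False
    return not (ip.is_unspecified or ip.is_loopback or ip.is_multicast
                or ip.is_private or ip == BROADCAST)


class HostCache:
    """Bounded cache of Pong addresses (200 entries on the slide)."""

    def __init__(self, capacity=200):
        self.capacity = capacity
        self._hosts = OrderedDict()

    def offer(self, addr):
        """Store addr if it is valid and not already cached; return True if stored."""
        if not is_usable(addr) or addr in self._hosts:
            return False
        if len(self._hosts) >= self.capacity:
            self._hosts.popitem(last=False)  # assumed policy: evict the oldest entry
        self._hosts[addr] = True
        return True

    def __contains__(self, addr):
        return addr in self._hosts

    def __len__(self):
        return len(self._hosts)
```

Because the cache is bounded, a duplicate of an address that has already been evicted is accepted again, which is the effect the slide describes as "not all duplicates can be detected".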


Connection setup (1)

! Socket creation: probability of successfully connecting to another host:
– connection attempt fails:
• host too busy
• application terminated, computer disconnected
– logical timeout for socket creation: 10 sec. (in Java 1.3 & Linux, actual timeout = 13 min.)

Session: 90 min., 2541 connection attempts:
- 31%: success, connection achieved in 2.3 sec.;
- 20%: failure reported rapidly in 1.7 sec.;
- 49%: blocking, failure noted after 10 sec.
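
The three outcomes above (success, rapid failure, blocking until the timeout) can be reproduced with a bounded connection attempt. The sketch below uses Python's standard `socket.create_connection` rather than the Java 1.3 sockets of the experiment; the function name and outcome labels are hypothetical.

```python
import socket
import time


def try_connect(host, port, timeout=10.0):
    """Attempt a TCP connection and classify the outcome.

    Returns (outcome, elapsed_seconds), where outcome is one of
    "success", "fast_failure" (error reported before the timeout)
    or "blocked" (no answer within the timeout).
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "success"
    except socket.timeout:
        outcome = "blocked"
    except OSError:
        outcome = "fast_failure"
    return outcome, time.monotonic() - start
```

Passing the timeout to the connection call itself is what keeps the 10 sec. limit effective; the slide notes that the platform default on Java 1.3 and Linux was about 13 min.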


Connection setup (2)

Test of 100 random addresses taken from previous sessions:
- 36%: success, connection achieved in 1.6 sec.;
- 20%: failure reported rapidly in 0.9 sec.;
- 38%: blocking, failure noted after 10 sec.

! Most connections fail
! Connections to randomly chosen hosts with previously successful connections are not more reliable


Distribution of message types

! Distribution of message types:
– proportion of Pings and Replies remains constant
– proportion of Pongs and Queries varies widely
– substantial overhead is introduced by the continuous Pings and Pongs


Host quality (1)

! “Good” hosts:
– most participants are transients
– definition of “good” hosts:
• “good” host = connection is maintained for at least 2 min.
– Dec 30th, 2001: of 41789 connections over 24 hours, 564 (1.3%) were considered “good”
– can “good” hosts be reused for future sessions?


Host quality (2)

! “Good” hosts:
– try to reconnect to “good” hosts (without handshake):

Success rate after addresses were obtained:
- 24 h: 18%, 48 h: 10%, 7 days: 7%
- 67% of cases: unable to reconnect a single time
- 10% of cases: hosts were available 50% of the time
- 0.7% of cases: hosts were always responding

90% of tested hosts can open Gnutella sessions (with handshake)

Possibility to identify reliable semi-permanent hosts


Latency

! Latency:
– measure: broadcast a Ping every minute and collect all responding Pongs
– number of responding hosts: 5 to 252
– hop counts: minimum is often 2 or 3

Delay measures:
- min: 0.9 sec.; max: 128 sec.; mean: 13 sec.
- mean hops: 5
- mean delay between hosts: 1.3 sec.
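
The three delay figures agree with each other if the 13 sec. mean is read as a round trip (Ping out, Pong back) over the mean path of 5 hops. That reading is an assumption; the slide does not say how the per-host delay was derived. A small check of the arithmetic:

```python
def per_hop_delay(mean_round_trip_s, mean_hops):
    """Per-hop delay, assuming the Pong retraces the Ping's path (2 * hops links)."""
    return mean_round_trip_s / (2 * mean_hops)


# Slide values: mean delay 13 sec., mean hops 5.
print(round(per_hop_delay(13, 5), 2))
```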


Measuring performance

! Goal: improving performance
– need to evaluate the effects of various servent parameters or strategies
– need for a methodology that
• does not disrupt normal operation of the network
• reduces the effects of randomness on performance indicators
• enables comparison of different versions of servents


Measuring methodology (1)

! Run an experiment for each parameter or strategy to test
! Experimentation: lasts 24 hours, 24 runs of 45 minutes
! Each run = 2 servents in parallel: a “benchmark” servent and a “test” servent
! Parameters of “benchmark” runs are constant; this serves as the basis for comparison
! Measured indicators:
– number of messages, number of Pings, number of Pongs, average horizon, number of distinct host addresses found


Measuring methodology (2)

! Statistics collected on the indicators are used to compute the performance ratio r

[Formula not preserved in this transcript]

where:
- m: number of indicators used;
- $x_{ij}^{s}$: value of indicator i at the jth run of servent s, where s = b for the benchmark servent and s = t for the test servent
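
The formula for r is missing from this transcript, so the following is only an illustrative guess at how such a ratio could be formed from the quantities defined above: for each of the m indicators, the test servent's total over all runs is divided by the benchmark servent's total, and the m ratios are averaged. This definition is an assumption and should not be taken as the author's.

```python
def performance_ratio(test_runs, bench_runs):
    """Hypothetical performance ratio.

    test_runs[i][j]  : value of indicator i at run j of the test servent (s = t)
    bench_runs[i][j] : same for the benchmark servent (s = b)
    Returns the mean over indicators of (sum of test values / sum of benchmark values).
    """
    m = len(test_runs)
    ratios = [sum(x_t) / sum(x_b) for x_t, x_b in zip(test_runs, bench_runs)]
    return sum(ratios) / m


# Two indicators, two runs each: the test servent doubles the first
# indicator and equals the benchmark on the second.
print(performance_ratio([[2, 2], [3, 3]], [[1, 1], [3, 3]]))
```

Under this assumed definition, a value above 1 would mean that the tested parameter setting outperformed the benchmark on average.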


Performance measurements

! Evaluation of methodology
– Influence of the number of connections on the performance of a servent


Conclusions of Gnutella Experiments (1)

! The performance observed depends heavily on chance:
– Difficulty of quantifying behavior
– Difficulty of maintaining connectivity
! Network traffic overhead produced by flooding is large (50% from Ping/Pong)
! Majority of addresses received were invalid or redundant: a filter is necessary
! 1/3 of outgoing attempts lead to valid connections, but the average connection duration is only 30 sec.
! Possibility to identify (rare) semi-reliable hosts
! Average delay of Ping answer: 15 sec.
! Difficult to estimate the size of the network:
– 2/3 of “good” hosts do not respond to Pings


Conclusions of Gnutella Experiments (2)

! The use of this type of (overlay) network is still limited for developing online community applications.
! Local intelligence on community is necessary for efficiency and scalability
! The highly volatile character of the network calls for efficient methods for self-organization
! Future work:
– Development of more sophisticated strategies and policies to maintain connectivity and to (self-)organize the network
– Evaluation of new strategies with practical tests
– Simulations to evaluate the effect of different strategies and parameters


Gnutella: Bandwidth Barriers (experiments by Clip2)

! Measuring Gnutella over 1 month:
– typical query is 560 bits long (including TCP/IP headers)
– 25% of the traffic is queries, 50% Pings, 25% other
– on average each peer seems to have 3 other peers actively connected

! Scalability barrier with substantial performance degradation if queries/sec > 10:

10 queries/sec
* 560 bits/query
* 4 (to account for the other 3 quarters of message traffic)
* 3 simultaneous connections
= 67,200 bps

⇒ 10 queries/sec maximum in the presence of many dialup users
⇒ more bandwidth will induce larger files: no improvement
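
The estimate above is a plain product, which the following sketch reproduces. The function and parameter names are hypothetical; the default values are the ones given on the slide.

```python
def required_bandwidth_bps(queries_per_sec, bits_per_query=560,
                           query_share=0.25, connections=3):
    """Bandwidth a peer needs to carry the given query rate.

    query_share: fraction of all message traffic that is queries (25% on the
    slide), so total traffic is the query traffic divided by this share.
    """
    total_bits_per_sec = queries_per_sec * bits_per_query / query_share
    return total_bits_per_sec * connections


print(required_bandwidth_bps(10))  # 67200.0
```

The result exceeds the nominal 56 kbit/s of a dial-up modem (a figure that is not on the slide), which is consistent with the stated limit of about 10 queries/sec when many peers are on dial-up.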


Gnutella: Summary

! Completely decentralized
! High fault tolerance but unstable
! Adapts well and dynamically to changing peer populations
! Protocol causes high network traffic. For example:
– 4 connections C / peer, TTL = 7
– 1 ping packet can cause $2 \sum_{i=0}^{TTL} C\,(C-1)^{i} = 26{,}240$ packets
– 60% of today’s traffic on the commercial Internet stems from Gnutella-like applications
! No estimates on the duration of queries can be given
! No probability for successful queries can be given
! Topology is unknown ⇒ search algorithms cannot exploit it
! Reputation of peers (trust) is not addressed
! Simple, robust, and scalable (at the moment ??)
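
The packet count for C = 4 and TTL = 7 can be checked directly from the sum $2 \sum_{i=0}^{TTL} C\,(C-1)^{i}$. The function name below is hypothetical; the leading factor of 2 is part of the slide's formula, read here as one Pong for every Ping delivered.

```python
def ping_packets(connections, ttl):
    """Packets caused by one Ping: 2 * sum_{i=0}^{ttl} C * (C - 1)**i."""
    return 2 * sum(connections * (connections - 1) ** i for i in range(ttl + 1))


print(ping_packets(4, 7))  # 26240
```

The count grows geometrically in TTL with base C - 1, so each additional hop roughly triples the traffic for C = 4.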


System Architectures

[Figure: evolution of system architectures over time, from centralization to decentralization. Axes: structure and control. Labels: hierarchy, asymmetry, self-organization, symmetry; unidirectional, interactive, unidirectional/bilateral, multilateral. Examples shown: music industry, MP3, Napster, gnutella.]


Other Systems and Approaches

! Freenet
! JXTA
! Mobile Ad-hoc Networks (MANET)
! Grid Computing
! Ubiquitous Computing
! Agents
! AWeb tool
! Web Operating System (WOS)


Freenet: System Architecture

! Adaptive P2P system which supports publication, replication, and retrieval of data
! Protects anonymity of authors and readers
– infeasible to determine the origin or destination of data
– difficult for a node to determine what it stores (files are sent and stored encrypted) ⇒ nobody can be sued
! Requests are routed to the most likely physical location
– no central server as in Napster
– no constrained broadcast as in Gnutella
! Files are referred to in a location-independent way
! Dynamic replication of data


Freenet: Searching [Hong01]

! Graph structure actively evolves over time
– new links form between nodes
– files migrate through the network
⇒ adaptive routing

[Abere02]


Freenet: Summary

! Completely decentralized
! High fault tolerance
! Robust and scalable
! Automatic replication of content
! Adapts well and dynamically to changing peer populations
! Spam content less of a problem (subspaces)
! Adaptive routing preserves network bandwidth
! No estimates on the duration of queries can be given
! No probability for successful queries can be given
! Topology is unknown ⇒ algorithms cannot exploit it
! Routing “circumvents” free-riders
! Reputation of peers is not addressed
! Supports anonymity of publishers and readers


Project JXTA (SUN)

! A network programming platform for P2P systems
– 3-layer architecture
– 6 XML-based protocols: discovery, membership, routing, ...
– abstractions: peer groups, pipes, advertisements, ...
! Goal: a uniform platform for applications using P2P technology and for various P2P systems to interact

[Figure: JXTA layered architecture]
– JXTA applications: JXTA community applications, Sun JXTA applications
– Peer shell, peer commands
– JXTA services: JXTA community services, Sun JXTA services (indexing, searching, file sharing)
– JXTA core: peer groups, peer pipes, peer monitoring, security
– Any peer on the extended Web


Mobile Ad-hoc Networks (MANETs) [Hubaux01]

Dynamic, location-sensitive data with restricted access

Mobile users with dynamic individual interest profiles

Example: Distributed Virtual Database for e-business

Local information exchange

Applications: mobile information commerce, location-dependent information services


Grid Computing

! Grid computing: a set of connected supercomputers which allows the transparent execution of a program by searching for and allocating the resources it requires.
! Middleware: transforms a collection of resources into a single, coherent and virtual resource. Often based on MPI and PVM.
! Examples:
– Globus (www.globus.org): USC, ANL
– NetSolve (www.cs.utk.edu/netsolve): Univ. of Tennessee, ORNL
– Legion: Univ. of Virginia
– Milan: New York Univ., Arizona State Univ.
– UNICORE: Genias, Pallas, German universities
– SUN GridEngine
– SETI@home (loosely coupled client-server cluster system)


Example: SETI@home

[Figure: SETI-Central distributing tasks to peers]

• Loosely coupled client-server cluster system: “embarrassingly parallel application”
• Client computer downloads a set of radio data; analysis (Fourier) during idle times; results sent back to server
• Goal: look for signals of artificial origin
• 3.6 million clients (March 2003); 1.42E+21 floating point operations performed; 921,190 CPU years so far


Ubiquitous Computing

! The disappearing computer: from fixed to mobile to wearable
! It is about the Computer in the World and NOT the World in the Computer: bridging the gap between the virtual and the real world
! Context- and location-aware, diverse and numerous, human-centric
! Largely technology driven: Moore’s law
! Smart devices with spontaneous network capabilities that have access to any information or provide access to any service “on the net”
! Vision: everyday objects become smart and interconnected; they communicate and cooperate: communities


Ubiquitous Computing (figure slide; image not preserved)


Agents - Communities

! Agent characteristics
– Autonomy: capability to pursue its goals without interactions or commands from the environment
– Social ability: capability to interact with the environment; context/situation awareness
– Reactivity: capability of reacting appropriately to influences or information from its environment
– Proactivity: capability to take the initiative under specific circumstances
– Mobility: the ability to move around in an electronic network (Security? Trust? Ontology?)
– Collaboration: the ability to collaborate and coordinate actions and behavior
– Learning: the ability to learn from past situations and actions

Agent systems may be perceived as communities

Examples: Agentcities (www.agentcities.org), ACAN (Univ. Ottawa), and many more.


Medium – A Sphere for Agents
Communities = Agents + Media

[Figure: agents within a medium; labels: medium, agent, communication, information object] [Schmid01]

“The medium is the message” [Wadge02]


AWeb tool (1)

Herwig Unger, Markus Wulff, University of Rostock


AWeb tool (2)


AWeb tool (3)


AWeb tool (4)


The Web Operating System (WOS)

The Web Operating System (WOS™) concept was developed several years ago in response to the emergence of globally networked systems embracing computers, appliances and communication devices. It follows the vision of open environments characterized by having components that are autonomous, numerous, heterogeneous, context-aware and adaptable.

These components do not stand alone, but communicate, negotiate and collaborate to serve an individual’s or group’s goals. This gives rise to evolving associations or communities of participants, be they physical devices, software objects or people, mediated by a shared context.

Project partners: Peter Kropf (University of Montreal), Gilbert Babin (HEC Montreal), Herwig Unger (University of Rostock), John Plaice (University of New South Wales)


Rationale for WOS

[Figure: WOS middleware/services]

Self-configuring networks of mobile and stationary devices form communities


Rationale for WOS

! WOS aims to:
– Supply users with adequate tools that allow for the implementation of specific services
– Provide users with great flexibility in the “semantics” of their services.
! Services are volatile:
– Services appear, disappear, evolve, etc.
– Services and environments have to adapt to their contexts.
! WOS Vision:
– Any-time, any-where, any-service.


The Nature of WOS: Infrastructure of WOS

[Figure: infrastructure of WOS. Labels: WOS nodes, network, WOSRP, WOSP, Software Resources Warehouse, Hardware Resources Warehouse, Information Warehouse, Request Warehouse, Search, Evaluation, API, UI, Local O/S.]


The Nature of WOS: Communication Framework

! The WOS Protocol (WOSP)
– Simple generic syntax
– Extensible
– A version of WOSP is a specialization of the generic syntax
! The WOS Request Protocol (WOSRP)
– Search/localization of WOS nodes
– Exchange of information about WOSP versions
– Establishment of WOSP connections


The Nature of WOS: Service classes of WOS

! A WOS node provides a set of service classes
– A service class is a set of services of the same nature
– Example: configuring and executing High Performance (HP) applications
– A user query is served by a specific service belonging to a specific service class
! All service classes respect the generic WOS Protocol (WOSP)
– Every service class has its own semantics
– We define a specific version of WOSP for each service class
– A service class is a WOSP version
! Services are dynamic and must be able to evolve


The Nature of WOS: Information Framework

! Resource Warehouses
– Contain the description of the local resources provided by the WOS node
– Preserve information about remote resources already requested
! What is a resource?
– Hardware: CPU, memory, etc.
– Software: Java, PVM, MPI, WOSP version, etc.
– Others: effective CPU or network performance, etc.
! Warehouses are distributed and have limited storage capacity
– They learn and forget information


The Nature of WOS: Information Framework

! A resource is described by attributes: an attribute and its associated value is called an av-pair
! The structure of a warehouse is a hierarchical arrangement of av-pairs: an av-pair is a descendant of another av-pair when it depends on it


User Requests

! Users make requests to identify services that fulfill their needs
! Requests are built using arrangements of attributes and values, linked by relational operators: av-relations
– For example
• [Attribute: Price ≤ Value: 10 dollars]
! A request consists of two predicates
– Pu: user-specific characteristics of the service
– Pc: context-specific characteristics of the service
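
An av-relation and a predicate built from several of them could be modelled as below. This is a sketch for illustration only: the class, the operator table and the dictionary representation of a service are all assumptions, not WOS data structures.

```python
import operator
from dataclasses import dataclass
from typing import Any

# Relational operators allowed in an av-relation (assumed set).
OPS = {"=": operator.eq, "<=": operator.le, ">=": operator.ge,
       "<": operator.lt, ">": operator.gt}


@dataclass(frozen=True)
class AVRelation:
    attribute: str
    op: str
    value: Any

    def holds(self, service):
        """True if the service has the attribute and the relation is satisfied."""
        if self.attribute not in service:
            return False
        return OPS[self.op](service[self.attribute], self.value)


def matches(predicate, service):
    """A predicate (Pu or Pc) is taken here as a conjunction of av-relations."""
    return all(rel.holds(service) for rel in predicate)


# The slide's example, [Attribute: Price <= Value: 10 dollars], inside a Pu.
pu = [AVRelation("service", "=", "printing"), AVRelation("price", "<=", 10)]
print(matches(pu, {"service": "printing", "price": 8}))
```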


Processing a User Request: The Search Algorithm

! Search (Pu, Pc, q): tries to return at least q services matching Pu and Pc

1. Performs a local request: Request (Pu, Pc, q)
– If the number of services found is sufficient, the algorithm stops
2. Performs a remote request
– Finds nodes with the same context from local warehouses
– Performs the request on each of these remote nodes
– If the number of services found is sufficient, the algorithm stops
3. Performs a remote request (2nd try)
– Finds any node from known remote nodes
– Performs the request on each of these remote nodes
– If the number of services found is sufficient, the algorithm stops
4. Finds at least one other WOS node
– Uses a Bootstrap algorithm
– Restarts at step 1
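
The four steps can be sketched as a loop. Everything below is hypothetical scaffolding: the `Node` class, its fields and the `bootstrap` stand-in are invented for the example, predicates are plain functions, and the bound on restarts is an added assumption so that the sketch always terminates.

```python
class Node:
    """Hypothetical WOS node, reduced to what the search sketch needs."""

    def __init__(self, name, context, services):
        self.name = name
        self.context = context        # e.g. {"building": "HEC"}
        self.services = services      # list of service descriptions (dicts)
        self.known = []               # remote nodes already known
        self.undiscovered = []        # nodes a bootstrap could still find

    def request(self, pu, pc, q):
        """Simplified local request: services satisfying both predicates."""
        return [s for s in self.services if pu(s) and pc(s)]

    def bootstrap(self):
        """Stand-in for the Bootstrap algorithm: learn one new node, if any."""
        if not self.undiscovered:
            return False
        self.known.append(self.undiscovered.pop(0))
        return True


def search(node, pu, pc, q, max_restarts=3):
    found = []

    def collect(services):
        for s in services:
            if s not in found:
                found.append(s)

    for _ in range(max_restarts + 1):
        collect(node.request(pu, pc, q))                     # step 1: local
        if len(found) >= q:
            return found
        same_ctx = [n for n in node.known if n.context == node.context]
        for n in same_ctx:                                   # step 2: same context
            collect(n.request(pu, pc, q))
        if len(found) >= q:
            return found
        for n in node.known:                                 # step 3: any known node
            if n not in same_ctx:
                collect(n.request(pu, pc, q))
        if len(found) >= q:
            return found
        if not node.bootstrap():                             # step 4: find a new node
            break
    return found
```

If no further node can be discovered, the sketch returns whatever was found, which may be fewer than q services.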


Performing a Local Request: The Request Algorithm

! Request (Pu, Pc, q): tries to return at least q services matching Pu and Pc from the local warehouses
– The best case
• We find at least q services matching Pu and Pc
– The intermediate case
• We gradually reduce the constraints imposed by Pc until we find at least q services
– The worst case
• We find all the services matching only Pu
• The set of services found may be empty
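
The three cases can be covered by one relaxation loop. In this sketch Pu and Pc are lists of boolean functions, and Pc is assumed to be ordered from most to least important so that "gradually reducing the constraints" means dropping the last one first; the slides do not specify the order of relaxation.

```python
def request(services, pu, pc, q):
    """Return services matching Pu and as many Pc constraints as possible.

    services: list of service descriptions from the local warehouses (dicts here)
    pu: list of predicates that must all hold
    pc: list of context predicates, most important first (assumed ordering)
    """
    candidates = [s for s in services if all(p(s) for p in pu)]
    # Try all of Pc first (best case), then drop constraints one at a time.
    for keep in range(len(pc), -1, -1):
        matched = [s for s in candidates if all(p(s) for p in pc[:keep])]
        if len(matched) >= q:
            return matched
    # Worst case: everything matching Pu only; may hold fewer than q items or be empty.
    return candidates


services = [
    {"service": "printing", "building": "HEC", "floor": 2},
    {"service": "printing", "building": "HEC", "floor": 3},
    {"service": "printing", "building": "UdeM", "floor": 1},
    {"service": "scanning", "building": "HEC", "floor": 2},
]
pu = [lambda s: s["service"] == "printing"]
pc = [lambda s: s["building"] == "HEC", lambda s: s["floor"] == 2]
print(len(request(services, pu, pc, 2)))  # the floor constraint is dropped
```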


Notion of Best Fit

! Best Fit means the services which match the user request and most or all of the contextual parameters
! Example
– Pu = [Attribute: service = value: printing]
– Pc = [Attribute: building = value: HEC]
– The search algorithm will try to choose all printers located in the HEC building
• [value: HEC]
– Otherwise, the search algorithm will provide the user with printing services in other buildings


Managing Warehouse Content

! WOS nodes have limited storage capacities
– Warehouses cannot grow indefinitely
– Mechanisms must be put in place to control the size of warehouses
! Warehouses are updated whenever a WOS node receives answers from remote nodes
– Decisions must be taken whether to insert, update, or remove information.
! For each av-pair in a warehouse, we keep track of
– Its creation date
– Its last modification date
– Its number of accesses
! This information enables a WOS node to properly manage the limited storage capacity allocated to it
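
One way such bookkeeping could drive "forgetting" is sketched below. The eviction rule (fewest accesses first, oldest modification as tie-breaker) is an assumption chosen for the example; the slides only state which three values are tracked. A logical counter stands in for real dates, and a flat dictionary replaces the hierarchical warehouse.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Entry:
    value: object
    created: int       # logical creation time
    modified: int      # logical time of last modification
    accesses: int = 0  # number of accesses


class Warehouse:
    """Hypothetical bounded store of av-pairs keyed by attribute."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}
        self._clock = itertools.count()

    def put(self, attribute, value):
        now = next(self._clock)
        if attribute in self.entries:            # update an existing av-pair
            entry = self.entries[attribute]
            entry.value, entry.modified = value, now
            return
        if len(self.entries) >= self.capacity:   # make room before inserting
            self._forget()
        self.entries[attribute] = Entry(value, created=now, modified=now)

    def get(self, attribute):
        entry = self.entries.get(attribute)
        if entry is None:
            return None
        entry.accesses += 1
        return entry.value

    def _forget(self):
        # Assumed policy: drop the least accessed, then least recently modified.
        victim = min(self.entries,
                     key=lambda a: (self.entries[a].accesses, self.entries[a].modified))
        del self.entries[victim]
```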


Applications of WOS

! Virtual communities
– Shared contexts
– Adaptive, dynamic management of federations
! Grid computing
– Transparent remote execution
– Transparent distributed file system access
! E-Commerce
– Discovery of business partners
– Transparent transmission of business data

The WOS for any-time, any-where, any-service, any-medium

Peer-to-peer network
Optimizing the QoS in a WOS community

! Simplified virtual space (community):
– n nodes, c clients, s servers, c >> s
– The same service offered by all servers
! Overlay network defined by neighborhood relationships:
– Locally stored knowledge of a node and the communication path to it (c-s; s-s)
! Optimization goal:
– Reduce response time t to requests
– Balance server load
! Metric for response time: t(c,s) = k*t + d/b(c,s)
! The optimization process is initiated by any unsatisfied client or an overloaded server
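
The metric can be evaluated per server so that a client picks the server with the lowest estimated response time. The slide does not define k, t, d, or b, so the sketch below assumes one plausible reading: k is the number of requests queued at the server, t the service time per request, d the amount of data to transfer, and b(c,s) the bandwidth between client and server. The names `Server`, `response_time`, and `pick_server` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Server:
    name: str
    queued: int          # k: requests waiting at the server (assumed meaning)
    service_time: float  # t: time per request (assumed meaning)


def response_time(server: Server, data_size: float, bandwidth: float) -> float:
    """t(c,s) = k*t + d/b(c,s), under the interpretation stated above."""
    return server.queued * server.service_time + data_size / bandwidth


def pick_server(servers: List[Server], data_size: float,
                bandwidth_to: Dict[str, float]) -> Server:
    """Server with the lowest estimated response time for this client."""
    return min(servers,
               key=lambda s: response_time(s, data_size, bandwidth_to[s.name]))


servers = [Server("a", queued=4, service_time=0.5),
           Server("b", queued=1, service_time=0.5)]
bandwidth_to = {"a": 10.0, "b": 2.0}  # b(c,s) as seen from one client
large = pick_server(servers, data_size=10.0, bandwidth_to=bandwidth_to)
small = pick_server(servers, data_size=1.0, bandwidth_to=bandwidth_to)
```

Under these assumptions a large transfer favours the well-connected but busier server, while a small transfer favours the lightly loaded one, which is the trade-off between response time and load balancing that the optimization goal describes.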

Optimizing the QoS in a WOS community

Applications of WOS: Grid computing: HP-WOSP

! HP-WOSP is a class of services used to configure High Performance applications
! Services of HP-WOSP are:
– Discovery Service: HP-WOSP (discovery, pgm)
– Reservation Service: HP-WOSP (reservation, pgm)
– Setup Service: HP-WOSP (setup, pgm)

[Figure: HP-WOSP nodes handling a User Request; message labels: WOSRP Request, WOSRP Reply, X-WOSP Request, X-WOSP Reply]

What next? - From connected society to networked society

[Figure: user sending/receiving news and email via PDA or cell phone; WOS compliant news provider; WOS compliant email provider; WOSNet; WOS printer; document print or e-paper; delivery of information/service]

Open environments - Distributed communities

! The “any-” vision calls for concepts to structure the diversity
! Open environment – components:
– Autonomous
– Heterogeneous
– Numerous (large scale)
– Mobile and adaptive in space and time
– Context aware
– Collaborative
– Dynamic membership (join/change/leave)

! Decentralized – Self-organized

Distributed Communities

Distributed Communities

! Distributed communities: evolving associations of participants (people, devices, software), mediated by a shared context.
! Community: medium of collaboration transforming the participants involved
! Not a space for random interaction between individuals, but a structure for efficient interaction to acquire and disseminate information