November 11 - 15, ISADS 2002, Guadalajara, Mexico
Université de Montréal
Distributed Communities on the Web
Peter Kropf
Department of Computer Science and Operations Research
University of Montreal

Communities

• Common interests
• Shared context
• Self-organisation
• Autonomous individuals/entities

Communities are groups of objects in a shared context. This allows them, at the least, to communicate with each other. [DCW 2000: Plaice, Kropf, Unger]

Social aggregates emerging from the Internet when enough people carry on public discussions long enough and with sufficient human feeling to form webs of personal relationships [Howard Rheingold]

Science Fiction

• Douglas Adams (The Hitchhiker’s Guide to the Galaxy): “A computer terminal is not just a clumsy old TV set with a typewriter keyboard in front of it, but an interface to connect mind and body to the universe and move parts of it through space”
• In 1982, William Gibson describes cyberspace in his SF novel: “... a graphic representation of data abstracted from the banks (DB) of every computer in the human system, ... the matrix”.

Communities: key issues

• How do communities emerge and how can their structures be analyzed and modeled?
• Which effects and laws are known from social communities and how are these communities structured and reorganized?
• How do cooperation and work division emerge? What does it mean to speak about “communities” within the context of the Web (e.g. for data search, network load, structuring ...)?
• How can dynamic changes be considered within a community?
• Which advantages and new services can be developed and used?

Peer-to-Peer systems are deployed and working distributed communities on the Net

Peer-to-Peer (P2P) - What is it?

• P2P is nothing new – see Arpanet (Internet)
• Every participating node acts as a client and a server at the same time («servent»)
• Completely decentralized:
  – No central control/coordination
  – No central database
  – Global behavior emerges from local interactions
  – No peer has a global view of the system
  – All data and services are accessible from any peer
  – Peers are autonomous
  – Peers and interconnections are unreliable
• «Business model»: every node contributes to the system by providing access to some of its resources: incentive to participate.
• Online communities

Types of P2P systems

• E-commerce systems
  – eBay, B2B market places, …
• File sharing systems
  – Napster, Freenet, Gnutella, Morpheus, KaZaA, AWeb, …
• Distributed databases
  – Mariposa, DNS, …
• Networks, «middleware»
  – Arpanet, WOS (Web Operating System), JXTA, Mobile ad-hoc networks (MANET), Multi-agent systems, …

P2P is an application-level internet on top of the Internet
Overlay network
Cooperation is required

P2P Cooperation Models

• Centralized model
  – global index held by a central authority (single point of failure)
  – direct contact between requestors and providers
  – Example: Napster
• Decentralized model
  – Examples: Freenet, Gnutella
  – no global index, no central coordination, global behavior emerges from local interactions, etc.
  – direct contact between requestors and providers (Gnutella) or mediated by a chain of intermediaries (Freenet)
• Hierarchical model
  – introduction of “super-peers”
  – mix of centralized and decentralized model
  – Example: DNS

Data Management in P2P Systems

• Problem
  – Peers in a P2P system need to share information
  – A central database would contradict the P2P paradigm
  – Can a distributed database be supported by peers without central control?
• Example
  – Directory of all files in a file-sharing system
• Basic operations in a database
  – Searching information (efficiently)
  – Updating information (consistently)

Questions

• Can a set of peers without central coordination provide
  – efficient search on a distributed database
  – while the storage space at each peer is small compared to the whole database?
• Efficient search
  – searchtime(query) ≈ Log(size(database))
• Small storage space
  – storagespace(agent) ≈ Log(size(database))

P2P Data Access Structures

• Every peer maintains a small fragment of the database and a routing table
• The peers implement some routing strategy
• Replication can be used to increase robustness and performance

Related Approaches

Related distributed information system approaches:
  – Event-based systems (publish/subscribe)
  – Push systems (broadcast model)
  – Mobile agents (code/data moves in the network; cooperates with other agents and can “learn” where to go next)
  – Distributed databases (often rely on central coordination)

[Figure: event-based systems, push systems, mobile agents and peer-to-peer systems placed on an axis from passive to active]

Peer-to-Peer vs. C/S and Web-based Systems

                    Client-Server      Web-based          Peer-to-Peer
                    (session-based)
Coupling            tight              loose              very loose
Comm. style         asymmetric         asymmetric         symmetric
Number of clients   moderate (1000)    high (1,000,000)   high (1,000,000)
Number of servers   few (10)           many (100,000)     none (0)

(Aberer 02)

Case Study: Napster

Napster: A brief history

• May 1999: Napster Inc. file share service founded by Shawn Fanning and Sean Parker (1st year students at Northeastern University)
• Dec 7, 1999: Recording Industry Association of America (RIAA) sues Napster for copyright infringement ($100,000 per song copied)
• April 13, 2000: Heavy metal rock group Metallica sues Napster for copyright infringement
• April 27, 2000: Rapper Dr. Dre sues Napster
• May 3, 2000: Metallica’s attorney claims 317,377 Internet users illegally share Metallica’s songs via Napster
• May 5, 2000: A judge rules that the “Safe Harbour Provision” exception of the DMCA (Digital Millennium Copyright Act) does not apply to Napster (art. 512)
• July 26, 2000: Court orders Napster to shut down on July 29 (applied July 28)
• Oct 31, 2000: Bertelsmann becomes a partner and drops lawsuit
• Feb 12, 2001: Court orders Napster to cease trading copyrighted songs and to prevent subscribers from gaining access to content on its search index that could potentially infringe copyrights
• Feb 20, 2001: Napster offers $1 billion to record companies (rejected)
• March 2, 2001: Napster installs software to satisfy the order

DMCA

Digital Millennium Copyright Act: Safe Harbour Provision:

512(k)(1)(A): « the term `service provider' means an entity offering the transmission, routing, or providing of connections for digital online communications, between or among points specified by a user, of material of the user's choosing, without modification to the content of the material as sent or received. »

Napster services

• Browser
• Search engine (on virtual central database)
• Transfer of files
• Chatrooms
• Audio player to play MP3 files from inside the browser
• Hotlist features for favorite songs
• Instant messaging, mailing
• Easy access
• No need to give real name when registering
• Fast

Fast adoption of Napster

2001 (before shutdown): more than 60 million users

Napster: Architecture

• Central (virtual) database which holds a catalog of offered files (MP3/WMA) (C/S)
• Clients connect to the server, identify themselves and send a list of the files they are willing to share (C/S)
• Other clients can search the catalog and learn from which clients they can retrieve the desired files (P2P)
• Direct file transfer between clients (P2P)
• First-time users must register (account)
• Combination of client/server and P2P approaches
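
The split between a central catalog (C/S) and direct transfer (P2P) can be sketched as follows. This is only a minimal illustration: the class `CentralIndex` and its method names are hypothetical and do not reproduce Napster's actual server; peers are represented by plain strings.

```python
from collections import defaultdict


class CentralIndex:
    """Toy stand-in for the central catalog (hypothetical, not Napster's server)."""

    def __init__(self):
        self.files = defaultdict(set)  # filename -> set of peer names offering it

    def register(self, peer, filenames):
        # C/S step: a client identifies itself and announces the files it shares.
        for name in filenames:
            self.files[name].add(peer)

    def unregister(self, peer):
        # Remove a departing peer from every catalog entry.
        for peers in self.files.values():
            peers.discard(peer)

    def search(self, substring):
        # C/S step: return {filename: sorted peers} for all matching file names.
        return {name: sorted(peers)
                for name, peers in self.files.items()
                if substring in name and peers}


index = CentralIndex()
index.register("A", ["xx.mp3", "bamba.mp3"])
index.register("C", ["xx.mp3"])
hits = index.search("xx")  # the download itself would then go peer to peer
```

The index only answers "who has the file"; it never carries file data, which is why the server is a single point of failure for search but not for transfers already in progress.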

Napster: communication model

[Figure: peers A and B and the Napster server. Peers register (user, files) with the server; B asks the server “where is xx.mp3?”; the server answers “A has xx.mp3”; B downloads xx.mp3 directly from A.]

Napster: The Protocol [Drscholl01]

• The protocol was never published openly and is rather complex and inconsistent
• OpenNap have reverse-engineered the protocol and published their findings
• TCP is used for C/S communication
• Messages to/from the server have the following format:
  – length specifies the length of the data portion
  – type defines the message type
  – data: the transferred data
    • plain ASCII, in many cases enclosed in double quotes (e.g., filenames such as “bamba.mp3” or client ids such as “nap v0.8”)

Field         length   type   data
Byte offset   0-1      2-3    4 ... n
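
The length/type/data framing can be sketched with Python's `struct` module. The field widths (two bytes each) follow the byte offsets on the slide; the little-endian byte order is an assumption taken from the reverse-engineered OpenNap description, and the helper names `pack_message` and `unpack_message` are hypothetical.

```python
import struct


def pack_message(msg_type: int, data: str) -> bytes:
    """Frame one message as <length><type><data>."""
    payload = data.encode("ascii")
    # Assumption: both 16-bit fields are little-endian; length counts only the data.
    return struct.pack("<HH", len(payload), msg_type) + payload


def unpack_message(buf: bytes):
    """Split one framed message off the front of buf; return (type, data, rest)."""
    length, msg_type = struct.unpack_from("<HH", buf, 0)
    data = buf[4:4 + length].decode("ascii")
    return msg_type, data, buf[4 + length:]


# Type 203 is the download request listed in the sample messages.
frame = pack_message(203, 'nick "bamba.mp3"')
msg_type, data, rest = unpack_message(frame)
```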

Sample Messages - 1

Type  C/S  Description                          Format
0     S    Error message                        <message>
2     C    Login                                <nick><pwd><port><client info><link type>
3     S    Login ack                            <user's email>
5     S    Auto-upgrade                         <new version><http-hostname:filename>
6     C    New user login                       <nick><pwd><port><client info><speed><email address>
100   C    Client notification of shared file   "<filename>"<md5><size><bitrate><frequency><time>
200   C    Search request                       [FILENAME CONTAINS "artist name"] MAX_RESULTS <max> [FILENAME CONTAINS "song"] [LINESPEED <comp> <link type>] [BITRATE <comp> "bit rate"] [FREQ <comp> "freq"] [WMA-FILE] [LOCAL_ONLY]
201   S    Search response                      "<filename>"<md5><size><bit rate><frequency><length><nick><ip address>
202   S    End of search response               (empty)

Sample Messages - 2

Type  C/S  Description                          Format
203   C    Download request                     <nick> "<filename>"
204   S    Download ack                         <nick><ip><port> "<filename>" <md5> <linespeed>
206   S    Peer to download not available       <nick> "<filename>"
209   S    Hotlist user signed on               <user><speed>
211   C    Browse a user's files                <nick>
212   S    Browse response                      <nick> "<filename>"<md5><size><bit rate><frequency><time>
213   S    End of browse list                   <nick>[<ip address>]
500   C    Push file to me (firewall problem)   <nick> "<filename>"
501   S    Push ack (to other client)           <nick><ip address><port> "<filename>"<md5><speed>

Napster: Summary

• Centralized system with direct user interaction
  – Single access point: sensitive to failure
  – Limited scalability (central database)
• The protocol is complex, incoherent and proprietary
• The search for files is fast thanks to powerful (multiple) servers (fast database)
• The topology is known => copyright, …
• The participants’ reputation is not considered (are the mp3 files of good quality? Do they contain music?)

Case study: Gnutella

Gnutella: a brief history

• Developed as a 14-day “quick hack” by Nullsoft (Winamp)
• Objective: exchange of cooking recipes
• Chronology:
  – Published under the «GNU General Public License» on the Nullsoft Web server
  – Taken down a couple of hours later by Nullsoft’s owner AOL-Time Warner
  – This was enough to “infect” the Internet
  – The Gnutella protocol was reverse-engineered from versions downloaded from the Nullsoft site
  – New Gnutella servents were quickly developed and started to spread rapidly (even more so after the shutdown of Napster)

The Gnutella network

• No central server
• Gnutella program = server and client (“servent”)
• Flooding: requests are sent to neighbors (often 4 or 5) which pass them on to their neighbors, and so on; lifetime limited by a TTL (typically 7).
• For each request, servents search for local files corresponding to the request
• File transfer out of band with HTTP
• Joining the system: at least one Gnutella host must be known (e.g. gnutellahosts.com:6346)
• Meeting: TCP message exchange:
    GNUTELLA CONNECT/x.x\n\n
    GNUTELLA OK\n\n
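
TTL-limited flooding can be illustrated with a small simulation over a made-up overlay graph. This is a sketch under simplifying assumptions: the graph, the peer names, and the function `flood` are hypothetical, message delays are ignored, and duplicates are dropped on first sight (a real servent recognizes them by message ID).

```python
from collections import deque


def flood(graph, source, ttl):
    """Return {peer: hops} for every peer reached by a request from source."""
    reached = {source: 0}
    queue = deque([(source, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if remaining == 0:
            continue  # TTL exhausted: this peer does not forward the request
        for neighbor in graph[peer]:
            if neighbor not in reached:  # already-seen requests are dropped
                reached[neighbor] = reached[peer] + 1
                queue.append((neighbor, remaining - 1))
    return reached


# Hypothetical overlay: adjacency lists of directly connected servents.
graph = {
    "S": ["A", "B"], "A": ["S", "C"], "B": ["S", "C"],
    "C": ["A", "B", "D"], "D": ["C", "E"], "E": ["D"],
}
near = flood(graph, "S", 2)   # small TTL: only part of the network is reached
far = flood(graph, "S", 7)    # typical TTL: the whole toy network is reached
```

The TTL bounds the "horizon" of a servent: peers farther away than TTL hops never see the request, regardless of whether they hold a matching file.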

Protocol: Relative anonymity

• Requests do not contain identity
• Each servent only reliably knows about the servents it is directly connected to
• The direct HTTP file download reveals the identity

Message types

Type              Description                                        Information
Ping              a request for host addresses                       none
Pong              reply to a Ping                                    host:port address, host bandwidth, number and total size of local files
Query             user search request                                speed, keywords
Reply (QueryHit)  answer from hosts having matching files            file description, host:port address, host bandwidth, servent ID
Push              file download request for servents behind firewalls  servent ID, index of requested file, host:port address

Packet header

• MessageID: 16-byte unique id
• FunctionID: packet type
• RemainingTTL: number of times the packet will be forwarded
• HopsTaken: TTL(0) = TTL(i) + Hops(i)
• DataLength: the length of the remaining data of this packet

Field         Message ID   Function ID   TTL   Hops   DataLength
Byte offset   0-15         16            17    18     19-22
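
The 23-byte header and the invariant TTL(0) = TTL(i) + Hops(i) can be sketched as follows. The field widths follow the byte offsets on the slide; the little-endian encoding of DataLength is an assumption based on the published Gnutella 0.4 description, and the helper names are hypothetical.

```python
import struct

# 16-byte message id, 1-byte function id, 1-byte TTL, 1-byte hops, 4-byte length.
HEADER = struct.Struct("<16sBBBI")  # assumption: DataLength is little-endian


def pack_header(message_id, function_id, ttl, hops, data_length):
    return HEADER.pack(message_id, function_id, ttl, hops, data_length)


def forward(header_bytes):
    """Return the header as a servent would forward it, or None if TTL runs out."""
    message_id, function_id, ttl, hops, length = HEADER.unpack(header_bytes)
    if ttl <= 1:
        return None  # TTL would drop to zero: the packet is not forwarded
    # Each hop decrements TTL and increments Hops, so their sum stays constant.
    return HEADER.pack(message_id, function_id, ttl - 1, hops + 1, length)


original = pack_header(b"\x01" * 16, 0x80, 7, 0, 10)  # 0x80 = Query
forwarded = forward(original)
```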

Ping/Pong mechanism

• A host receiving a Ping should:
  – answer with a Pong containing its own address
  – forward it to its neighbors
• In practice:
  – some hosts do not respond: reduce overhead
  – some hosts do not propagate: reply with a small number of addresses
  – well-known Gnutella sites: serve as directories, not as active hosts (e.g. gnutellahosts.com)
  – Recently: introduction of “ultra-peers”

Peer meeting with Ping/Pong

[Figure: S sends a Ping into a network of peers A to F. A, B, C, D and E each answer with a Pong; F does not respond. New peers known to S: A, B, C, D, E.]

Ping/Pong messages

• Ping (0x00): packet header with payload 0x00
• Pong (0x01): packet header with payload:

Field         Port   IP address   Nb shared files   KB shared
Byte offset   0-1    2-5          6-9               10-13

  • Port where the responding host accepts connections
  • IP address of the responding host
  • Number of shared files
  • Number of Kbytes shared
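
A Pong payload with these four fields fits in 14 bytes. The sketch below assumes a 2-byte port, a 4-byte IPv4 address in network order, and two 4-byte little-endian counters, following the Gnutella 0.4 description; the function names are hypothetical.

```python
import socket
import struct

# port (2 bytes), IPv4 address (4 raw bytes), file count (4), kilobytes shared (4)
PONG = struct.Struct("<H4sII")  # assumption: integers little-endian, no padding


def pack_pong(port, ip, n_files, n_kbytes):
    return PONG.pack(port, socket.inet_aton(ip), n_files, n_kbytes)


def unpack_pong(payload):
    port, raw_ip, n_files, n_kbytes = PONG.unpack(payload)
    return port, socket.inet_ntoa(raw_ip), n_files, n_kbytes


payload = pack_pong(6346, "192.0.2.17", 695, 2_500_000)
fields = unpack_pong(payload)
```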

Requests: Query/QueryHit/GET

[Figure: S floods “Query xx.mp3” through peers A to F; B, C and D hold xx.mp3 and return Query hits; S then fetches xx.mp3 with an HTTP GET directly from one of them.]

Query

• Query (0x80):
  – Minimum speed: minimal bandwidth (in kbps) of responding servents
  – Search criteria: character sequence ending with zero (0x00); maximal length is limited by the “Payload length” in the header

Field         Minimum speed   Search criteria
Byte offset   0-1             2 ...
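
A Query payload is therefore a 2-byte minimum speed followed by a zero-terminated search string. A minimal sketch, assuming a little-endian speed field and ASCII search criteria (both assumptions, as are the helper names):

```python
import struct


def pack_query(min_speed_kbps: int, criteria: str) -> bytes:
    # 2-byte minimum speed, then the search string terminated by 0x00.
    return struct.pack("<H", min_speed_kbps) + criteria.encode("ascii") + b"\x00"


def unpack_query(payload: bytes):
    (min_speed,) = struct.unpack_from("<H", payload, 0)
    end = payload.index(b"\x00", 2)  # position of the terminating zero byte
    return min_speed, payload[2:end].decode("ascii")


query = pack_query(56, "xx.mp3")
decoded = unpack_query(query)
```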

QueryHit

QueryHit (0x81)

Field         Number of hits   Port   IP address   Speed   Result set   Servent identifier
Byte offset   0                1-2    3-6          7-10    11 ... n     n ... n+16

Result set entry:
Field         File index   File size   File name
Byte offset   0-3          4-7         8 ...

  – Number of hits: number of entries in the result set
  – Port: where the responding node accepts a connection
  – IP address: address of the responding node
  – Speed: bandwidth of responding node
  – Servent identifier: 16-byte word identifying the responding node (unique)
  – Result set: results (number of hits entries)
    • File index: a unique number assigned to a file by the responding node
    • File size: file size (in bytes)
    • File name: file name ending with two zeros (0x0000)

File Download

• «Out of band» with simplified HTTP
• Connection to the IP address given by QueryHit
• Example result entry (file index, file size, file name): 2369 5326983 xx.mp3\0x00\0x00

Request:
    GET /get/2369/xx.mp3/ HTTP/1.0\r\n
    Connection: Keep-Alive\r\n
    Range: bytes=0\r\n
    User-Agent: Gnutella\r\n
    \r\n

Response:
    HTTP 200 OK\r\n
    Server: Gnutella\r\n
    Content-type: application/binary\r\n
    Content-length: 5326983\r\n
    \r\n
    <data> ...

Handling Firewalls: The Push Descriptor

• Used if a host cannot be contacted directly (firewall)
• The servent receiving a Push descriptor (0x40) initiates the file transfer (with an outgoing connection)

Field         Servent identifier   File index   IP address   Port
Byte offset   0-15                 16-19        20-23        24-25

  – Servent identifier: 16-byte string uniquely identifying the servent which is requested to push the file
  – File index: unique identifier of the file to be pushed
  – IP address: of the host to which the file should be pushed
  – Port: to which the file should be pushed
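
The 26-byte Push payload can be sketched in the same way as the other descriptors. The byte offsets come from the slide; the little-endian integers and the network-order IP address are assumptions based on the Gnutella 0.4 description, and the function names are hypothetical.

```python
import socket
import struct

# servent identifier (16 bytes), file index (4), IPv4 address (4), port (2)
PUSH = struct.Struct("<16sI4sH")  # assumption: integers little-endian, no padding


def pack_push(servent_id: bytes, file_index: int, ip: str, port: int) -> bytes:
    return PUSH.pack(servent_id, file_index, socket.inet_aton(ip), port)


def unpack_push(payload: bytes):
    servent_id, file_index, raw_ip, port = PUSH.unpack(payload)
    return servent_id, file_index, socket.inet_ntoa(raw_ip), port


# The requester names itself (IP/port) so the firewalled servent can call back.
push = pack_push(b"\xab" * 16, 2369, "198.51.100.4", 6346)
decoded_push = unpack_push(push)
```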

Gnutella Push: servent behind firewall

• Servent A receives a QueryHit of servent B which is behind a firewall and accepts connections only on the Gnutella port
• A sends Push (0x40) to B (payload: servent identifier, file index, IP address, port)
• B opens a connection to the IP/port given in the Push message and sends:
    GIV <File index>:<Servent identifier>/<File name>\n\n
• When A receives GIV, it starts a regular download on the connection:
    GET /get/<File index>/<File name>/ HTTP/1.0\r\n
    Connection: Keep-Alive\r\n
    Range: bytes=0\r\n
    User-Agent: Gnutella\r\n
    \r\n
• Does not work if both parties are behind firewalls

Free-riding: downloads [Adar, Huberman]

Many servents share files nobody downloads. Of 11,585 sharing hosts:
  – Top 1% of sites provide nearly 47% of all answers
  – Top 25% of sites provide 98% of all answers
  – 7,349 (63%) never provide a query response

Free-riding: number of shared files

Most Gnutella users are free riders [Adar00]

Of 33,335 hosts:
  – 22,084 (66%) of the peers share no files
  – 24,347 (73%) share ten or fewer files

The top             Share       As % of the whole
333 hosts (1%)      1,142,645   37%
1,667 hosts (5%)    2,182,087   70%
3,334 hosts (10%)   2,692,082   87%
5,000 hosts (15%)   2,928,905   94%
6,667 hosts (20%)   3,037,232   98%
8,333 hosts (25%)   3,082,572   99%

Statistics

• The top 1 percent of those queries accounted for 37% of the total queries on the Gnutella network. The top 25 percent account for over 75% of the total queries.
• The top responding host only hosted 695 files, but responded to 3,436 queries. The next most responsive peer hosted 956 files and responded to 1,474 queries.
• There is only low correlation between quantity and quality of files shared

Popularity of Queries [Sripanidkulchai01]

• Very popular documents are approximately equally popular
• Less popular documents follow a Zipf-like distribution (i.e., the probability of seeing a query for the $i$-th most popular query is proportional to $1/i^{\alpha}$)
• Access frequency of Web documents also follows Zipf-like distributions ⇒ caching might also work for Gnutella
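
To see why a Zipf-like popularity curve favors caching, the probabilities proportional to 1/i^alpha can be normalized and the share of the most popular items summed. The catalog size of 1000 and alpha = 1.0 below are illustrative assumptions, not measured Gnutella values, and `zipf_probabilities` is a hypothetical helper.

```python
def zipf_probabilities(n: int, alpha: float):
    """Probability of the i-th most popular item, i = 1..n, under a Zipf law."""
    weights = [1.0 / (i ** alpha) for i in range(1, n + 1)]
    total = sum(weights)  # normalizing constant so the probabilities sum to 1
    return [w / total for w in weights]


probs = zipf_probabilities(1000, 1.0)
# Fraction of all queries that a cache holding only the 10 most popular
# items (1% of the catalog) could answer under these assumptions.
top_share = sum(probs[:10])
print(f"top 1% of items attract {top_share:.0%} of queries")
```

A small cache covers a disproportionate share of requests whenever alpha is close to 1; the flat head observed for very popular Gnutella documents would reduce this benefit somewhat.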

Online communities – Peer-to-Peer systems (P2P)

The Gnutella network as an experimentation platform for evaluating whether such P2P systems are appropriate for online communities.

Ref.: J. Vaucher, G. Babin, P. Kropf, Th. Jouve: Experimenting with Gnutella Communities, Distributed Communities on the Web, Sydney, April 2002, LNCS 2468, Springer Berlin, pp. 85-99

Why Gnutella?

• Working large-scale P2P data-sharing system
• Highly dynamic, peers are autonomous, self-organized
• Completely decentralized: no global index, no central coordination. Global behavior emerges from local interactions
• Simple communication mechanism: based on “flooding”, direct contact between requesting and providing nodes
• Simple, robust and sufficiently scalable
• Much open-source software

Gnutella Networks [Jovanovic01]

• Small world property: find everything close by [Milgram 67] [Watts 99]
• High clustering and short path lengths observed
• Network topology unknown
• Much overhead traffic for maintaining connectivity
• Broadcast strategy not appropriate for small world networks

Which are the “good” peers and how can they be determined?

Experimentation Goals

• Quality of network:
  – good connections
  – stability
  – routing properties
• No consideration of file sharing and downloads: free riding on Gnutella is well known [Adar00]:
  – 70% of hosts share no files
  – 50% of responses by 1% of sharing nodes

Experimental platform

• Based on JTella:
  – author: Ken McCrary
  – written in Java
  – Java API to access the Gnutella network
  – 40 classes, 7000 lines of code (LOC)
  – easy to use: a simple application to monitor traffic requires 150 LOC

Experimental platform: architecture

[Figure: architecture of the implementation based on JTella (K. McCrary)]

Experimentation: measuring performance

• HostCache:
  – contains only valid Pong addresses
  – limited to 200 addresses
• Statistics on connections:
  – connection type, start time, set-up time, duration, number of messages received, termination code
• The performance observed heavily depends on chance
• Two exploration experiments show
  – the difficulty of maintaining connectivity
  – the difficulty of quantifying the behavior

Measurements (1)

• Exploration experiment 1:
  – passive node: routes messages, collects statistics and prints status every 15 sec.

Measurements (2)

• Exploration experiment 2:
  – two servents run in parallel
  – keep 4 connections open
  – monitored parameters:
    • number of messages received
    • the horizon, a measure of the network size: a Ping is broadcast every minute and we count the answering Pongs over the next minute

Measurements

• 2 sessions in parallel for 45 min.:
  Average message rate:
  - A: 120 msg/sec
  - B: 180 msg/sec

[Figures: input rate over time; horizon over time]

Stochastic behavior: it is difficult to estimate the network size.

Validity of Pong information (1)

• Validity of addresses received with Pong messages:
  – many duplicate addresses:
    • around 75% of addresses received are identical to addresses already in the cache (size: 200)
    • around 25% of addresses are identical to the ones received just before
  – many “special” (invalid) addresses:
    • 0.0.0.0, 127.0.0.1, 255.255.255.255
    • multicast addresses, private addresses (NAT)
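
A filter for the invalid and duplicate addresses listed above can be sketched with Python's standard `ipaddress` module. This is an illustration, not the filter used in the experiments: the function names are hypothetical, addresses are plain IPv4 strings without ports, and the oldest-first eviction policy for the 200-entry cache is an assumption.

```python
import ipaddress

BROADCAST = ipaddress.IPv4Address("255.255.255.255")


def is_usable(address: str) -> bool:
    """Reject the 'special' addresses named on the slide and malformed input."""
    try:
        ip = ipaddress.IPv4Address(address)
    except ipaddress.AddressValueError:
        return False
    if ip == BROADCAST:
        return False
    return not (ip.is_unspecified or ip.is_loopback
                or ip.is_multicast or ip.is_private)


def filter_pongs(addresses, cache, limit=200):
    """Append new usable addresses to cache (a list), keeping at most limit."""
    for address in addresses:
        if is_usable(address) and address not in cache:
            cache.append(address)
            if len(cache) > limit:
                cache.pop(0)  # assumption: evict the oldest entry first
    return cache


received = ["0.0.0.0", "127.0.0.1", "255.255.255.255", "224.0.0.1",
            "10.1.2.3", "8.8.8.8", "8.8.8.8", "not-an-ip"]
cache = filter_pongs(received, [])
```

Because the cache is bounded, a duplicate of an address that has already been evicted cannot be detected, which matches the observation that not all duplicates are caught online.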

Validity of Pong information (2)

• Analysis of received addresses:
  – 16% to 28% of addresses retained
  – limited size of cache: not all duplicates can be detected
  – unique valid addresses received: 9 to 20% (obtained from post-mortem analysis)
• Modification of cache algorithm: filtering invalid and repeated addresses
  – experiment yielded around 30% valid addresses received
  – this is still roughly 4 times more addresses than used to maintain connectivity

Connection setup (1)

• Socket creation: probability of success in connecting to another host:
  – connection attempt fails:
    • host too busy
    • application terminated, computer disconnected
  – logical timeout for socket creation: 10 sec. (in Java 1.3 & Linux, actual timeout = 13 min.)

Session: 90 min., 2541 connection attempts:
  - 31%: success, connection achieved in 2.3 sec.;
  - 20%: failure reported rapidly in 1.7 sec.;
  - 49%: blocking, failure noted after 10 sec.

Connection setup (2)

Test of 100 random addresses taken from previous sessions:
  - 36%: success, connection achieved in 1.6 sec.;
  - 20%: failure reported rapidly in 0.9 sec.;
  - 38%: blocking, failure noted after 10 sec.

• Most connections fail
• Connections to randomly chosen hosts with previously successful connections are not more reliable

Distribution of message types

• Distribution of message types:
  – proportion of Pings and Replies remains constant
  – proportion of Pongs and Queries varies widely
  – substantial overhead is introduced by the continuous Pings and Pongs

Host quality (1)

• “Good” hosts:
  – most participants are transients
  – definition of “good” hosts:
    • “good” host = connection is maintained for at least 2 min.
  – Dec 30th, 2001: from 41789 connections over 24 hours, 564 (1.3%) were considered “good”
  – can “good” hosts be reused for future sessions?

Host quality (2)

• “Good” hosts:
  – try to reconnect to “good” hosts (without handshake):

Success rate after addresses were obtained:
  - 24 h: 18%, 48 h: 10%, 7 days: 7%
  - 67% of cases: unable to reconnect a single time
  - 10% of cases: hosts were available 50% of the time
  - 0.7% of cases: hosts were always responding

90% of tested hosts can open Gnutella sessions (with handshake)

Possibility to identify reliable semi-permanent hosts

Latency

• Latency:
  – measure: broadcast Ping every minute and collect all responding Pongs
  – number of responding hosts: 5 to 252
  – hop counts: minimum is often 2 or 3

Delay measures:
  - min: 0.9 sec.; max: 128 sec.; mean: 13 sec.
  - mean hops: 5
  - mean delay between hosts: 1.3 sec.

Measuring performance

• Goal: improving performance
  – need to evaluate the effects of various servent parameters or strategies
  – need for a methodology that
    • does not disrupt normal operation of the network
    • reduces the effects of randomness on performance indicators
    • enables comparison of different versions of servents

Measuring methodology (1)

• Run an experiment for each parameter or strategy to test
• Experimentation: lasts 24 hours, 24 runs of 45 minutes
• Each run = 2 servents in parallel: a “benchmark” servent and a “test” servent
• Parameters of “benchmark” runs are constant; this serves as basis for comparison
• Measured indicators:
  – nbr of messages, nbr of Pings, nbr of Pongs, average horizon, nbr of distinct host addresses found

Measuring methodology (2)

• Statistics collected on indicators are used to compute the performance ratio $r$:

[Formula: performance ratio $r$]

where:
  - $m$: number of indicators used;
  - $x_{ij}^{s}$: value of indicator $i$ at the $j$-th run of servent $s$, where $s = b$ for the benchmark servent and $s = t$ for the test servent

Performance measurements

• Evaluation of methodology
  – Influence of the number of connections on the performance of a servent:

Conclusions of Gnutella Experiments (1)

• The performance observed heavily depends on chance:
  – Difficulty of quantifying behavior
  – Difficulty of maintaining connectivity
• Network traffic overhead produced by flooding is large (50% from Ping/Pong)
• Majority of addresses received were invalid or redundant: a filter is necessary
• 1/3 of outgoing attempts lead to valid connections, but average connection duration is only 30 sec.
• Possibility to identify (rare) semi-reliable hosts
• Average delay of Ping answer: 15 sec.
• Difficult to estimate the size of the network:
  – 2/3 of “good” hosts do not respond to Pings

Conclusions of Gnutella Experiments (2)

• The use of this type of (overlay) network is still limited for developing online community applications.
• Local intelligence on community is necessary for efficiency and scalability
• The highly volatile character of the network calls for efficient methods for self-organization
• Future work:
  – Development of more sophisticated strategies and policies to maintain connectivity and to (self-)organize the network
  – Evaluation of new strategies with practical tests
  – Simulations to evaluate the effect of different strategies and parameters

Gnutella: Bandwidth Barriers (experiments by Clip2)

• Measuring Gnutella over 1 month:
  – typical query is 560 bits long (including TCP/IP headers)
  – 25% of the traffic are queries, 50% pings, 25% other
  – on average each peer seems to have 3 other peers actively connected
• Scalability barrier with substantial performance degradation if queries/sec > 10:

    10 queries/sec
    * 560 bits/query
    * 4 (to account for the other 3 quarters of message traffic)
    * 3 simultaneous connections
    = 67,200 bps

  ⇒ 10 queries/sec maximum in the presence of many dialup users
  ⇒ more bandwidth will induce larger files: no improvement
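
The arithmetic behind the 67,200 bps figure can be checked directly. The constants are the ones quoted on the slide; the 56,000 bps dial-up capacity used for comparison is an assumption (a nominal 56k modem) that the slide does not state explicitly.

```python
BITS_PER_QUERY = 560      # typical query size, including TCP/IP headers
TRAFFIC_MULTIPLIER = 4    # queries are only 25% of all message traffic
CONNECTIONS = 3           # simultaneously connected peers
DIALUP_BPS = 56_000       # assumption: nominal capacity of a 56k modem


def required_bps(queries_per_sec: int) -> int:
    """Bandwidth a peer needs to relay the given query rate."""
    return queries_per_sec * BITS_PER_QUERY * TRAFFIC_MULTIPLIER * CONNECTIONS


load = required_bps(10)
print(load, "bps needed;", "exceeds" if load > DIALUP_BPS else "fits", "dial-up")
```

At 10 queries per second the relayed traffic already exceeds what a dial-up link can carry, which is why that rate acts as a barrier when many peers are on modems.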

Gnutella: Summary

• Completely decentralized
• High fault tolerance but unstable
• Adapts well and dynamically to changing peer populations
• Protocol causes high network traffic. For example:
  – $C = 4$ connections per peer, $TTL = 7$
  – 1 ping packet can cause $2 \cdot \sum_{i=0}^{TTL} C \cdot (C-1)^{i} = 26{,}240$ packets
  – 60% of today’s traffic on the commercial Internet stems from Gnutella-like applications
• No estimates on the duration of queries can be given
• No probability for successful queries can be given
• Topology is unknown ⇒ search algorithms cannot exploit it
• Reputation of peers (trust) is not addressed
• Simple, robust, and scalable (at the moment ??)
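
The figure of 26,240 packets for C = 4 and TTL = 7 is consistent with the sum 2 · Σ C·(C−1)^i for i = 0..TTL, which the short check below evaluates. Reading the factor 2 as "one Pong for every Ping" is an interpretation, not something the slide states; `packets_per_ping` is a hypothetical helper name.

```python
def packets_per_ping(connections: int, ttl: int) -> int:
    """Evaluate 2 * sum_{i=0}^{ttl} C * (C - 1)**i."""
    # Each forwarding peer passes the Ping on to its C - 1 other neighbors.
    pings = sum(connections * (connections - 1) ** i for i in range(ttl + 1))
    return 2 * pings  # assumption: the factor 2 counts the answering Pongs


total = packets_per_ping(4, 7)
```

The count grows geometrically with the TTL, so reducing the TTL by one cuts the traffic caused by a single Ping by roughly a factor of C − 1.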

System Architectures

[Figure: evolution of system architectures over time, from centralization (hierarchy, asymmetry) to decentralization (self-organization, symmetry), with axes “structure” and “control”. Systems shown: Music Industry, MP3, Napster, gnutella. Interaction labels: unidirectional, interactive, unidirectional/bilateral, multilateral.]

Other Systems and Approaches

• Freenet
• JXTA
• Mobile Ad-hoc Networks (MANET)
• Grid Computing
• Ubiquitous Computing
• Agents
• AWeb tool
• Web Operating System (WOS)

Freenet: System Architecture

• Adaptive P2P system which supports publication, replication, and retrieval of data
• Protects anonymity of authors and readers
  – infeasible to determine the origin or destination of data
  – difficult for a node to determine what it stores (files are sent and stored encrypted)
  ⇒ nobody can be sued
• Requests are routed to the most likely physical location
  – no central server as in Napster
  – no constrained broadcast as in Gnutella
• Files are referred to in a location-independent way
• Dynamic replication of data

Freenet: Searching [Hong01]

• Graph structure actively evolves over time
  – new links form between nodes
  – files migrate through the network
  ⇒ adaptive routing

[Aberer02]
Freenet: Summary
! Completely decentralized
! High fault tolerance
! Robust and scalable
! Automatic replication of content
! Adapts well and dynamically to changing peer populations
! Spam content less of a problem (subspaces)
! Adaptive routing preserves network bandwidth
! No estimates on the duration of queries can be given
! No probability for successful queries can be given
! Topology is unknown ⇒ algorithms cannot exploit it
! Routing "circumvents" free-riders
! Reputation of peers is not addressed
! Supports anonymity of publishers and readers
Project JXTA (SUN)
! A network programming platform for P2P systems
– 3-layer architecture
– 6 XML-based protocols: discovery, membership, routing, ...
– abstractions: peer groups, pipes, advertisements, ...
! Goal: a uniform platform for applications using P2P technology and for various P2P systems to interact
[Figure: JXTA layered architecture on top of any peer on the extended Web.
– JXTA applications: JXTA community applications, Sun JXTA applications
– JXTA services: JXTA community services, Sun JXTA services (indexing, searching, file sharing)
– JXTA core: peer groups, peer pipes, peer monitoring, security
– Peer shell and peer commands]
Mobile Ad-hoc Networks (MANETs) [Hubaux01]
! Dynamic, location-sensitive data with restricted access
! Mobile users with dynamic individual interest profiles
! Example: Distributed Virtual Database for e-business
! Local information exchange
! Applications: mobile information commerce, location-dependent information services
Grid Computing
! Grid computing: a set of connected supercomputers which allows the transparent execution of a program by searching for and allocating the resources it requires.
! Middleware: transforms a collection of resources into a single, coherent, virtual resource. Often based on MPI and PVM.
! Examples:
– Globus (www.globus.org): USC, ANL
– NetSolve (www.cs.utk.edu/netsolve): Univ. of Tennessee, ORNL
– Legion: Univ. of Virginia
– Milan: New York Univ., Arizona State Univ.
– UNICORE: Genias, Pallas, German universities
– SUN GridEngine
– Seti@home (loosely coupled client-server cluster system)
Example: seti@home
[Figure: SETI-Central hands out tasks to peers.]
• Loosely coupled client-server cluster system: « embarrassingly parallel application »
• Client computer downloads a set of radio data; analysis (Fourier) during idle times; results sent back to server
• Goal: look for signals of artificial origin
• 3.6 million clients (March 2003); 1.42E+21 floating point operations performed; 921,190 CPU years so far
Ubiquitous Computing
! The disappearing computer: from fixed to mobile to wearable
! It is about the Computer in the World and NOT the World in the Computer: bridging the gap between the virtual and the real world
! Context- and location-aware, diverse and numerous, human-centric
! Largely technology-driven: Moore's law
! Smart devices with spontaneous network capabilities that have access to any information or provide access to any service "on the net"
! Vision: everyday objects become smart and interconnected; they communicate and cooperate: communities
Agents - Communities
! Agent characteristics
– Autonomy: capability to pursue its goals without interactions or commands from the environment
– Social ability: capability to interact with the environment; context/situation awareness
– Reactivity: capability of reacting appropriately to influences or information from its environment
– Proactivity: capability to take the initiative under specific circumstances
– Mobility: the ability to move around in an electronic network (Security? Trust? Ontology?)
– Collaboration: the ability to collaborate and coordinate actions and behavior
– Learning: the ability to learn from past situations and actions
Agent systems may be perceived as communities
Examples: Agentcities (www.agentcities.org), ACAN (Univ. Ottawa), and many more ...
Medium – A Sphere for Agents
Communities = Agents + Media
[Figure: a medium containing agents, communication, and information objects.] [Schmid01]
"The medium is the message" [Wadge02]
AWeb tool (1)-(4)
Herwig Unger, Markus Wulff, University of Rostock
[Four slides of figures]
The Web Operating System (WOS)
The Web Operating System (WOS™) concept was developed several years ago in response to the emergence of globally networked systems embracing computers, appliances and communication devices. It follows the vision of open environments characterized by having components that are autonomous, numerous, heterogeneous, context-aware and adaptable.
These components do not stand alone, but communicate, negotiate and collaborate to serve an individual's or group's goals. This gives rise to evolving associations or communities of participants, be they physical devices, software objects or people, mediated by a shared context.
Project partners: Peter Kropf (University of Montreal), Gilbert Babin (HEC Montreal), Herwig Unger (University of Rostock), John Plaice (University of New South Wales)
Rationale for WOS
[Figure: WOS middleware/services]
Self-configuring networks of mobile and stationary devices form communities
Rationale for WOS
! WOS aims to:
– Supply users with adequate tools that allow for the implementation of specific services
– Provide users with great flexibility in the "semantics" of their services
! Services are volatile:
– Services appear, disappear, evolve, etc.
– Services and environments have to adapt to their contexts
! WOS vision:
– Any-time, any-where, any-service
The Nature of WOS
Infrastructure of WOS
[Figure: WOS nodes connected over the network via WOSRP and WOSP. Components shown for a WOS node: API and UI, search and evaluation, software resources warehouse, hardware resources warehouse, information warehouse, request warehouse, local O/S.]
The Nature of WOS
Communication Framework
! The WOS Protocol (WOSP)
– Simple generic syntax
– Extensible
– A version of WOSP is a specialization of the generic syntax
! The WOS Request Protocol (WOSRP)
– Search/localization of WOS nodes
– Exchange of information about WOSP version
– Establishment of WOSP connections
The Nature of WOS
Service classes of WOS
! A WOS node provides a set of service classes
– A service class is a set of services of the same nature
– Example: configuring and executing High Performance (HP) applications
– A user query is served by a specific service belonging to a specific service class
! All service classes respect the generic WOS Protocol (WOSP)
– Every service class has its own semantics
– We define a specific version of WOSP for each service class
– A service class is a WOSP version
! Services are dynamic and must be able to evolve
The Nature of WOS
Information Framework
! Resource Warehouses
– Contain the description of the local resources provided by the WOS node
– Preserve information about remote resources already requested
! What is a resource?
– Hardware: CPU, memory, etc.
– Software: Java, PVM, MPI, WOSP version, etc.
– Others: effective CPU or network performance, etc.
! Warehouses are distributed and have limited storage capacity
– They learn and forget information
The Nature of WOS
Information Framework
! A resource is described by attributes: an attribute and its associated value is called an av-pair
! The structure of a warehouse is a hierarchical arrangement of av-pairs: an av-pair is a descendant of another av-pair when it depends on it
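The hierarchical arrangement of av-pairs can be pictured as a tree. The following is a minimal sketch, not the WOS implementation: the class name `AVPair`, its methods, and the sample attributes are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AVPair:
    """One attribute with its value, plus the av-pairs that depend on it."""
    attribute: str
    value: object
    children: list[AVPair] = field(default_factory=list)

    def add(self, attribute: str, value: object) -> AVPair:
        """Attach a dependent av-pair and return it, so calls can be nested."""
        child = AVPair(attribute, value)
        self.children.append(child)
        return child

    def paths(self, prefix=()):
        """Yield every root-to-node path as a tuple of (attribute, value)."""
        here = prefix + ((self.attribute, self.value),)
        yield here
        for child in self.children:
            yield from child.paths(here)


# Hypothetical content: the building and the price only make sense
# in the context of the printing service they describe.
root = AVPair("service", "printing")
building = root.add("building", "HEC")
building.add("price", 10)

for path in root.paths():
    print(" / ".join(f"{a}={v}" for a, v in path))
```

Each printed line is one path from the root, so a descendant av-pair always appears together with the av-pairs it depends on.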
User Requests
! Users make requests to identify services that fulfill their needs
! Requests are built using arrangements of attributes and values, linked by relational operators: av-relations
– For example
• [Attribute: Price ≤ Value: 10 dollars]
! A request consists of two predicates
– Pu: user-specific characteristics of the service
– Pc: context-specific characteristics of the service
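An av-relation can be read as a small test applied to a service description, and a predicate as a group of such tests. The following is a minimal sketch under several assumptions: services are plain dictionaries, a predicate is the conjunction of its av-relations (the slide does not say how relations combine), and the helper names `av_relation` and `predicate` are hypothetical.

```python
import operator

# Relational operators an av-relation may use (an assumed, minimal set).
OPS = {"=": operator.eq, "<=": operator.le, ">=": operator.ge,
       "<": operator.lt, ">": operator.gt}


def av_relation(attribute, op, value):
    """Build a test: does a service satisfy `attribute op value`?"""
    compare = OPS[op]

    def test(service: dict) -> bool:
        return attribute in service and compare(service[attribute], value)

    return test


def predicate(*relations):
    """Assumption: a predicate holds when all of its av-relations hold."""
    return lambda service: all(rel(service) for rel in relations)


# Pu: what the user wants; Pc: the context it should be delivered in.
pu = predicate(av_relation("service", "=", "printing"),
               av_relation("price", "<=", 10))
pc = predicate(av_relation("building", "=", "HEC"))

offer = {"service": "printing", "price": 8, "building": "HEC"}
print(pu(offer) and pc(offer))  # True
```

Keeping Pu and Pc as separate predicates matters because the context part may be relaxed independently of the user part.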
Processing a User Request: The Search Algorithm
! Search (Pu, Pc, q): tries to return at least q services matching Pu and Pc
1. Performs a local request: Request (Pu, Pc, q)
– If the number of services found is sufficient, the algorithm stops
2. Performs a remote request
– Finds nodes with the same context from local warehouses
– Performs the request on each of these remote nodes
– If the number of services found is sufficient, the algorithm stops
3. Performs a remote request (2nd try)
– Finds any node from known remote nodes
– Performs the request on each of these remote nodes
– If the number of services found is sufficient, the algorithm stops
4. Finds at least one other WOS node
– Uses a Bootstrap algorithm
– Restarts at step 1
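The four steps can be sketched as a loop. This is an illustration under stated assumptions rather than the WOS code: the `Node` class, the `bootstrap` callback and the `max_rounds` bound on restarts are hypothetical, Pu and Pc are plain test functions, and `Node.request` is a simplified stand-in for Request (Pu, Pc, q).

```python
class Node:
    """A hypothetical WOS node: local services plus the remote nodes it knows."""

    def __init__(self, name, context, services):
        self.name = name
        self.context = context
        self.services = services   # list of dicts describing local services
        self.known = []            # remote nodes recorded in local warehouses

    def request(self, pu, pc, q):
        """Simplified Request(Pu, Pc, q): local services matching Pu and Pc."""
        return [s for s in self.services if pu(s) and pc(s)]


def search(node, pu, pc, q, bootstrap, max_rounds=3):
    """Try to return at least q services matching Pu and Pc."""
    found = []
    for _ in range(max_rounds):                  # step 4 restarts at step 1
        found = list(node.request(pu, pc, q))    # step 1: local request
        if len(found) >= q:
            return found
        same_ctx = [n for n in node.known if n.context == node.context]
        others = [n for n in node.known if n.context != node.context]
        for group in (same_ctx, others):         # steps 2 and 3
            for remote in group:
                found += remote.request(pu, pc, q)
            if len(found) >= q:
                return found
        new_node = bootstrap(node)               # step 4: find another node
        if new_node is None:
            break
        node.known.append(new_node)
    return found


is_printing = lambda s: s["service"] == "printing"
in_hec = lambda s: s["building"] == "HEC"

a = Node("a", "HEC", [])
b = Node("b", "HEC", [{"service": "printing", "building": "HEC"}])
a.known.append(b)
result = search(a, is_printing, in_hec, q=1, bootstrap=lambda n: None)
print(result)
```

Here node `a` has no local match, so the service is found in step 2 on node `b`, which shares its context. The bound on restarts is an addition of this sketch; the slide does not state when the algorithm gives up.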
Performing a Local Request: The Request Algorithm
! Request (Pu, Pc, q): tries to return at least q services matching Pu and Pc from the local warehouses
– The best case
• We find at least q services matching Pu and Pc
– The intermediate case
• We gradually reduce the constraints imposed by Pc until we find at least q services
– The worst case
• We find all the services matching only Pu
• The set of services found may be empty
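The three cases amount to relaxing Pc step by step while Pu stays fixed. The following is a minimal sketch, assuming Pc is given as a list of constraints ordered from most to least important and that relaxation drops the last one first; the slide does not specify the relaxation order, and the sample services are hypothetical.

```python
def request(services, pu, pc_constraints, q):
    """Return services matching Pu, relaxing Pc until at least q are found.

    `pu` is one test function; `pc_constraints` is a list of test
    functions ordered from most to least important (an assumed ordering).
    """
    candidates = [s for s in services if pu(s)]
    # Try all of Pc first, then drop the least important constraint each round.
    for keep in range(len(pc_constraints), -1, -1):
        active = pc_constraints[:keep]
        matches = [s for s in candidates if all(c(s) for c in active)]
        if len(matches) >= q:
            return matches
    # Worst case: Pc is fully relaxed, so only Pu applies (may be empty).
    return candidates


printers = [
    {"service": "printing", "building": "HEC", "floor": 2},
    {"service": "printing", "building": "HEC", "floor": 4},
    {"service": "printing", "building": "Other", "floor": 2},
    {"service": "scanning", "building": "HEC", "floor": 2},
]
pu = lambda s: s["service"] == "printing"
pc = [lambda s: s["building"] == "HEC", lambda s: s["floor"] == 2]

print(len(request(printers, pu, pc, q=1)))  # 1: best case, all of Pc holds
print(len(request(printers, pu, pc, q=2)))  # 2: floor constraint dropped
print(len(request(printers, pu, pc, q=3)))  # 3: only Pu still applies
```

Asking for more services than fully match the context widens the result, but never beyond what Pu allows.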
Notion of Best Fit
! Best Fit means the services which match the user request and most or all of the contextual parameters
! Example
– Pu = [Attribute: service = value: printing]
– Pc = [Attribute: building = value: HEC]
– The search algorithm will try to choose all printers located in the HEC building
• [value: HEC]
– Otherwise, the search algorithm would provide the user with printing services in other buildings
Managing Warehouse Content
! WOS nodes have limited storage capacities
– Warehouses cannot grow indefinitely
– Mechanisms must be put in place to control the size of warehouses
! Warehouses are updated whenever a WOS node receives answers from remote nodes
– Decisions must be taken whether to insert, update, or remove information
! For each av-pair in a warehouse, we keep track of
– Its creation date
– Its last modification date
– Its number of accesses
! This information enables a WOS node to properly manage the limited storage capacity allocated to it
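The bookkeeping described here is enough to drive a simple forgetting policy. The following is a minimal sketch, not the WOS mechanism: the `Warehouse` class is hypothetical, a logical counter stands in for dates, and the eviction rule (least accessed first, then least recently modified) is an assumption, since the slide does not say how the three values are combined.

```python
import itertools


class Warehouse:
    """A bounded store of av-pairs that forgets entries when it is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}                 # attribute -> record
        self._clock = itertools.count()   # logical time; stands in for dates

    def put(self, attribute, value):
        now = next(self._clock)
        if attribute in self.entries:     # update an existing av-pair
            record = self.entries[attribute]
            record["value"] = value
            record["modified"] = now
            return
        if len(self.entries) >= self.capacity:
            self._forget()
        self.entries[attribute] = {"value": value, "created": now,
                                   "modified": now, "accesses": 0}

    def get(self, attribute):
        record = self.entries[attribute]
        record["accesses"] += 1
        return record["value"]

    def _forget(self):
        """Assumed policy: drop the least accessed, then oldest-modified entry."""
        victim = min(self.entries,
                     key=lambda a: (self.entries[a]["accesses"],
                                    self.entries[a]["modified"]))
        del self.entries[victim]


w = Warehouse(capacity=2)
w.put("cpu", "2 GHz")
w.put("memory", "512 MB")
w.get("cpu")                 # cpu has now been accessed once
w.put("software", "MPI")     # warehouse is full: "memory" is forgotten
print(sorted(w.entries))     # ['cpu', 'software']
```

Other policies, such as pure least-recently-used or age-based expiry, would use the same three tracked values with a different sort key.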
Applications of WOS
! Virtual communities
– Shared contexts
– Adaptive, dynamic management of federations
! Grid computing
– Transparent remote execution
– Transparent distributed file system access
! E-Commerce
– Discovery of business partners
– Transparent transmission of business data
The WOS for any-time, any-where, any-service, any-medium
Peer-to-peer network
Optimizing the QoS in a WOS community
! Simplified virtual space (community):
– n nodes, c clients, s servers, c >> s
– The same service offered by all servers
! Overlay network defined by neighborhood relationships:
– Locally stored knowledge of a node and the communication path to it (c-s; s-s)
! Optimization goal:
– Reduce response time t to requests
– Balance server load
! Metric for response time: t(c,s) = k*t + d/b(c,s)
! The optimization process is initiated by any unsatisfied client or an overloaded server
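The response-time metric can be evaluated per server to decide where a client should send its requests. The slide does not define k, t, d or b, so the following sketch rests entirely on an assumed reading: k is the number of requests queued at server s, t the service time per request, d the amount of data to transfer, and b(c,s) the bandwidth between client and server. The function names and the numbers are hypothetical.

```python
def response_time(queued, service_time, data_size, bandwidth):
    """t(c, s) = k*t + d / b(c, s), under the assumed meaning of each symbol."""
    return queued * service_time + data_size / bandwidth


def best_server(servers, service_time, data_size):
    """Pick the server with the lowest estimated response time for one client.

    `servers` maps a server name to (queued requests, bandwidth to the client).
    """
    return min(servers, key=lambda s: response_time(
        servers[s][0], service_time, data_size, servers[s][1]))


# Hypothetical numbers: service time 0.5 s, 8 MB to transfer, bandwidth in MB/s.
servers = {"s1": (10, 8.0), "s2": (2, 1.0)}
print(best_server(servers, service_time=0.5, data_size=8.0))  # s1
```

In this example the heavily loaded but well-connected server wins (6 s against 9 s), which shows why load and bandwidth have to be weighed together when balancing the community.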
Applications of WOS: Grid computing: HP-WOSP
! HP-WOSP is a class of services used to configure High Performance applications
! Services of HP-WOSP are:
– Discovery Service: HP-WOSP (discovery, pgm)
– Reservation Service: HP-WOSP (reservation, pgm)
– Setup Service: HP-WOSP (setup, pgm)
[Figure: a user request passes between HP-WOSP nodes through WOSRP request/reply and X-WOSP request/reply exchanges.]
What next? - From connected society to networked society
[Figure: the WOSNet links a user sending/receiving news and email via PDA or cell phone with a WOS-compliant news provider, a WOS-compliant email provider, and a WOS printer (document print or e-paper) for delivery of information/service.]
Open environments - Distributed communities
! The "any-" vision calls for concepts to structure the diversity
! Open environment – components:
– Autonomous
– Heterogeneous
– Numerous (large scale)
– Mobile and adaptive in space and time
– Context aware
– Collaborative
– Dynamic membership (join/change/leave)
! Decentralized – Self-organized
Distributed Communities
Distributed Communities
! Distributed communities: evolving associations of participants (people, devices, software), mediated by a shared context.
! Community: medium of collaboration transforming the participants involved
! Not a space for random interaction between individuals, but a structure for efficient interaction to acquire and disseminate information