Introduction to Peer-to-Peer Computing

Dr Kevin Vella
Department of Computer Science
University of Malta

staff.um.edu.mt/csta1/P2P.pdf

Overview

• Introduction
• P2P Architecture
• Overlay Networks
• P2P Middleware
• P2P Applications

Distributed Applications

• A distributed application consists of multiple software modules located on different computers
• Multiple users may use the application concurrently on different computers
• A communication network is used for synchronisation and communication between the modules
• Issues
  ◦ Mapping software modules to computers
  ◦ Discovery of the other software modules

Structuring Distributed Apps

• We will consider two approaches
  ◦ Client-Server architecture
  ◦ Peer-to-Peer architecture
• Hybrids are possible and indeed useful

Client-Server Architecture

• A client-server system contains two types of software modules
• Server module
  ◦ One centralised instance (but might be replicated internally for scaling purposes)
  ◦ Passively listens for connections from clients
  ◦ Multiple client requests may be handled
    ▪ Sequentially
    ▪ Concurrently (multithreaded server)
    ▪ By several replicated servers at different locations
  ◦ Pending client requests may be queued up
  ◦ Servers are assumed to be reliable, often running in a data centre (dedicated/virtualised hardware)

Client-Server Architecture

• A client-server system contains two types of software modules
• Client module
  ◦ Multiple distributed instances, possibly controlled by different users
  ◦ Actively initiates a connection to a server
  ◦ No direct communication between clients
  ◦ Clients need to know the network address and port number of a server (i.e. service discovery is typically done through client configuration; however see directory services such as UDDI)
  ◦ Clients may be unreliable without affecting overall system stability

Client-Server Architecture

• Some examples of client-server systems
  ◦ Web server/web browsers
  ◦ Web server/client applications (web services)
  ◦ MS Exchange server/Outlook clients
  ◦ MS Terminal server/RDP clients
  ◦ SSH/Telnet/FTP server/clients
  ◦ NFS/SMB server/clients
  ◦ Mainframe/dumb terminals (hardware)
  ◦ Bank cashier/clients (human)

P2P Architecture

• A P2P system consists of many identical software modules (peers) running on different computers
• Peers communicate directly with each other
• Each peer is a server as well as a client
  ◦ Provides services to other peers
  ◦ Requests services from other peers
• Peers tend to be unreliable, unlike dedicated servers
• Service discovery is more complicated since there are so many servers continuously appearing and disappearing at different network locations
• Natural scalability due to multiple servers
• Can work without allocating dedicated server machinery

Communication Structure

[Figure: two panels whose arrows show the direction of service requests. Client-Server panel: Client 1, Client 2, Client 3 and a Server. Peer-to-Peer panel: Peer 1, Peer 2 and Peer 3.]


Peer Architecture

[Figure: layer stack, from top to bottom: Application Layer, Middleware Layer, Base Overlay Layer, Underlying Network]

• Some applications integrate all of the above (e.g. Gnutella, BitTorrent), though libraries exist that provide reusable P2P functionality (e.g. JXTA)

Base Overlay Layer

• The base overlay layer is responsible for
  ◦ Discovering new peers
  ◦ Maintaining the P2P overlay (virtual) network
  ◦ Forwarding messages between peers
• The overlay network is a virtual network laid over the ‘physical’ network (e.g. TCP/IP)
• Overlay network ‘wires’ are implemented using underlying network facilities (e.g. TCP connections or UDP messages)
• Overlay network distance is measured in the number of hops from peer to peer
• Peers that are distant in the physical network may be neighbours in the overlay network, and vice-versa
• The performance of the P2P system is influenced by the structure of the overlay network
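Overlay distance in hops can be illustrated with a breadth-first search over the overlay links. The following is a minimal sketch, not taken from the slides: the function name `overlay_hops`, the `OVERLAY` table and the peer names are hypothetical, and links are assumed to be bidirectional.

```python
from collections import deque


def overlay_hops(links, source, target):
    """Return the number of overlay hops from source to target, or None if unreachable."""
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        peer, dist = queue.popleft()
        for neighbour in links.get(peer, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Hypothetical overlay: each key lists the peers it holds a link to.
OVERLAY = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}

print(overlay_hops(OVERLAY, "A", "E"))
```

The hop count says nothing about physical proximity: peers A and E could sit on the same LAN and still be three overlay hops apart.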

Middleware Layer

• The middleware layer provides access to the services/resources provided by peers, and may be responsible for functions such as
  ◦ Security: controlling access to services/resources
  ◦ Service/resource discovery: searching and indexing services/resources distributed across peers
  ◦ Peer groups: coordinating peers that provide or consume a particular service/resource; may provide fault tolerance and persistent state

Application Layer

• The middleware services can be used to build complete applications
  ◦ File sharing
  ◦ Routing protocols
  ◦ Newsgroups
  ◦ Instant messaging
  ◦ Distributed file systems
  ◦ Distributed backup systems
  ◦ Anonymous web browsing
  ◦ And more…

Client-Server vs. P2P

• Ease of development
  ◦ CS is more established and familiar than P2P, which is still in flux
  ◦ CS exhibits simple interaction patterns for clients and server, while P2P involves more complex interaction patterns between peers
• Manageability
  ◦ It is easier to maintain a centralised server in a CS environment than it is to keep track of and maintain several distributed peers in a P2P system
• Scalability
  ◦ CS scalability is limited by fixed server hardware, though scaling can be achieved through load balancing over multiple servers at increased cost
  ◦ P2P is scalable by nature, since as the number of peers grows, so does the ‘server’ capacity

Client-Server vs. P2P

• Administrative domains
  ◦ In CS, the server(s) typically fall under a single administrative domain
  ◦ In P2P, peers can easily belong to different administrative domains, hence facilitating collaboration
• Security
  ◦ Responsibility for CS security lies with the server, which is centrally hosted in a secure environment
  ◦ Responsibility for P2P security is distributed across peers in different administrative domains, some of which might be compromised
• Reliability
  ◦ CS reliability is achieved through the use of multiple redundant servers (possibly hosted at different locations) with automatic fail-over, at additional cost
  ◦ With P2P, resilience comes at little extra cost, since multiple peers are usually able to provide the same service if some peers fail


Overlay vs Underlying Network

[Figures: four slides of diagrams contrasting the overlay network with the underlying network; the images were not recovered]

Overlays and Peer Discovery

• A P2P network is typically a ‘virtual’ network overlaid on an existing network (e.g. the Internet)
• A new peer needs to discover at least one existing peer in order to join a P2P network
  ◦ Network location information: IP address, listening port number, etc.
• If no peers are found immediately, the new peer either
  ◦ Passively waits for new participants, or
  ◦ Proactively looks for potential new participants
• It is hard to locate existing peers in a large network such as the Internet

Initial Peer Discovery

• Static configuration
  ◦ Each peer is preconfigured with a list of the network locations (IP address and/or port number) of every other peer in the system
  ◦ On startup (and possibly periodically) each peer attempts to connect to some other peers in its list, some of which may be running
  ◦ Due to the manual configuration, this is only suitable for P2P networks with a small number of peers which do not change frequently
  ◦ Can alternatively be used to initially contact a small number of ‘well-known’ peers that are guaranteed to be online

Initial Peer Discovery

• Centralised directory
  ◦ Each peer is preconfigured with the network location of a centralised server
  ◦ Each peer contacts the server on startup (and possibly periodically) to
    ▪ Obtain an updated list of currently active peers
    ▪ Indicate to the server that it is active
  ◦ Most subsequent communications bypass the server, using the P2P overlay network to route messages instead
  ◦ Occasionally other services are provided by the server (e.g. Napster’s server(s) also maintained a list of files hosted by each peer)

Initial Peer Discovery

• Centralised directory
  ◦ Peers may go offline
    ▪ Cleanly, in which case the peer’s shutdown procedure would contact the server to remove it from the active peer list
    ▪ Without warning (crash, network or power failure), thus rendering the server’s active peer list obsolete (use active peer list item expiry and periodic server updating to mitigate effects)
  ◦ A peer only needs to connect to a few peers on the overlay network, and as such it only requires a handful of the entries in the active peer list to be valid
  ◦ Centralised directory server is a single point of failure
  ◦ One directory server usually handles a large but limited number of peers
  ◦ To ponder: is DNS a suitable solution?
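The expiry idea mentioned above (entries lapse unless a peer re-announces itself) might look roughly as follows. This is an in-memory sketch under stated assumptions: the class `PeerDirectory`, its method names and the 60-second default lifetime are hypothetical, and no networking is shown.

```python
import time


class PeerDirectory:
    """In-memory active peer list whose entries expire unless refreshed."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so that expiry can be exercised in tests
        self._last_seen = {}         # (host, port) -> time of the last announcement

    def announce(self, host, port):
        """Called by a peer on startup and then periodically."""
        self._last_seen[(host, port)] = self._clock()

    def leave(self, host, port):
        """Called by a peer that shuts down cleanly."""
        self._last_seen.pop((host, port), None)

    def active_peers(self):
        """Drop entries older than the lifetime and return the remaining ones."""
        now = self._clock()
        expired = [p for p, seen in self._last_seen.items() if now - seen > self._ttl]
        for peer in expired:
            del self._last_seen[peer]
        return sorted(self._last_seen)
```

A peer that crashes simply stops announcing, so its entry disappears after at most one lifetime; until then the list may contain stale entries, which is acceptable because a joining peer needs only a few valid ones.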

Member Propagation Techniques

• After discovering just one existing peer, information about the rest of the P2P network can be obtained from it
• If each peer maintains a full member list then it is easy for any new peer to obtain a full member list from any other peer
• Alternatively, each peer can maintain a partial member list, replacing offline peers with new ones from neighbouring peers’ lists
• Since guaranteeing accuracy of full or partial lists across all peers involves excessive communication, a hint server (allowed to provide a possibly outdated list) may be used instead
• An expired entry will cause a connection failure, in which case replacement peers are obtained from other valid peers in the list
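Maintaining a partial member list can be sketched as a refresh step that drops unreachable entries and refills from the lists of peers that are still reachable. The function `refresh_member_list` and its callback parameters `is_reachable` and `fetch_members` are hypothetical stand-ins for real connection attempts and list requests.

```python
def refresh_member_list(members, is_reachable, fetch_members, capacity, self_id):
    """Drop unreachable entries, then refill from reachable neighbours' lists."""
    alive = [p for p in members if is_reachable(p)]
    # Only ask the peers that were already known; newly learnt peers are not queried here.
    for neighbour in list(alive):
        if len(alive) >= capacity:
            break
        for candidate in fetch_members(neighbour):
            if len(alive) >= capacity:
                break
            if candidate != self_id and candidate not in alive and is_reachable(candidate):
                alive.append(candidate)
    return alive


# Hypothetical network state used to exercise the function.
online = {"B", "D", "E"}
lists = {"B": ["A", "C", "D"], "D": ["E"], "E": []}
print(refresh_member_list(["B", "C"], online.__contains__,
                          lambda p: lists.get(p, []), capacity=3, self_id="A"))
```

Running the refresh periodically lets the list drift towards currently active peers without any global agreement on membership.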

Overlay Network Formation

• Each peer maintains a (limited) number of links with other peers (degree/valency)
• A peer can establish a link to another peer whose network address it has discovered
• The underlying network is used for transporting overlay messages
  ◦ Point-to-point TCP sessions
  ◦ UDP datagrams or raw IP
  ◦ Ethernet packets
  ◦ Bluetooth
• Problem scenarios
  ◦ Firewall: solve by exchanging passive/active connection establishment roles
  ◦ Two firewalls: solve using proxy peer/tunnelling

Overlay Network Topology

• Intermediate peers in the overlay network forward messages between indirectly connected peers
• The overlay topology significantly affects P2P system performance
  ◦ Diameter: longest distance between any two peers (overlay hops or latency)
  ◦ Average Degree: average number of links per peer (high AD increases message load but improves fault tolerance)
  ◦ Need to avoid linear formations and splits in the mesh
• Common topologies
  ◦ Random Mesh
  ◦ Tiered
  ◦ Ordered Lattice
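The two metrics above can be computed directly from a table of overlay links. This is a minimal sketch that measures diameter in hops rather than latency; the function names and the `LINE` and `RING` example overlays are hypothetical, and links are assumed to be symmetric with every peer present as a key.

```python
from collections import deque


def hop_distances(links, source):
    """Breadth-first search: hop distance from source to every reachable peer."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        peer = queue.popleft()
        for neighbour in links[peer]:
            if neighbour not in dist:
                dist[neighbour] = dist[peer] + 1
                queue.append(neighbour)
    return dist


def diameter(links):
    """Longest shortest path in hops, or None if the mesh is split."""
    worst = 0
    for peer in links:
        dist = hop_distances(links, peer)
        if len(dist) < len(links):
            return None
        worst = max(worst, max(dist.values()))
    return worst


def average_degree(links):
    """Average number of links per peer."""
    return sum(len(neighbours) for neighbours in links.values()) / len(links)


LINE = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}        # linear formation
RING = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}  # one extra link closes the loop

print(diameter(LINE), average_degree(LINE))
print(diameter(RING), average_degree(RING))
```

Comparing the two overlays shows why linear formations are best avoided: a single extra link shortens the diameter and also removes the single points of failure in the middle of the line.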

Random Mesh

• Each peer discovers a number of other peers and attempts to connect to them indiscriminately
• This (hopefully) results in a random structure with uniform degree
• Distant peers on underlying network could be overlay neighbours
  ◦ Solution: connect to peers with lowest latency
• Random mesh is suitable for linking a large number of peers with uniform resources and connectivity
• Search message flooding can easily be used to discover resources/services on other peers but generates a lot of traffic
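The lowest-latency refinement amounts to ranking discovered peers by measured round-trip time and linking to the best few. In this small sketch the function `choose_neighbours`, the peer names and the latency figures are hypothetical; how latency is measured is left open.

```python
import heapq


def choose_neighbours(latencies_ms, degree):
    """Pick the `degree` discovered peers with the lowest measured latency."""
    return heapq.nsmallest(degree, latencies_ms, key=latencies_ms.get)


# Hypothetical measurements, in milliseconds, for four discovered peers.
measured = {"p1": 120.0, "p2": 15.5, "p3": 48.0, "p4": 230.0}
print(choose_neighbours(measured, 2))
```

Selecting purely by latency can cluster peers geographically, so an implementation might keep a few randomly chosen links as well to hold the mesh together.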

Tiered Structure

• Peers are ordered into tiers of a tree depending on their advertised resources and connectivity (e.g. Kazaa’s nodes and supernodes, 2-tier)
  ◦ Tier 0 is the foundation tier containing (possibly well-known) reliable peers with adequate resources and message forwarding capacity
  ◦ At each tier, every peer is linked to a number of peers of a lower tier and forwards messages up and down
  ◦ Poorly-resourced leaf peers only link to their ‘super-peer’ and do not forward other peers’ messages; they are omitted from peer discovery
• The system needs to recover from peers leaving abruptly and disrupting the tree structure
• The hierarchy may be optimised to follow the underlying network’s structure (e.g. P2P video streaming)

Ordered Lattice

• In a two dimensional lattice, peers organise themselves in a rectangular grid
  ◦ 4 links per peer (except edge peers)
  ◦ Can extend to n dimensions
  ◦ Peers on opposite edges can also link to form a torus
• Messages are routed along lattice axes
• Peer additions and deletions must be handled on the fly, possibly distorting the structure
• Peer coordinates in a multi-dimensional lattice can be used as a key to locate resources in content addressable networks (see also distributed hash tables)
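A two-dimensional torus, axis-by-axis routing and a key-to-coordinate mapping can be sketched as below. All names are hypothetical, and the mapping is a deliberate simplification: it hashes a key onto a fixed grid cell, whereas a real content addressable network divides a coordinate space into zones that change as peers join and leave.

```python
import hashlib


def torus_neighbours(x, y, width, height):
    """The four lattice neighbours of (x, y), wrapping round at the edges."""
    return [((x - 1) % width, y), ((x + 1) % width, y),
            (x, (y - 1) % height), (x, (y + 1) % height)]


def _step(a, b, size):
    """Return -1, 0 or +1: the shorter direction from a to b round a ring of `size`."""
    forward = (b - a) % size
    if forward == 0:
        return 0
    return 1 if forward <= size - forward else -1


def route(src, dst, width, height):
    """Route along the x axis first, then along the y axis."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x = (x + _step(x, dst[0], width)) % width
        path.append((x, y))
    while y != dst[1]:
        y = (y + _step(y, dst[1], height)) % height
        path.append((x, y))
    return path


def key_to_cell(key, width, height):
    """Hash a resource key onto lattice coordinates (simplified, fixed grid)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") % width,
            int.from_bytes(digest[8:16], "big") % height)


print(route((0, 0), (3, 1), 4, 4))
```

A lookup would then consist of hashing the resource name with `key_to_cell` and routing a request to the peer responsible for that cell.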


Resources and Services

• Resources: CPU, storage, specialised external equipment (I/O)
• Services: file sharing, RPC/WS, search/service discovery, directory, streaming, publish/subscribe
• P2P middleware or applications on a peer may share local resources or provide a service to other peers
• All resources are ultimately exposed through services
  ◦ Everything is a service
• In order to satisfy a service request a peer may invoke services from even more peers
• Services may be implemented using Web Service standards (SOAP, WSDL etc.)

Service/Resource Discovery

• A peer must advertise its services to enable their discovery and subsequent use by other peers
  ◦ In file sharing applications, the ‘service’ is a shared file/block
• Service discovery is itself a service
  ◦ Centralised (Napster, UDDI for web services)
  ◦ P2P (flooding, overlay multicast, CAN/DHT)
• When a search message reaches a matching advertisement on a peer, the location of the peer providing the service (the ‘server’) is returned to the originator
• Infinite routing loops and duplicate messages can be avoided by message IDs or time-outs
• Actual service messages are either routed through the overlay or sent directly via the underlying network by the application
• Optimise by caching advertisements/data (e.g. file/block) along search/return path on overlay
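Flooding with duplicate suppression and a hop limit can be simulated in a few lines. This sketch models a single search message: the `seen` set stands in for each peer remembering the message ID, messages are delivered in FIFO order as if all links had equal latency, and the hop limit `ttl` plays the role of the time-out. The function name, peer names and advertised file are hypothetical.

```python
from collections import deque


def flood_search(adverts, links, origin, wanted, ttl):
    """Flood one search message; return (peers with a match, messages sent)."""
    seen = {origin}                  # peers that have already accepted this message ID
    hits = []
    sent = 0
    queue = deque([(origin, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if wanted in adverts[peer]:
            hits.append(peer)        # the peer's location goes back to the originator
        if remaining == 0:
            continue                 # hop limit reached: do not forward further
        for neighbour in links[peer]:
            sent += 1
            if neighbour in seen:
                continue             # duplicate: the receiver drops it
            seen.add(neighbour)
            queue.append((neighbour, remaining - 1))
    return hits, sent


LINKS = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
ADVERTS = {"A": set(), "B": {"song.mp3"}, "C": set(), "D": {"song.mp3"}}

print(flood_search(ADVERTS, LINKS, "A", "song.mp3", ttl=2))
```

Even in this four-peer overlay most transmissions are duplicates that the receiver discards, which hints at why flooding generates so much traffic in larger meshes.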

Peer Groups

• A session is a stateful conversation of service requests between two or more peers
  ◦ e.g. web server identifies session through cookie on web browser
• Middleware or applications may support sessions in the form of collaborative peer groups
  ◦ e.g. peers hosting a file system or a virtual compute cluster
• Peer groups can be used to maintain a persistent distributed shared state between peers providing and/or consuming a service
  ◦ Stateful web services enable grid services
• Recursive peer groups
• Can form their own overlay network

Security

• Middleware or applications can control access to services
• Use PKI to protect services (e.g. use a file block’s own hash value to encrypt it; use digital signatures)
• Decentralised security challenge: P2P public key services, transitive trust


P2P File Sharing

• Sharing files vs sharing blocks
• Concurrent downloads
  ◦ Different files
  ◦ Same file from different peers
• Caching names/data along search path
• Super-peers vs. pure P2P
• Ad-hoc overlay topologies, resource discovery by flooding
• Resource discovery: CS, P2P or manual
• Napster, Gnutella, Kazaa
• BitTorrent: trackers, peers and seeds

P2P Middleware

• P2P middleware facilitates P2P application development by hiding overlay and service discovery issues
• Applications still need to handle multiple concurrent requests from local user and multiple peers
  ◦ Multithreading: thread pool consuming requests or new-thread-per-request
  ◦ Asynchronous I/O: single thread juggles requests
• JXTA – Java P2P platform
• Windows P2P Networking
• P2P.NET – UoM FYP
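The two concurrency styles listed above can be contrasted with the Python standard library. This is an illustrative sketch only: `handle_request` is a hypothetical handler that merely upper-cases its input, and requests are plain strings rather than messages read from the network.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def handle_request(request):
    """Hypothetical handler: a real peer would look up a block, run a search, etc."""
    return request.upper()


def serve_with_thread_pool(requests, workers=4):
    """Multithreading: a fixed pool of worker threads consumes the requests."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_request, requests))


async def _handle_async(request):
    await asyncio.sleep(0)           # stands in for a non-blocking network operation
    return handle_request(request)


def serve_with_async_io(requests):
    """Asynchronous I/O: one thread interleaves all requests on an event loop."""
    async def main():
        return await asyncio.gather(*(_handle_async(r) for r in requests))
    return asyncio.run(main())


print(serve_with_thread_pool(["ping", "get block 7"]))
print(serve_with_async_io(["ping", "get block 7"]))
```

The thread pool suits handlers that block or compute, while the event loop suits many mostly idle connections, since it avoids per-thread overhead at the cost of requiring non-blocking handlers.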

P2P File System and Backup

• Provide file system interface access to files whose blocks are distributed across several peers
• Encrypt blocks using their own hash to protect data from 3rd parties
• P2P discovery for blocks using hash as search key
  ◦ Enables shared blocks between multiple users without compromising security
• Block/file replication for resilience
• UoM FYP
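Encrypting a block under its own hash can be sketched as follows. Several assumptions are made that the slides do not state: the search key is taken to be the hash of the ciphertext, SHA-256 is used throughout, and the cipher is a toy XOR keystream built from SHA-256 purely so that the example needs no third-party library. It is not a vetted cipher and should not be used to protect real data.

```python
import hashlib


def _keystream(key, length):
    """Toy keystream: SHA-256 of key plus a counter. For illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt_block(plaintext):
    """Return (search key, decryption key, ciphertext) for one block."""
    key = hashlib.sha256(plaintext).digest()            # the block's own hash is the key
    stream = _keystream(key, len(plaintext))
    ciphertext = bytes(a ^ b for a, b in zip(plaintext, stream))
    block_id = hashlib.sha256(ciphertext).hexdigest()   # assumed search key
    return block_id, key, ciphertext


def decrypt_block(key, ciphertext):
    """Decrypt and check that the result hashes back to the key."""
    stream = _keystream(key, len(ciphertext))
    plaintext = bytes(a ^ b for a, b in zip(ciphertext, stream))
    if hashlib.sha256(plaintext).digest() != key:
        raise ValueError("block failed integrity check")
    return plaintext


block_id, key, ciphertext = encrypt_block(b"hello peers")
print(block_id, decrypt_block(key, ciphertext))
```

Because the key is derived from the content, two users storing the same block produce the same ciphertext and the same search key, so the block can be shared, yet a peer that merely hosts the ciphertext cannot read it without already knowing the plaintext's hash.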

Other Applications

• P2P Directories
• Publish-subscribe
  ◦ Data feed streaming (e.g. stock data)
  ◦ Hierarchical video streaming
• P2P Collaborative Groupware
  ◦ Instant messaging
  ◦ Telephony/video-conferencing
  ◦ Shared calendars/project tools
• Anonymous web surfing
• P2P Parallel computing
  ◦ Grid computing
  ◦ SETI@Home (farming - is it P2P?)

Information Sources

• Milojicic et al. Peer-to-Peer Computing. HP Labs, 2002
• D Verma. Legitimate P2P Network Applications. Wiley, 2004