
Gregor v. Bochmann, University of Ottawa HNUST, 2009 1

Gregor v. Bochmann

School of Information Technology and Engineering (SITE), University of Ottawa, Canada
http://www.site.uottawa.ca/~bochmann/talks/p2p.ppt

Data Management in Peer-to-Peer Systems

HNUST, Xiangtan, China, November 2009

Gregor v. Bochmann, University of Ottawa HNUST, 2009 2

Abstract
A peer-to-peer (P2P) system is a large collection of computers (nodes), interconnected through the Internet, without centralized control; the system is controlled through a peer-to-peer protocol where each node only communicates with a few neighbors. A P2P system is characterized by its churn (frequency of nodes leaving or joining the system). In this talk, we will discuss the issues related to storing and retrieving information in P2P systems. This includes the issues of reliability, distributed search, and load balancing. We consider hash-based retrieval, as well as searching in linear, multi-dimensional (e.g. geographic) and general metric spaces. Various applications, including file sharing and video distribution, will be discussed.

Gregor v. Bochmann, University of Ottawa HNUST, 2009 3

Overview
1. What are peer-to-peer systems?
2. Typical applications
3. Searching indexed data objects
4. Distributed hash tables (DHT)
5. Multi-dimensional search spaces
6. Clustered P2P systems
7. Geographic considerations
8. An application: video streaming
9. Load balancing

Gregor v. Bochmann, University of Ottawa HNUST, 2009 4

1. What are P2P systems?
An "overlay network":
  3. Peer-to-peer application
  2. Peer-to-peer infrastructure
  1. Internet (TCP/IP)
Churn (nodes come and go)
No central control (in principle)
Hybrid decentralized architectures: some functions are centralized, others P2P
Partially decentralized architectures: some functions are performed by "super-peers"
Purely decentralized architectures: no central "ownership", resistance to censorship
See also the ACM survey [1]

Gregor v. Bochmann, University of Ottawa HNUST, 2009 5

Typical P2P applications
File sharing (this gave P2P systems a bad reputation because of copyright infringements)
Distributed computing (e.g. SETI, genome)
Audio and video multicasting
Scalable data storage (issues of data structuring and searching)
Scalable implementation support for various applications: Skype, social networking

Gregor v. Bochmann, University of Ottawa HNUST, 2009 6

Classification of P2P systems (data storage)
Structured: each object (data item or service) is placed on a specific node, based on some key that characterizes the object; efficient searching algorithms are based on this structure.
Unstructured: there is no rule about where objects should be stored; searching uses flooding or some more efficient heuristics.

Gregor v. Bochmann, University of Ottawa HNUST, 2009 7

Classification: example systems

Gregor v. Bochmann, University of Ottawa HNUST, 2009 8

Classification (ii) : Nature of a “node”

Host computer, also called Servent (server + client)

Virtual server, residing with others on a host computer

Cluster of several computers (may be geographically dispersed)

Gregor v. Bochmann, University of Ottawa HNUST, 2009 9

Issues and problems
Churn (nodes may leave without notice)
Malicious nodes
Scalability (very large number of nodes and users)
How to assure service quality? Response time, availability, reliability (no data loss, correctness)
Use of authentication and reputation data? (centralized or distributed)
Additional services: access-right verification, anonymity, incentive mechanisms [4] (the problem of "free riders")

Gregor v. Bochmann, University of Ottawa HNUST, 2009 10

3. Searching indexed data objects
The index key is composed of one or several attributes of the data objects, e.g. family name, first name.
Search for all people whose family name starts with "Boch" (this is a range query: keys from "Boch" to "Boci"), or look up "Bochmann, Gregor" and get one or more entries.
Most P2P systems assume that the key value is represented by a fixed-length bit sequence, and that b bits are combined into a digit (e.g. b = 4, with digit values ranging from 0 to 15).
Each node is identified by a key value (often selected randomly).
A data object with key ko is stored on a node that has an identifier kn close to ko (with different possible definitions of "closeness").

Gregor v. Bochmann, University of Ottawa HNUST, 2009 11

Plaxton's search algorithm
Each node has a routing table [8]:
one level (column or row) for each digit of the key space,
one row (or column) for each possible value of a digit,
each entry points to one (or several) "neighbor" nodes.
Choice: start from the least- or the most-significant digit?

Gregor v. Bochmann, University of Ottawa HNUST, 2009 12

Plaxton: Detailed explanation (skip)
Each node has a routing table with one row for each digit in the key range. Row i for a node with key kn includes a routing table entry for each possible value of the i-th digit. In the simplest case, this entry is a pointer (the address) of a node whose key knext is identical to kn up to the (i-1)-th digit and whose i-th digit is identical to the value corresponding to the routing table column. (Example)
When a search request for key ko arrives at a node with key kn, ko is first compared with kn; if the first j digits are identical, the request is forwarded to the node pointed to by the routing table entry in row j and column m, where m is the value of ko's digit at position (j+1).
Each time the request is forwarded, the difference between ko and kn is reduced. Two choices: start with the most- or the least-significant digits. (According to the ACM survey, Plaxton and Tapestry start with the least significant digits; Pastry starts with the most significant.)
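As an illustration of the prefix-routing step just described, here is a minimal Python sketch; the routing-table layout and function names are assumptions made for illustration (this is not the actual Plaxton, Tapestry or Pastry code), and it resolves the most-significant digits first, as in Pastry.

```python
# Minimal sketch of prefix routing (Plaxton/Pastry style), for illustration only.
# A node key is a fixed-length digit string; routing_tables[node][level][digit]
# gives a neighbor that shares 'level' leading digits with the node and has
# 'digit' at position 'level'. These structures are hypothetical.

def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that two keys have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route(routing_tables: dict, start: str, target: str) -> list:
    """Forward a lookup for 'target', starting at node 'start'; returns the path."""
    path, current = [start], start
    while current != target:
        level = shared_prefix_len(current, target)   # digits already resolved
        next_hop = routing_tables[current][level].get(target[level])
        if next_hop is None:                         # no closer node known
            break
        path.append(next_hop)
        current = next_hop
    return path
```

On the routing example of the next slide, a call such as route(tables, "67493", "34567") would resolve one more digit of 34567 at every hop.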

Gregor v. Bochmann, University of Ottawa HNUST, 2009 13

Routing example
Node 67493 searches for node 34567 in a Plaxton mesh, using keys made up of 5 decimal digits.

Gregor v. Bochmann, University of Ottawa HNUST, 2009 14

Scalability is good
The complexity of Plaxton's algorithm is O(log N), where N is the size of the search space. Note: this is the same complexity as internal search; however, the units are message exchanges, not memory accesses.
N = (2^b)^L, where L is the number of levels in the routing table and b is the number of bits per digit.
At most L messages are sent for each search.
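As a worked example (parameter values chosen purely for illustration): with b = 4 bits per digit and L = 8 levels, N = (2^4)^8 = 16^8 ≈ 4.3 × 10^9 keys, and a lookup takes at most 8 message exchanges.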

Gregor v. Bochmann, University of Ottawa HNUST, 2009 15

How to adapt Plaxton to dynamic network with churn ?

Flexibility of selecting different nodes for the entries in the first level.

The Pastry P2P system [7] uses a leaf set that stores pointers to closest nodes in a range around the node’s key.

Pastry search starts with the most significant digit.

Gregor v. Bochmann, University of Ottawa HNUST, 2009 16

Searching using a skip list
E.g. Chord [2]. The space of the key values is considered a ring, and each node has a routing table containing a pointer to its next neighbor in the key space, as well as so-called "skip" pointers to its k-th neighbors, where k = 2, 4, 8, 16, etc.
When a search request for key ko arrives at a node with key kn, the difference between these two values is first determined, and the request is forwarded following the largest skip pointer that leads to a node with a key lower than ko.
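A minimal Python sketch of this greedy forwarding rule on a key ring follows; the toy ring size and the finger (skip-pointer) lists are assumptions for illustration, not the actual Chord protocol.

```python
# Minimal sketch of skip-pointer (finger) forwarding on a key ring.
RING_SIZE = 2**8  # toy key space of 256 identifiers

def ring_distance(a: int, b: int) -> int:
    """Clockwise distance from a to b on the ring."""
    return (b - a) % RING_SIZE

def next_hop(current: int, fingers: list, target_key: int) -> int:
    """Among the node's neighbor/skip pointers, pick the one that gets
    closest to the target key without overshooting it on the ring."""
    best = current
    for f in fingers:
        if ring_distance(current, f) <= ring_distance(current, target_key):
            if ring_distance(f, target_key) < ring_distance(best, target_key):
                best = f
    return best
```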

Gregor v. Bochmann, University of Ottawa HNUST, 2009 17

Chord: example
Nodes with keys 0, 1 and 3; objects with keys 1, 2 and 6; ring pointers in blue.
This is very similar to Pastry (for r = 1).
Reliability through maintenance of the ring pointers.

Gregor v. Bochmann, University of Ottawa HNUST, 2009 18

4. Distributed hash tables (DHT)
Many P2P systems do not use the index value directly as the search key; instead, they hash the index value and use the hash key as the search key. The identifiers of the nodes are selected randomly from the hash key space. This is called a "distributed hash table":
1. Search key = Hash(semantically meaningful index value);
2. Find the node where objects with this search key are stored (using a search algorithm as above).
Advantage: the search keys of the objects are randomly distributed throughout the key space. Therefore the search depth is relatively uniform; the number of times a request is forwarded is on average O(log N). Note: the semantically meaningful index values are in general not uniformly distributed.
Disadvantage: range queries are difficult to realize.
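As a minimal illustration of the two steps above, here is a Python sketch; the hash truncation, the node identifiers and the successor-style "closeness" rule are assumptions chosen for illustration, not the scheme of any particular DHT.

```python
# Step 1: hash the meaningful index value to get a uniformly distributed key.
# Step 2: map the key to the responsible node (here a trivial "next node id on
# the ring" rule stands in for a real P2P routing step).
import hashlib

KEY_BITS = 32

def search_key(index_value: str) -> int:
    """Derive a uniformly distributed search key from the index value."""
    digest = hashlib.sha1(index_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % (2 ** KEY_BITS)

def responsible_node(node_ids: list, key: int) -> int:
    """The object is stored on the node whose identifier follows the key
    on the ring (one common definition of 'closeness')."""
    candidates = sorted(node_ids)
    for n in candidates:
        if n >= key:
            return n
    return candidates[0]          # wrap around the ring

# Example: where would the entry for "Bochmann, Gregor" be stored?
nodes = [0x1A2B3C4D, 0x7F000001, 0xC0FFEE00]
print(responsible_node(nodes, search_key("Bochmann, Gregor")))
```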

Gregor v. Bochmann, University of Ottawa HNUST, 2009 19

5. Multi-dimensional search spaces
For searching in geographical coordinates or other multi-dimensional spaces, the single-dimension search explained above can be performed in parallel for each of the dimensions. For a search in m dimensions, the search key has m components, and at each level of the search each of these components is refined; one then needs a correspondingly larger routing table (on the order of m × 2^b entries per level). A small sketch of the idea follows.
The CAN P2P system [3] uses another method, which is less scalable.
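To make the per-dimension refinement concrete, here is a minimal Python sketch; the digit interleaving, base and key length are purely illustrative assumptions, not the scheme of any particular system. At routing level l, one digit of every dimension's key is resolved.

```python
# Minimal sketch: turn an m-dimensional key into one digit sequence in which
# each routing level refines one digit of every dimension.
def to_digits(value: int, length: int, base: int = 16) -> list:
    """Fixed-length digit expansion, most significant digit first."""
    digits = []
    for _ in range(length):
        digits.append(value % base)
        value //= base
    return digits[::-1]

def interleaved_key(coords: list, length: int = 4, base: int = 16) -> list:
    """Routing level l of the combined key refines digit l of every dimension."""
    per_dim = [to_digits(c, length, base) for c in coords]
    key = []
    for level in range(length):
        for dim in per_dim:
            key.append(dim[level])
    return key

# e.g. a 2-D geographic key: interleaved_key([0x12AB, 0x34CD], length=4)
```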

Gregor v. Bochmann, University of Ottawa HNUST, 2009 20

Geographic searches in CAN
Search: linearly stepping through neighbors.
Insertion of a new node near E.

Gregor v. Bochmann, University of Ottawa HNUST, 2009 21

High-dimensional search spaces
Typical applications: text searching, image matching, etc.
Example of a distance metric for text searching: given a search string of characters, the distance from a given stored text = the number of non-matching characters in the best-matching string found in the stored text.
In high-dimensional spaces, the above approach becomes inefficient. Use search methods for metric spaces instead: a distance metric is defined between any two points, satisfying the triangle inequality.
Problem: find all objects that lie within a given distance of the query point (search key).

Gregor v. Bochmann, University of Ottawa HNUST, 2009 22

Principles of search in metric spaces
If the number of dimensions is very large (in mathematical metric spaces it may be infinite), the coordinates of a point are less relevant; what counts is the distance to other points.
Idea: classify objects by their distance to selected objects within the space, the so-called pivot points.
Determining the distance between two points is often computationally complex; therefore one should minimize the number of distance computations from pivot points.
See also [9].

Gregor v. Bochmann, University of Ottawa HNUST, 2009 23

Search tree with two pivot points per node

Gregor v. Bochmann, University of Ottawa HNUST, 2009 24

Search tree with one pivot point per node

Gregor v. Bochmann, University of Ottawa HNUST, 2009 25

Detailed explanation (skip)
The distance of the search key from the keys of the nodes must be determined and minimized. For these kinds of applications, determining such a distance is normally relatively complex; therefore the number of such distance computations should be minimized.
Since one does not have fixed dimensions, the position of a key value with respect to the key values of other objects, or of the nodes, can only be established by determining its distance to selected keys within the space. These points can be arbitrarily chosen and are called pivot points.
One method is to classify a given key with respect to two given pivot points according to which of the two points is closer. Another method uses only one pivot point per classification step and classifies the key according to its distance to this point, distinguishing several possible ranges for this value.
In order to determine the position of a key within the metric space, several such classification steps must be done, usually sequentially, corresponding to a search tree (a small sketch of the two-pivot step follows).
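Here is a minimal Python sketch of the first (two-pivot) classification step; the class, the Euclidean example metric and the flat buckets are assumptions made for illustration, not the structures compared in [9]. A real search tree would recurse and use the triangle inequality to prune whole subtrees.

```python
# Minimal sketch of a two-pivot classification step (in the spirit of GH-trees).
import math

def euclid(p, q):
    """Example metric; any function satisfying the triangle inequality works."""
    return math.dist(p, q)

class PivotNode:
    def __init__(self, pivot_a, pivot_b, metric=euclid):
        self.a, self.b, self.d = pivot_a, pivot_b, metric
        self.closer_to_a, self.closer_to_b = [], []

    def insert(self, obj):
        """One classification step: which of the two pivots is closer?"""
        (self.closer_to_a if self.d(obj, self.a) <= self.d(obj, self.b)
         else self.closer_to_b).append(obj)

    def range_query(self, query, radius):
        """Return stored objects within 'radius' of the query point.
        This flat sketch simply filters both buckets; a full tree would prune."""
        return [o for o in self.closer_to_a + self.closer_to_b
                if self.d(o, query) <= radius]
```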

Gregor v. Bochmann, University of Ottawa HNUST, 2009 26

6. Clustered P2P systems
Reliability of data storage: data must be stored redundantly (replication).
Chord: k copies on the nodes following the object key on the ring.
Pastry: k copies on nodes close to the object key.
Tapestry: a replication function produces IDs randomly distributed in the key space; OceanStore [5] uses erasure coding for redundant storage and Byzantine agreement for updates.
Equus: a clustered P2P system [10].

Gregor v. Bochmann, University of Ottawa HNUST, 2009 27

Equus
Each cluster of nodes has an identifier, and all nodes of a cluster contain the same data or data services.
Each node of a cluster has pointers to all other nodes of the cluster.
The clusters form a ring (like in Chord).
Each node of a cluster has a routing table (Plaxton) where each entry consists of k pointers to appropriate nodes of other clusters.
Each cluster is responsible for all objects with keys between its own key and the key of its next neighbor cluster (as in Chord; a small sketch of this rule follows below).
A node joining the P2P system finds an existing, locally near node and joins the cluster of that node.
When a cluster becomes too big or too small (the limits are determined by the desired reliability and the churn rate), the cluster splits or merges with a neighboring cluster, respectively.
When a cluster splits, the geographical coherence of the two parts is preserved, and the responsible key space is split between the two sub-clusters.
When two neighboring clusters merge, the new cluster becomes responsible for the union of the objects.
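A minimal Python sketch of the responsibility rule stated above: a cluster owns all object keys from its own identifier up to (but excluding) the identifier of the next cluster on the ring. The cluster identifiers below are illustrative.

```python
# Minimal sketch of "cluster owns the keys between its id and the next id".
import bisect

def responsible_cluster(cluster_ids: list, object_key: int) -> int:
    """Return the identifier of the cluster that stores 'object_key'."""
    ids = sorted(cluster_ids)
    i = bisect.bisect_right(ids, object_key) - 1
    return ids[i] if i >= 0 else ids[-1]   # keys below the lowest id wrap to the last cluster

# Example with three clusters on a 16-bit key ring:
clusters = [0x1000, 0x8000, 0xC000]
assert responsible_cluster(clusters, 0x9ABC) == 0x8000
assert responsible_cluster(clusters, 0x0042) == 0xC000   # wrap-around
```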

Gregor v. Bochmann, University of Ottawa HNUST, 2009 28

Equus: some diagrams
Ring of clusters.
Spatial distribution of clusters in a simulation over a square geographical space (the ring is a "space-filling curve").

Gregor v. Bochmann, University of Ottawa HNUST, 2009 29

7. Geographic considerations
Stretch: because of the overlay nature of P2P systems, the actual propagation delay for transmitting some data (e.g. a query) from a source to a destination is often much larger than the direct network propagation delay from the source to the destination.
Different underlying spaces: geographic distance, network delay.
In the Plaxton routing table there are many candidate nodes for the entries of the earlier levels. If the node keys are randomly distributed throughout the geographic space, one can select nodes that are close by (e.g. Pastry [6]). For the last levels there are only few candidates, and far-away nodes may have to be selected. This proximity-aware selection leads to lower stretch values (a small sketch follows).
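A minimal Python sketch of proximity-aware routing table construction: among all candidate nodes that satisfy the prefix constraint of a routing table entry, pick the one with the smallest measured delay. The helper names and the delay function are illustrative assumptions, not the Pastry implementation.

```python
# Minimal sketch: fill one routing table entry with the nearest eligible candidate.
def fill_entry(candidates, delay_to):
    """candidates: node keys eligible for this entry (correct digit prefix);
    delay_to: function estimating network delay to a node (e.g. from pings)."""
    return min(candidates, key=delay_to) if candidates else None

# Early levels: many candidates, so a nearby node is usually available.
# Last levels: often a single candidate, which may be far away -> residual stretch.
```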

Gregor v. Bochmann, University of Ottawa HNUST, 2009 30

Example of stretch
[Figure: routing over the same set of node keys, with routing table entries selected from the candidates randomly vs. by distance.]

Gregor v. Bochmann, University of Ottawa HNUST, 2009 31

Stretch for Pastry and Equus
[Figure: routing paths over the same node keys, comparing the stretch of Pastry routing with that of Equus routing.]

Gregor v. Bochmann, University of Ottawa HNUST, 2009 32

Overlay multicasting: convergence
For multicasting of data or streams (audio and/or video), convergence is another important quality measure.
[Figure: example overlay routing paths illustrating convergence.]

Gregor v. Bochmann, University of Ottawa HNUST, 2009 33

Simulation results: convergence with Equus routing [11]
convergence = ( dc/(dc+d1) + dc/(dc+d2) ) / 2, with the distances dc, d1 and d2 as labeled in the figure.
Note: forward Pastry routing leads to bad convergence.
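As a worked example (distance values chosen purely for illustration): with dc = 4, d1 = 2 and d2 = 6, convergence = ( 4/(4+2) + 4/(4+6) ) / 2 ≈ 0.53.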

Gregor v. Bochmann, University of Ottawa HNUST, 2009 34

Geographical data storage
Object key: geographical location.
Object storage: on a node close to the key (clustering).
Uses the Equus clustering idea, with modifications [13].

Gregor v. Bochmann, University of Ottawa HNUST, 2009 35

8. An application: video streaming
Uses the Equus clustered P2P infrastructure (see CliqueStream [11]).
Two-layer architecture:
Equus layer: nodes that turn on their TV join the Equus overlay.
Streaming overlay for each TV channel: nodes that watch that channel must get the stream; nodes in the same cluster can interconnect as a mesh; the stream is brought to at least one node in the cluster.
Multicast of a channel to all receiving clusters is performed by stable nodes (nodes that are stable and have enough processing power); one stable node per cluster.

Gregor v. Bochmann, University of Ottawa HNUST, 2009 36

Example system architecture
[Figure: a source, clusters of receivers, and a stable node (forwarding/receiving) in each cluster.]

Gregor v. Bochmann, University of Ottawa HNUST, 2009 37

CliqueStream: summary
Failure recovery: detailed procedures are given for recovering from failures of nodes and of stable nodes (using secondary stable nodes in each cluster).
Main features:
achieves traffic localization;
localized and fast failure recovery;
two-layer architecture: channel switching is faster than turning on;
works best when channel popularity is correlated with locality;
playback latency is correlated with locality.
Limitations and possible extensions:
did not consider malicious behavior;
did not consider striping/splitting of the stream;
need to address implementation issues such as NAT traversal.

Gregor v. Bochmann, University of Ottawa HNUST, 2009 38

9. Load balancing
Overview:
Reasons for imbalance
Load exchange mechanisms
Control procedures for load balancing: load monitoring, decision procedures
Qiao's load balancing approach
Issues

Gregor v. Bochmann, University of Ottawa HNUST, 2009 39

Reasons for imbalance
Uneven:
distribution of data items in the index space (note: this distribution can be equalized through hashing),
size of data items,
power of nodes, in terms of processing power, storage capacity or transmission bandwidth,
rate of user requests (for different data items or services); this rate may actually change rapidly due to unforeseen circumstances.

Gregor v. Bochmann, University of Ottawa HNUST, 2009 40

Load exchange mechanisms
Load transfer: from an overloaded node to an under-loaded node in the key-space vicinity. Load transfer means transferring some data and/or application software from the overloaded node to another node.
Virtual node transfer: (in the case of virtual nodes) transfer a virtual node from the overloaded host to an under-loaded host.
Host transfer: freeing a host from its current tasks and inserting it into the P2P system with an identifier close to the identifier of the overloaded node. Find an under-loaded node, transfer its load to its neighbors, then use it as the free host. In the case of clustered systems, transfer a node from an under-loaded cluster to the overloaded cluster.

Gregor v. Bochmann, University of Ottawa HNUST, 2009 41

Control procedures
Difficulty: the need for scalable procedures, with no centralized control.
Monitoring function: information about the current load situation (from the neighbors in the routing table?).
Decision procedures:
Is this node overloaded? Is one of its neighbors overloaded?
Boundary values? How should they be selected? Or is the decision made by comparing the load of different nodes, possibly against an estimated average value?
What measure should be taken as the "load index" that characterizes the load? (e.g. utilization, available capacity, queue length?)
Can we identify an under-loaded node?
When should a load exchange be initiated? Timing? Which partner?

Gregor v. Bochmann, University of Ottawa HNUST, 2009 42

Remarks on load balancing
Qiao's load balancing approach [12]:
diffusive monitoring approach;
different decision procedures (neighborhood, overload-initiated, under-load-initiated);
load index: available capacity;
host transfer for clustered P2P systems.
Issues and performance parameters:
speed of load balancing;
message overhead;
stability of the load balancing algorithm.
A small sketch of the diffusive idea follows.
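A minimal Python sketch of a diffusive load balancing step: each node compares its load index with those of its routing-table neighbors and shifts part of the excess toward the less loaded ones. The update rule and parameters are illustrative assumptions, not the scheme of [12].

```python
# Minimal sketch of one diffusive load balancing round.
def diffusion_step(load: dict, neighbors: dict, alpha: float = 0.5) -> dict:
    """load: node -> current load; neighbors: node -> list of neighbor nodes.
    Returns the loads after one round of pairwise exchanges."""
    new_load = dict(load)
    for n, neighs in neighbors.items():
        for m in neighs:
            diff = load[n] - load[m]
            if diff > 0:                      # n is more loaded than m
                transfer = alpha * diff / (len(neighs) + 1)
                new_load[n] -= transfer
                new_load[m] += transfer
    return new_load

# Example: three nodes on a small ring; loads move toward the average (5.0)
loads = {"A": 9.0, "B": 3.0, "C": 3.0}
links = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
print(diffusion_step(loads, links))
```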

Gregor v. Bochmann, University of Ottawa HNUST, 2009 43

Conclusions
The basic ideas of P2P systems were developed around 2000 and in the following years. Initial application: file sharing; since then, a number of other applications have been developed and deployed.
Strengths: scalability, a cooperative environment.
Difficulties: churn, free riders, vulnerability to malicious nodes.
Different schemes for searching; the cluster concept can provide reliability (redundancy).
Future trends:
new applications;
better load balancing, taking reputation and incentive mechanisms into account;
combination of free P2P resources with dedicated (more costly) stable nodes (e.g. for video distribution, Grid applications, etc.).

Gregor v. Bochmann, University of Ottawa HNUST, 2009 44

References
[1] Stephanos Androutsellis-Theotokis and Diomidis Spinellis. A survey of peer-to-peer content distribution technologies. ACM Computing Surveys, 36(4):335–371, December 2004.
[2] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for Internet applications. In Proceedings of SIGCOMM 2001, August 2001.
[3] S. Ratnasamy, P. Francis, M. Handley, and R. Karp. A scalable content-addressable network. In Proceedings of SIGCOMM 2001, August 2001.
[4] B. Cohen. Incentives build robustness in BitTorrent. In Proceedings of the 1st Workshop on Economics of Peer-to-Peer Systems, June 2003.
[5] J. Kubiatowicz, D. Bindel, Y. Chen, P. Eaton, D. Geels, S. R. Gummadi, H. Weatherspoon, W. Weimer, C. Wells, and B. Zhao. OceanStore: An architecture for global-scale persistent storage. In Proceedings of ACM ASPLOS, November 2000.
[6] M. Castro, P. Druschel, Y. C. Hu, and A. Rowstron. Exploiting network proximity in peer-to-peer overlay networks. In Proceedings of the International Workshop on Future Directions in Distributed Computing (FuDiCo 2002), June 2002.
[7] A. Rowstron and P. Druschel. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In Proceedings of IFIP/ACM Middleware, Heidelberg, Germany, November 2001.
[8] C. G. Plaxton, R. Rajaraman, and A. W. Richa. Accessing nearby copies of replicated objects in a distributed environment. In Proceedings of ACM SPAA, June 1997.
[9] Michal Batko, David Novak, Fabrizio Falchi, and Pavel Zezula. Scalability comparison of peer-to-peer similarity search structures. Future Generation Computer Systems, 24 (2008), pp. 834–848.
[10] T. Locher, S. Schmid, and R. Wattenhofer. eQuus: A provably robust and locality-aware peer-to-peer system. In Sixth IEEE International Conference on Peer-to-Peer Computing (P2P'06), 2006.
[11] S. Asaduzzaman, Y. Qiao, and G. v. Bochmann. CliqueStream: An efficient and fault-resilient live streaming network on a clustered peer-to-peer overlay. In Proceedings of the 8th International Conference on Peer-to-Peer Computing, Aachen, Germany, 2008. See also: CliqueStream: Creating an efficient and resilient transport overlay for peer-to-peer live streaming using a clustered DHT, Journal on Peer-to-Peer Networking and Applications, to be published.
[12] Y. Qiao and G. v. Bochmann. A diffusive load balancing scheme for clustered peer-to-peer systems. In Proc. 3rd International Workshop on Peer-to-Peer Networked Virtual Environments (P2PNVE 2009), Shenzhen, China, December 2009.
[13] S. Asaduzzaman and G. v. Bochmann. A locality preserving routing overlay using geographic coordinates. In IEEE International Conference on Internet Multimedia Systems Architecture and Application, Bangalore, India, December 2009.

Gregor v. Bochmann, University of Ottawa HNUST, 2009 45

Thanks!
Any questions?
For a copy of the slides, see

http://www.site.uottawa.ca/~bochmann/talks/p2p.ppt