A multi-agent algorithm to improve content management in CDN networks

Agostino Forestiero, [email protected]

Carlo Mastroianni, [email protected]

ICAR-CNR

Institute for High Performance Computing and Networks

Cosenza, Italy

IDCS 2014, September 22 - 24, 2014

Application Domain and Objectives

P2P content delivery networks

A content delivery network (CDN) is a large distributed system of servers deployed in multiple data centers across the Internet. The goal of a CDN is to serve content to end users with high availability and high performance.

While most early CDNs served content from dedicated servers owned and operated by the CDN, there is a recent trend [e.g., Akamai] towards a hybrid model that also uses P2P technology.

A hybrid CDN-P2P architecture is a promising network technology for effective real-time streaming services. It combines the quality control and reliability of a CDN with the scalability of a P2P system.

As the network size increases, such systems show limits and weaknesses

Decentralized algorithms and protocols can be usefully employed to improve their efficiency

A. Forestiero and C. Mastroianni, A multi-agent algorithm to improve content management in CDN networks, IDCS ‘14, September 22 – 24

We propose a self-organizing, decentralized and adaptive approach to improve content management in P2P CDNs.

A biologically inspired multi-agent algorithm for content management

Contents are described through metadata documents/descriptors

Metadata descriptors are indexed through binary keys, which can either:

• represent the presence or absence of some topics, e.g. when resources are text documents, or

• be the result of a locality-preserving hash function, which maps similar contents to similar keys

Ant-like mobile agents travel the network through P2P interconnections

Agents replicate/pick/drop metadata descriptors in order to disseminate useful information

They also spatially sort metadata descriptors by placing similar descriptors in neighbour hosts
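As a minimal sketch of the first kind of key (topic presence/absence), the snippet below builds a binary key from a set of topics. The topic vocabulary `TOPICS` and the function name `topic_key` are hypothetical and only serve as illustration; the slides do not specify how topics are chosen.

```python
# Hypothetical topic vocabulary (an assumption): bit i is 1 when topic i is present.
TOPICS = ["sports", "music", "news", "science"]


def topic_key(doc_topics):
    """Return the binary key, as a tuple of 0/1 bits, for a set of topics."""
    return tuple(1 if topic in doc_topics else 0 for topic in TOPICS)


key_a = topic_key({"music", "science"})
key_b = topic_key({"music"})
print(key_a, key_b)
```

Two documents that share most topics get keys that differ in few bits, which is the property the agents rely on when they sort descriptors.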


Application Domain and Objectives

Agents use probability functions to replicate and move metadata descriptors

These functions are based on the definition of a similarity measure among descriptors

Similarity can be calculated on metadata descriptors because similar descriptors correspond to similar contents

Agents tend to pick a descriptor from a host when this descriptor is considered “different” from the others located in the same region

Agents tend to drop a descriptor in a region that holds similar descriptors

Agent operations


The similarity function sim(m, R) is calculated by an agent each time it tries to pick or drop a metadata descriptor m

This function measures the similarity of a metadata binary string m with all the other metadata descriptors located in the local region R (note 1).

Nm is the overall number of descriptors in R

Ham(m, m') is the Hamming distance (note 2) between the metadata descriptor m under examination and another metadata descriptor m' located in R

sim(m, R) takes values between 0 and 1

Similarity function


1) The local region R for each host s includes s and all the hosts reachable from s in a number of hops h.

2) The Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different.
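The definitions in the two notes can be sketched in code. The slide's formula for sim is not reproduced in this text, so the form used below (one minus the mean Hamming distance normalized by the key length) is an assumption, chosen only because it yields values between 0 and 1. The adjacency-dictionary graph and all function names are hypothetical.

```python
from collections import deque


def hamming(a, b):
    """Number of positions at which two equal-length keys differ."""
    if len(a) != len(b):
        raise ValueError("keys must have equal length")
    return sum(x != y for x, y in zip(a, b))


def local_region(graph, s, h):
    """Hosts reachable from s within h hops (s included), found by BFS."""
    seen = {s}
    frontier = deque([(s, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == h:
            continue
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, dist + 1))
    return seen


def sim(m, others):
    """Assumed similarity: 1 - mean normalized Hamming distance to the others."""
    if not others:
        return 0.0  # assumption: no neighbors means no evidence of similarity
    total = sum(hamming(m, other) for other in others)
    return 1.0 - total / (len(m) * len(others))


graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
region = local_region(graph, 0, 2)
close = sim((1, 0, 0), [(1, 0, 0), (1, 1, 0)])
far = sim((1, 0, 0), [(0, 1, 1)])
print(region, close, far)
```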

When an unloaded agent gets to a new host, it evaluates the P1 function for each descriptor stored on this host

The agent draws a random number between 0 and 1. If this number is lower than P1, the pick operation is performed

The probability of picking a metadata document from a server must be inversely proportional to the similarity function sim.

The parameter K1 can be tuned to modulate the degree of similarity

In this analysis, K1 is set to 0.1

P1 probability function
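The pick rule can be sketched as follows. The slide's P1 formula is not reproduced in this text; the squared ratio below is the form commonly used in ant-based clustering and is assumed here, not taken from the slides. Only K1 = 0.1 comes from the source.

```python
import random

K1 = 0.1  # value reported in the slides


def p_pick(sim_value, k1=K1):
    """Assumed pick probability: close to 1 for low similarity, small for high."""
    return (k1 / (k1 + sim_value)) ** 2


def try_pick(sim_value, rng=random):
    """Draw a number in [0, 1) and pick when it is lower than P1."""
    return rng.random() < p_pick(sim_value)


for s in (0.0, 0.25, 0.5, 1.0):
    print(s, round(p_pick(s), 4))
```

Under this assumed form, a descriptor with no similar neighbors (sim = 0) is always picked, while a perfectly matching one (sim = 1) is picked with probability below 1%.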


When a loaded agent gets to a new host, it evaluates the P2 function for each carried metadata descriptor

The agent draws a random number between 0 and 1. If this number is lower than P2, the drop operation is performed

P2 is directly proportional to the average similarity sim

The parameter K2 is set to 0.5

K2 is higher than K1.

This limits the drop probability and allows agents to carry the descriptors for a sufficient number of hops, in order to deposit them into appropriate hosts


P2 probability function
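A matching sketch for the drop rule is shown below. As with P1, the exact formula is not reproduced in this text, so the squared ratio is an assumption borrowed from ant-based clustering; K2 = 0.5 is the value given in the slides.

```python
import random

K2 = 0.5  # value reported in the slides


def p_drop(sim_value, k2=K2):
    """Assumed drop probability: grows with the average similarity."""
    return (sim_value / (k2 + sim_value)) ** 2


def try_drop(sim_value, rng=random):
    """Draw a number in [0, 1) and drop when it is lower than P2."""
    return rng.random() < p_drop(sim_value)


for s in (0.0, 0.5, 1.0):
    print(s, round(p_drop(s), 4))
```

With this form the drop probability never exceeds (1 / 1.5)^2, about 0.44, even in a perfectly matching region, which is consistent with the remark that a relatively high K2 limits the drop probability.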

An agent can operate in two modes: copy and move

Under the copy mode, the agent replicates a metadata descriptor before picking it: one copy is left on the host, the other is carried by the agent

Under the move mode, the agent just picks the metadata descriptor, without generating any replica

Agents in copy mode replicate and disseminate information

Agents in move mode are specialized in relocating information

The copy mode cannot be maintained for a long time, since eventually every host would store a very large number of metadata descriptors of all types, which would weaken the efficacy of spatial reorganization.


Operation mode of agents
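The difference between the two modes comes down to whether the host keeps the descriptor after a pick. A minimal sketch, with hypothetical names (`Mode`, `pick`) and a plain list standing in for a host's descriptor store:

```python
from enum import Enum


class Mode(Enum):
    COPY = "copy"
    MOVE = "move"


def pick(host_descriptors, index, mode):
    """Return the descriptor the agent carries away from the host."""
    if mode is Mode.COPY:
        # Replicate: the original stays on the host, the agent carries a copy.
        return host_descriptors[index]
    # Move: the descriptor is removed from the host.
    return host_descriptors.pop(index)


host = [(1, 0, 0), (0, 1, 0)]
copied = pick(host, 0, Mode.COPY)
size_after_copy = len(host)
moved = pick(host, 0, Mode.MOVE)
size_after_move = len(host)
print(copied, size_after_copy, moved, size_after_move)
```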

Each agent autonomously switches from the copy mode to the move mode, according to a self-organization mechanism inspired by ants and other insects

The agent maintains a pheromone level (a real value), which increases as its activeness (in terms of pick and drop operations) decreases

When the pheromone level exceeds a threshold Th, the agent switches to the move mode

Indeed, low activeness means that descriptors have already been reorganized

Therefore the generation of more replicas would be damaging

The pheromone level at the end of the i-th time interval is:

Ev is the evaporation rate and is set to 0.9

The threshold Th is set to 9.0


Mode switch of agents
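The pheromone update formula is not reproduced in this text. The sketch below assumes a simple evaporation-plus-deposit rule, phi_i = Ev * phi_(i-1) + inactivity_i, where inactivity_i is the fraction of failed pick/drop attempts in the interval. Both the rule and the definition of inactivity are assumptions; only Ev = 0.9 and Th = 9.0 come from the slides.

```python
EV = 0.9  # evaporation rate from the slides
TH = 9.0  # switching threshold from the slides


def update_pheromone(phi, attempts, successes, ev=EV):
    """Assumed update: old pheromone evaporates, inactivity deposits new pheromone."""
    inactivity = 1.0 if attempts == 0 else 1.0 - successes / attempts
    return ev * phi + inactivity


def intervals_until_switch(attempts, successes, max_intervals=1000):
    """Intervals needed to exceed TH at a constant activity level, or None."""
    phi = 0.0
    for i in range(1, max_intervals + 1):
        phi = update_pheromone(phi, attempts, successes)
        if phi > TH:
            return i
    return None


idle = intervals_until_switch(attempts=10, successes=0)
busy = intervals_until_switch(attempts=10, successes=5)
print(idle, busy)
```

Under this assumed rule the pheromone of a completely inactive agent converges to 1 / (1 - Ev) = 10, so a threshold of 9.0 is crossed only after a sustained period of low activity, while an agent that succeeds half of the time levels off at 5 and stays in copy mode.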

• As a network region accumulates metadata descriptors having similar keys, it becomes more and more likely that:

• “outlier” metadata descriptors will be picked by agents

• other similar metadata descriptors will be dropped by other agents in this region


High-level description of the algorithm performed by mobile agents

Cyclically, the agents perform a given number of hops among servers and, when they get to a server, they decide which probability function they must use, based on their state. If the agent does not carry metadata it computes P1, otherwise it computes P2.
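The cycle described above can be sketched as a single "visit" step. This is an illustration under stated assumptions, not the authors' implementation: the similarity and probability formulas are the assumed forms (normalized Hamming similarity and squared ratios), and `visit`, `neighborhood` and the list-based stores are hypothetical.

```python
import random

K1, K2 = 0.1, 0.5  # values reported in the slides


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def sim(m, others):
    # Assumed similarity: 1 - mean normalized Hamming distance (0.0 if no others).
    if not others:
        return 0.0
    return 1.0 - sum(hamming(m, o) for o in others) / (len(m) * len(others))


def visit(carried, host, neighborhood, rng, copy_mode=False):
    """One visit: an unloaded agent evaluates P1, a loaded agent evaluates P2.

    `host` holds the descriptors of the visited host, `neighborhood` those of
    the other hosts in its local region. Both lists may be mutated.
    """
    if not carried:
        for i in range(len(host) - 1, -1, -1):
            m = host[i]
            s = sim(m, neighborhood + host[:i] + host[i + 1:])
            if rng.random() < (K1 / (K1 + s)) ** 2:  # assumed P1
                carried.append(m)
                if not copy_mode:
                    del host[i]
    else:
        for i in range(len(carried) - 1, -1, -1):
            s = sim(carried[i], neighborhood + host)
            if rng.random() < (s / (K2 + s)) ** 2:  # assumed P2
                host.append(carried.pop(i))


rng = random.Random(0)
carried, first_host = [], [(1, 1, 1)]
visit(carried, first_host, [(0, 0, 0)] * 3, rng)  # outlier: always picked
second_host = [(0, 0, 0)]
visit(carried, second_host, [(0, 0, 0)], rng)     # dissimilar region: never dropped
print(carried, first_host, second_host)
```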

The effectiveness of the algorithm has been evaluated by defining the spatial uniformity function, i.e. the average homogeneity of the metadata documents stored in neighbor hosts.

The overall uniformity function Us is obtained in two steps:

for each host, we average the Hamming distance between all the pairs of descriptors within the visibility region

then, we average the result over all the hosts

The objective is to increase the uniformity function as much as possible

this would mean that similar metadata descriptors have been aggregated in neighbor hosts, and therefore that an effective sorting of metadata descriptors has been achieved

Uniformity function
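The two averaging steps can be sketched as below. The exact expression for the uniformity function is not reproduced in this text. To obtain an index that grows as descriptors become more similar, the sketch assumes that the homogeneity of a region is one minus its mean pairwise Hamming distance normalized by the key length. That mapping, and the function names, are assumptions.

```python
from itertools import combinations


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def region_homogeneity(descriptors):
    """Assumed homogeneity: 1 - mean normalized pairwise Hamming distance."""
    pairs = list(combinations(descriptors, 2))
    if not pairs:
        return 1.0  # assumption: zero or one descriptor is trivially homogeneous
    dim = len(descriptors[0])
    mean_distance = sum(hamming(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_distance / dim


def overall_uniformity(regions):
    """Average the per-host homogeneity over all hosts' visibility regions."""
    return sum(region_homogeneity(r) for r in regions) / len(regions)


sorted_regions = [[(1, 0, 0)] * 3, [(0, 1, 1)] * 3]
mixed_regions = [[(1, 0, 0), (0, 1, 1)], [(0, 1, 1), (1, 0, 0)]]
u_sorted = overall_uniformity(sorted_regions)
u_mixed = overall_uniformity(mixed_regions)
print(u_sorted, u_mixed)
```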


An event-based simulator, written in Java, was implemented to evaluate the performance of the algorithm

The P2P scenario is characterized as follows:

number of bits of metadata descriptors, dim = 3, 4, 5, 6

number of hosts, Ns = 500, 1000, 2000, 4000, 8000

results are independent of network size and number of bits, meaning that the algorithm is scalable

average connection degree = 4 (use of power law networks)

probability of agent generation, Pgen = 0.5

average number of agents, Na = Np * Pgen

average number of resources per peer = 15 (Gamma distribution)

average interval between two agent movements, Tmov = 60 s


Simulation scenario

2,500 hosts are arranged in a grid topology, and each metadata descriptor is a 3-bit key associated with an RGB color

Each host is displayed with the RGB color of the descriptor value that has the highest number of elements placed in it

e.g., a red color corresponds to a large fraction of (1,0,0) descriptors in that host

[Figure: snapshots of the grid over time, at T = 0, T = 5,000, T = 10,000, T = 20,000 and T = 40,000]

As the process goes on, metadata descriptors are reorganized and sorted, as shown by the presence of clearly distinguishable (and gradually changing) color spots

Graphical description of the sorting process
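The coloring rule used for the snapshots can be sketched as follows. The function name `host_color` and the 0-255 channel scale are assumptions made for illustration.

```python
from collections import Counter


def host_color(descriptors):
    """RGB color of the most frequent 3-bit descriptor on a host, or None if empty."""
    if not descriptors:
        return None
    bits, _count = Counter(descriptors).most_common(1)[0]
    return tuple(255 * bit for bit in bits)


mostly_red = host_color([(1, 0, 0), (1, 0, 0), (0, 1, 0)])
empty = host_color([])
print(mostly_red, empty)
```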


Uniformity function


Uniformity of the whole network when the number of bits of the binary string representing the content ranges from 3 to 6.

The logical reorganization is obtained independently of the number of bits.

Uniformity function


Uniformity vs. time for different numbers of servers.

The size of the network has no detectable effect on the overall uniformity index

Metadata documents handled by a server


Mean number of metadata documents handled by a server when the length of binary strings ranges from 3 to 6 bits.

The number of metadata documents maintained by a server increases from an initial value of about 15 to much higher values.

After a transient phase this value stabilizes, although with some fluctuations.

The sorting of metadata descriptors can be exploited by an informed resource/content discovery protocol

1. users issue queries for resources/contents having specified metadata descriptors

2. a query is forwarded towards the neighbor host whose metadata descriptors are the closest to the target metadata descriptor; the next host in the path is selected through the similarity function

3. the search stops when no better neighbor can be selected, and a queryHit message is returned to the requesting host


Discovery of metadata descriptors
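The three steps of the discovery protocol amount to a greedy walk over the overlay. The sketch below is a simplified illustration: it scores each host by the assumed normalized Hamming similarity between the target key and the host's own descriptors, and it returns the path instead of sending a queryHit message. The names `route_query`, `graph` and `store` are hypothetical.

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def host_similarity(target, descriptors):
    # Assumed score: 1 - mean normalized Hamming distance (0.0 for an empty host).
    if not descriptors:
        return 0.0
    total = sum(hamming(target, d) for d in descriptors)
    return 1.0 - total / (len(target) * len(descriptors))


def route_query(graph, store, start, target):
    """Forward the query to the best neighbor until no neighbor is strictly better."""
    path = [start]
    current = start
    while True:
        best = max(
            graph[current],
            key=lambda n: host_similarity(target, store[n]),
            default=None,
        )
        if best is None or (
            host_similarity(target, store[best])
            <= host_similarity(target, store[current])
        ):
            return path  # search stops here; results would be returned to `start`
        current = best
        path.append(current)


graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
store = {0: [(0, 0, 0)], 1: [(1, 0, 0)], 2: [(1, 1, 0)], 3: [(1, 1, 1)]}
path = route_query(graph, store, 0, (1, 1, 1))
print(path)
```

Because each hop must strictly improve the score, the walk always terminates. On a well-sorted network it follows the gradient of similar keys towards the target.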

Range queries are queries in which some bits of the target binary string are wildcard bits, while other bits are specified (overlapping bits)

The logical reorganization of the metadata documents improves the rapidity and effectiveness of discovery operations, and enables the execution of range queries
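Matching a range query against binary keys can be sketched as below, with `None` standing for a wildcard bit. This representation and the function names are assumptions made for illustration.

```python
from itertools import product


def matches(descriptor, pattern):
    """True when every specified (non-wildcard) bit of the pattern equals the key bit."""
    return len(descriptor) == len(pattern) and all(
        p is None or p == d for d, p in zip(descriptor, pattern)
    )


def range_query(descriptors, pattern):
    """All descriptors compatible with the pattern."""
    return [d for d in descriptors if matches(d, pattern)]


all_keys = list(product((0, 1), repeat=4))         # every possible 4-bit key
two_bits = range_query(all_keys, (1, None, None, 0))  # 2 specified bits
four_bits = range_query(all_keys, (1, 0, 1, 0))       # 4 specified bits
print(len(two_bits), len(four_bits))
```

Over the full 4-bit key space, a pattern with k specified bits matches 2^(4 - k) distinct keys, so the set of matching keys shrinks as more bits are specified.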

Mean number of results


Mean number of results collected by a range query when the length of the binary string representing the content is set to 4 and the number of overlapping bits ranges from 1 to 4.

The number of results decreases with the number of overlapping bits

Range queries provide an efficient way to discover (in just one shot) many more results than a single query.

A nature-inspired algorithm to build a P2P information system for Content Delivery Networks was presented.

Ant-inspired mobile agents use probability functions to replicate and move metadata descriptors so as to cluster similar metadata in neighbor hosts.

The logical reorganization of the metadata documents improves the rapidity and effectiveness of discovery operations, and enables the execution of range queries, i.e., requests for content that matches some specified features.

Performance analysis, carried out through event-based simulation, confirms the effectiveness of the approach and the increased efficiency of discovery operations, specifically of range queries.

Conclusions


Thank you for your attention!
