A multi-agent algorithm to improve content management in CDN networks
Agostino Forestiero
Carlo Mastroianni
ICAR-CNR
Institute for High Performance Computing and Networks
Cosenza, Italy
IDCS 2014, September 22 - 24, 2014
Application Domain and Objectives
P2P content delivery networks
A content delivery network (CDN) is a large distributed system of servers deployed in multiple
data centers across the Internet. The goal of a CDN is to serve content to end-users with high
availability and high performance.
While most early CDNs served content using dedicated servers owned and operated by the CDN,
there is a recent trend [e.g., Akamai] to use a hybrid model that uses P2P technology.
The hybrid CDN-P2P architecture is a promising network technology that enables effective real-time
streaming services. It combines the quality control and reliability of a CDN with the scalability
of a P2P system.
As the network size increases, however, these systems show limits and weaknesses.
Decentralized algorithms and protocols can be usefully employed to improve their efficiency.
A. Forestiero and C. Mastroianni, A multi-agent algorithm to improve content management in CDN networks, IDCS ‘14, September 22 – 24
We propose a self-organizing, decentralized and adaptive approach to improve content management in P2P CDNs.
A biologically inspired multi-agent algorithm for content management
Contents are described through metadata documents/descriptors
Metadata descriptors are indexed through binary keys, which can either
represent the presence or absence of given topics (e.g., when the resources are text documents), or
be the result of applying a locality-preserving hash function, which maps similar contents to
similar keys
Ant-like mobile agents travel the network through P2P interconnections
Agents replicate/pick/drop metadata descriptors in order to disseminate useful information
They also spatially sort metadata descriptors by placing similar descriptors in neighbour hosts
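The topic-based binary keys mentioned above can be illustrated with a minimal sketch. The vocabulary, the function names and the tuple representation are assumptions made for illustration; the slides do not specify how keys are built.

```python
from typing import Iterable, Sequence, Tuple


def topic_key(doc_topics: Iterable[str], vocabulary: Sequence[str]) -> Tuple[int, ...]:
    """Return one bit per vocabulary topic: 1 if the document covers it, else 0."""
    present = set(doc_topics)
    return tuple(1 if topic in present else 0 for topic in vocabulary)


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length keys differ."""
    if len(a) != len(b):
        raise ValueError("keys must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 4-topic vocabulary, used only for this example.
VOCABULARY = ("sports", "music", "news", "science")

key_a = topic_key({"music", "news"}, VOCABULARY)     # (0, 1, 1, 0)
key_b = topic_key({"music", "science"}, VOCABULARY)  # (0, 1, 0, 1)
```

Documents that share most topics get keys with a small Hamming distance, which is the property the agents rely on when sorting descriptors.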
Application Domain and Objectives
Agents use probability functions to replicate and move metadata descriptors
These functions are based on the definition of a similarity measure among descriptors
similarity can be calculated on metadata descriptors because similar descriptors
correspond to similar contents
agents tend to pick a descriptor from a host when this descriptor is considered "different" from the others located in the same region
agents tend to drop a descriptor in a region that holds similar descriptors
Agent operations
The similarity function sim(m, R) is calculated by an agent each time it tries to pick or drop a metadata descriptor m
This function measures the similarity of a metadata binary string m with all the other metadata
descriptors located in the local region R (see note 1).
Nm is the overall number of descriptors in R
Ham(m, m̄) is the Hamming distance (see note 2) between the metadata descriptor m under
examination and another metadata descriptor m̄ located in R
sim(m, R) takes values between 0 and 1
Similarity function
1) The local region R for each host s includes s and all the hosts reachable from s in a number of hops h.
2) The Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different.
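The formula of the similarity function is not reproduced in this transcript. The sketch below assumes one plausible form that matches the definitions above: the average, over the Nm descriptors in R, of 1 minus the Hamming distance normalized by the key length. The normalization and the value returned for an empty region are assumptions, chosen so that the result stays between 0 and 1.

```python
from typing import Sequence

Descriptor = Sequence[int]


def hamming(a: Descriptor, b: Descriptor) -> int:
    """Number of positions at which two equal-length binary keys differ."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def sim(m: Descriptor, region: Sequence[Descriptor]) -> float:
    """Assumed form: mean of (1 - Ham(m, other) / dim) over the descriptors in the region."""
    if not region:
        return 0.0  # assumption: nothing to compare against counts as no similarity
    dim = len(m)
    return sum(1.0 - hamming(m, other) / dim for other in region) / len(region)
```

For example, a descriptor (1,0,0) placed in a region holding one identical descriptor and one fully different descriptor (0,1,1) gets a similarity of 0.5 under this assumed form.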
When an unloaded agent gets to a new host, it evaluates the P1 function for each descriptor of this host
The agent draws a random number between 0 and 1. If this number is lower than P1, the pick
operation is performed
The probability of picking a metadata document from a server must be inversely proportional to
the similarity function sim.
The parameter K1 can be tuned to modulate the degree of similarity
In this analysis, K1 is set to 0.1
P1 probability function
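The P1 formula itself is missing from this transcript. As a hedged illustration, the sketch below uses the pick probability commonly adopted in ant-based clustering, (K1 / (K1 + sim))^2; whether the slides use exactly this form is an assumption. The function names are hypothetical.

```python
import random

K1 = 0.1  # value stated in the slides


def pick_probability(similarity: float, k1: float = K1) -> float:
    """Assumed form: high when the descriptor is dissimilar from its region, low otherwise."""
    return (k1 / (k1 + similarity)) ** 2


def should_pick(similarity: float, rng: random.Random) -> bool:
    """Draw a number in [0, 1) and pick when it falls below P1."""
    return rng.random() < pick_probability(similarity)
```

With K1 = 0.1 the probability drops quickly: a descriptor with similarity 0 is always picked, while one with similarity 1 is picked less than 1% of the time under this assumed form.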
When a loaded agent gets to a new host, it evaluates the P2 function for each carried metadata descriptor
The agent extracts a random number between 0 and 1. If this number is lower than P2, the drop operation is actually performed
P2 is directly proportional to the average similarity sim
The parameter K2 is set to 0.5
K2 is higher than K1.
This limits the drop probability and allows agents to carry the descriptors for a sufficient number of hops, in order to deposit them into appropriate hosts
P2 probability function
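The P2 formula is also missing from this transcript. The sketch below assumes the drop probability commonly used in ant-based clustering, (sim / (K2 + sim))^2, which grows with the similarity as the slide requires; the exact form used by the authors is not confirmed here.

```python
import random

K2 = 0.5  # value stated in the slides


def drop_probability(similarity: float, k2: float = K2) -> float:
    """Assumed form: grows with the similarity between the carried descriptor and the region."""
    return (similarity / (k2 + similarity)) ** 2


def should_drop(similarity: float, rng: random.Random) -> bool:
    """Draw a number in [0, 1) and drop when it falls below P2."""
    return rng.random() < drop_probability(similarity)
```

Under this assumption the drop probability never exceeds (1 / 1.5)^2, about 0.44, even in a perfectly similar region, which is consistent with the remark that K2 limits the drop probability.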
An agent can operate in 2 modes: copy and move
Under the copy mode, the agent replicates a metadata descriptor before picking it: one copy
is left on the host, the other is carried by the agent
Under the move mode, the agent just picks the metadata descriptor, without generating any
replica
Agents in copy mode are able to replicate and disseminate the information
Agents in move mode are specialized in the relocation of information
The copy mode cannot be maintained for a long time, since eventually every host would
store a very large number of metadata of all types, thus weakening the efficacy of
spatial reorganization.
Operation mode of agents
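The difference between the two modes can be shown with a small sketch of the pick operation. The `Mode` enum, the `pick` function and the list-based host store are hypothetical names introduced only for this example.

```python
from enum import Enum
from typing import List, Tuple

Descriptor = Tuple[int, ...]


class Mode(Enum):
    COPY = "copy"
    MOVE = "move"


def pick(host_store: List[Descriptor], descriptor: Descriptor, mode: Mode) -> Descriptor:
    """Return the descriptor the agent now carries, updating the host store according to the mode."""
    if mode is Mode.MOVE:
        # Move mode: the descriptor leaves the host, no replica is generated.
        host_store.remove(descriptor)
    # Copy mode: the host keeps its copy and the agent carries a replica.
    return descriptor
```

In copy mode the total number of descriptors in the network grows with every pick, which is why that mode cannot be kept indefinitely.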
Each agent autonomously switches from copy to move
Each agent switches from the copy to the move mode according to a self-organization mechanism inspired by ants and other insects
The agent maintains a pheromone level (real value) which increases as its activeness (in terms of pick and drop operations) decreases
When the pheromone level exceeds a threshold Th, the agent switches to move mode
Indeed, low activeness means that descriptors have already been reorganized
Therefore the generation of more replicas would be damaging
The pheromone level at the end of the i-th time interval is:
Ev is the evaporation rate and is set to 0.9
The threshold Th is set to 9.0
Mode switch of agents
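The pheromone update formula is not reproduced in this transcript. The sketch below assumes a simple evaporation-plus-deposit rule: the previous level is multiplied by Ev and an inactivity term between 0 and 1 is added for the current interval (1 meaning no pick or drop was performed). Both the rule and the names are assumptions.

```python
from typing import Iterable, Optional

EV = 0.9  # evaporation rate stated in the slides
TH = 9.0  # threshold stated in the slides


def update_pheromone(previous: float, inactivity: float, ev: float = EV) -> float:
    """Assumed rule: level_i = Ev * level_(i-1) + inactivity_i, with inactivity in [0, 1]."""
    return ev * previous + inactivity


def intervals_until_move(inactivity_per_interval: Iterable[float], th: float = TH) -> Optional[int]:
    """Return the first interval at which the level exceeds Th, or None if it never does."""
    level = 0.0
    for i, inactivity in enumerate(inactivity_per_interval, start=1):
        level = update_pheromone(level, inactivity)
        if level > th:
            return i
    return None
```

Under this assumed rule a completely inactive agent converges towards 1 / (1 - Ev) = 10, so it crosses Th = 9.0 after a finite number of intervals, whereas an agent that stays half active levels off around 5 and remains in copy mode.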
• As a network region accumulates metadata descriptors having similar keys, it becomes more and more likely that:
• "outlier" metadata descriptors will be picked by agents
• other similar metadata descriptors will be dropped by other agents in this region
High-level description of the algorithm performed by mobile agents
Cyclically, the agents perform a given number of hops among servers and, when they get to a server, they decide which probability function they must use, based on their state. If the agent does not carry metadata it computes P1, otherwise it computes P2.
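The decision taken at each visited server can be sketched as follows. The similarity, P1 and P2 forms are assumptions (their formulas are not reproduced in this transcript), the agent is shown in move mode only, and all function and parameter names are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Descriptor = Tuple[int, ...]
K1, K2 = 0.1, 0.5  # values stated in the slides


def hamming(a: Descriptor, b: Descriptor) -> int:
    return sum(x != y for x, y in zip(a, b))


def sim(m: Descriptor, others: List[Descriptor]) -> float:
    """Assumed similarity: mean of (1 - Ham / dim); 0 for an empty region."""
    if not others:
        return 0.0
    return sum(1.0 - hamming(m, o) / len(m) for o in others) / len(others)


def agent_step(carried: Optional[Descriptor], host_store: List[Descriptor],
               neighbour_descriptors: List[Descriptor],
               rng: random.Random) -> Optional[Descriptor]:
    """One visit to a host; returns what the agent carries afterwards (None if unloaded)."""
    if carried is None:
        # Unloaded agent: evaluate P1 for each descriptor stored on this host.
        for i, m in enumerate(host_store):
            others = host_store[:i] + host_store[i + 1:] + neighbour_descriptors
            p1 = (K1 / (K1 + sim(m, others))) ** 2  # assumed P1 form
            if rng.random() < p1:
                return host_store.pop(i)  # move mode: no replica is left behind
        return None
    # Loaded agent: evaluate P2 for the carried descriptor.
    s = sim(carried, host_store + neighbour_descriptors)
    p2 = (s / (K2 + s)) ** 2  # assumed P2 form
    if rng.random() < p2:
        host_store.append(carried)
        return None
    return carried
```

In a full simulation this step would be called after each batch of hops, with `neighbour_descriptors` collected from the hosts within h hops of the current one.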
The effectiveness of the algorithm has been evaluated by defining the
spatial uniformity function, i.e. the average homogeneity of metadata
documents stored in neighbor hosts.
The overall uniformity function Us
for each host, we average the Hamming distance between all the pairs of
descriptors within the visibility region
then we average the result over all the hosts
The objective is to increase the Uniformity function as much as possible
this would mean that similar metadata descriptors have been aggregated
in neighbor hosts, and therefore an effective sorting of metadata descriptors
has been achieved
Uniformity function
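The uniformity formula is not reproduced in this transcript. Since the objective is to increase uniformity while the averaged quantity is a distance, the sketch below assumes that the per-host value is the key length minus the average pairwise Hamming distance in the region; this complement form, and the value used for a region with fewer than two descriptors, are assumptions.

```python
from itertools import combinations
from statistics import mean
from typing import List, Sequence, Tuple

Descriptor = Tuple[int, ...]


def hamming(a: Descriptor, b: Descriptor) -> int:
    return sum(x != y for x, y in zip(a, b))


def host_uniformity(region: Sequence[Descriptor], dim: int) -> float:
    """Assumed form: dim minus the mean Hamming distance over all descriptor pairs in the region."""
    pairs = list(combinations(region, 2))
    if not pairs:
        return float(dim)  # assumption: a region with fewer than two descriptors is fully uniform
    return dim - mean(hamming(a, b) for a, b in pairs)


def overall_uniformity(regions: List[Sequence[Descriptor]], dim: int) -> float:
    """Average of the per-host values over all hosts (one region per host)."""
    return mean(host_uniformity(region, dim) for region in regions)
```

With 3-bit keys, regions that hold only identical descriptors score 3 under this form, while regions that mix (1,0,0) with (0,1,1) score 0.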
An event-based simulator, written in Java, was implemented to evaluate the
performance of the algorithm
The P2P scenario is characterized as follows:
number of bits of metadata descriptors, dim = 3, 4, 5, 6
number of hosts Ns = 500, 1000, 2000, 4000, 8000
results are independent of the network size and of the number of bits, which indicates
that the algorithm is scalable
average connection degree = 4 (use of power law networks)
probability of agent generation Pgen = 0.5
average number of agents Na = Ns * Pgen
average number of resources per peer = 15 (Gamma distribution)
average interval between two agent movements Tmov = 60 s
Simulation scenario
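The parameters listed above could be grouped in a small configuration object. The authors' simulator is written in Java and its structure is not shown, so the class and field names below are hypothetical; only the values come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    num_hosts: int                    # Ns
    dim: int                          # number of bits of metadata descriptors
    avg_degree: int = 4               # average connection degree
    p_gen: float = 0.5                # probability of agent generation (Pgen)
    avg_resources_per_peer: int = 15  # mean of the per-peer resource distribution
    t_mov_seconds: int = 60           # average interval between two agent movements (Tmov)

    @property
    def avg_agents(self) -> float:
        """Average number of agents, Na = number of hosts * Pgen."""
        return self.num_hosts * self.p_gen


# One scenario per combination of network size and key length listed in the slide.
SCENARIOS = [Scenario(n, d) for n in (500, 1000, 2000, 4000, 8000) for d in (3, 4, 5, 6)]
```

For instance, a network of 1000 hosts would run with 500 agents on average.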
2,500 hosts are arranged in a grid topology and each metadata descriptor is associated with an RGB
color through 3-bit descriptors
Each host is visualized by means of the RGB color of the descriptor value that occurs
most often in it
e.g., a red color corresponds to a large fraction of (1,0,0) descriptors in that host
[Figure: snapshots of the grid over time, at T = 0, T = 5,000, T = 10,000, T = 20,000 and T = 40,000]
As the process goes on, metadata descriptors
are reorganized and sorted, as shown
by the emergence of clearly distinguishable
(and gradually changing) color spots
Graphical description of the sorting process
Uniformity function
Uniformity of the whole network when the number of bits of the binary string representing the content ranges from 3 to 6.
The logical reorganization is obtained independently of the number of bits.
Uniformity function
Uniformity vs. time for different numbers of servers.
The size of the network has no detectable effect on the overall uniformity index
Metadata documents handled by a server
Mean number of metadata documents handled by a server when the length of binary strings ranges from 3 to 6 bits.
The number of metadata documents maintained by a server increases from an initial value of about 15 to much higher values.
After a transient phase this value stabilizes, albeit with some fluctuations.
The sorting of metadata descriptors can be exploited by an
informed resource/content discovery protocol
1. users issue queries for resources/contents having specified metadata descriptors
2. a query is forwarded towards the neighbor host whose metadata descriptors are the
closest to the target metadata descriptor
the next host in the path is selected through the similarity function
3. the search stops when no better neighbor can be selected, and a queryHit
message will return to the requesting host
Discovery of metadata descriptors
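The three discovery steps above amount to a greedy walk over the overlay. The sketch below is one way to express it, assuming the same normalized similarity used for illustration elsewhere (its exact formula is not in this transcript) and hypothetical dictionaries for the overlay links and the per-host stores.

```python
from typing import Dict, List, Tuple

Descriptor = Tuple[int, ...]


def hamming(a: Descriptor, b: Descriptor) -> int:
    return sum(x != y for x, y in zip(a, b))


def sim(target: Descriptor, descriptors: List[Descriptor]) -> float:
    """Assumed similarity: mean of (1 - Ham / dim); 0 for a host with no descriptors."""
    if not descriptors:
        return 0.0
    return sum(1.0 - hamming(target, d) / len(target) for d in descriptors) / len(descriptors)


def route_query(start: str, target: Descriptor,
                neighbours: Dict[str, List[str]],
                stores: Dict[str, List[Descriptor]]) -> List[str]:
    """Forward the query to the most similar neighbour until no neighbour is better."""
    path = [start]
    current = start
    while True:
        best = max(neighbours[current], key=lambda h: sim(target, stores[h]), default=None)
        if best is None or sim(target, stores[best]) <= sim(target, stores[current]):
            return path  # the queryHit would travel back along this path
        current = best
        path.append(current)
```

Because each hop must strictly improve the similarity, the walk always terminates; on a well-sorted overlay it ends in the region where descriptors close to the target have been aggregated.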
Range queries are queries in which some bits of the target binary string are wildcard bits, while other bits are specified (overlapping bits)
The logical reorganization of the metadata documents improves the rapidity and effectiveness of discovery operations, and enables the execution of range queries
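Matching a descriptor against a range query can be sketched as a bitwise comparison that skips wildcard positions. Representing a wildcard with `None` and the function names are choices made for this example only.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

Descriptor = Tuple[int, ...]
Pattern = Sequence[Optional[int]]  # None marks a wildcard bit


def matches(descriptor: Descriptor, pattern: Pattern) -> bool:
    """True when every specified (non-wildcard) bit of the pattern equals the descriptor bit."""
    return len(descriptor) == len(pattern) and all(
        p is None or p == bit for bit, p in zip(descriptor, pattern)
    )


def range_query(descriptors: List[Descriptor], pattern: Pattern) -> List[Descriptor]:
    """Return all descriptors that satisfy the pattern."""
    return [d for d in descriptors if matches(d, pattern)]


# All sixteen 4-bit keys, used to show how the result set shrinks as more bits are specified.
ALL_KEYS = list(product((0, 1), repeat=4))
one_bit = range_query(ALL_KEYS, (1, None, None, None))
two_bits = range_query(ALL_KEYS, (1, 0, None, None))
four_bits = range_query(ALL_KEYS, (1, 0, 1, 1))
```

Each additional specified bit halves the set of matching keys in this toy key space.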
Mean number of results
Mean number of results collected by a range query when the length of the binary string representing the content is set to 4 and the number of overlapping bits ranges from 1 to 4.
The number of results decreases with the number of overlapping bits
Range queries provide an efficient way to discover, in just one shot, many more results than a single query.
A nature-inspired algorithm to build a P2P information system for
Content Delivery Networks was presented.
Ant-inspired mobile agents use probability functions to replicate
and move metadata descriptors so as to cluster similar metadata in
neighbor hosts.
The logical reorganization of the metadata documents improves the
rapidity and effectiveness of discovery operations, and enables
the execution of range queries, i.e., requests of content that matches
some specified features.
Performance analysis, carried out through event-based simulation,
confirms the effectiveness of the approach and the increased
efficiency of discovery operations, specifically of range queries.
Conclusions
Thank you for your attention!