Post on 20-Dec-2015
Introduction to Evolutionary Computation
Genetic algorithms are inspired by the biological processes of reproduction and natural selection. Natural selection determines which members of a population survive to reproduce, and reproduction ensures that the species will continue.
Introduction to Evolutionary Computation (Applications)
Aircraft Design, Routing in Communications Networks, Game Playing (Checkers [Fogel]), Robotics, Air Traffic Control, Design, Scheduling, Machine Learning, Pattern Recognition
Introduction to Evolutionary Computation (Applications cont.)
Job Shop Scheduling, VLSI Circuit Layout, Market Forecasting, Design of Filters and Barriers, Data-Mining, User-Mining, Resource Allocation, Path Planning, etc.
Evolutionary Databases Optimization
Evolutionary splitting in R-Trees for Spatial Databases
Evolutionary Dynamic Clustering for Spatial Databases
Evolutionary Optimization in Distributed Databases
Structure of R-Trees
The Minimum Bounding Rectangle (MBR) of each object is stored in the R-Tree
Leaf nodes: (PO, MBR(PO))
Internal nodes: (PC, MBR(PC))
Parameters:
M – the maximum number of entries within a node
m – the minimum number of entries within a node
Evolutionary node splitting in R-Trees
Generalization of the node splitting
Minimization of the area and the overlap of the MBRs (Minimum Bounding Rectangles)
Evolutionary splitting in R-Trees Algorithm (ESRA)
The M+1 objects that have to be distributed among the MBRs are denoted by 1, 2, ..., M, M+1.
A maximum of [M/m] MBRs
A potential solution of the problem (a chromosome) is a string of constant length $\{x_1, x_2, \ldots, x_{M+1}\}$, $x_i \in \{1, 2, \ldots, [M/m]\}$, where the gene $x_i$ indicates to which MBR the object i belongs.
Fitness function:
$f : X \to \mathbb{R},\quad f = T(x) + O(x)$
Generalization:
$Gf = \alpha T(x) + \beta O(x),\quad \alpha, \beta \in [0, 1]$
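The chromosome encoding and the weighted fitness can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that T(x) is the total area of the resulting MBRs and O(x) is the total pairwise overlap between them (suggested by the stated goal of minimizing area and overlap), that rectangles are axis-aligned tuples (xmin, ymin, xmax, ymax), and the function names are hypothetical. The minimum-usage constraint m is not enforced here.

```python
from itertools import combinations

def mbr(rects):
    """Minimum bounding rectangle of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def area(r):
    """Area of a rectangle (xmin, ymin, xmax, ymax)."""
    return (r[2] - r[0]) * (r[3] - r[1])

def overlap(a, b):
    """Area of the intersection of two rectangles (0 if they are disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

def fitness(chromosome, objects, alpha=1.0, beta=1.0):
    """alpha * T(x) + beta * O(x) under the assumptions stated above.

    chromosome[i] is the label of the MBR that receives objects[i].
    """
    groups = {}
    for gene, obj in zip(chromosome, objects):
        groups.setdefault(gene, []).append(obj)
    boxes = [mbr(g) for g in groups.values()]
    total_area = sum(area(b) for b in boxes)
    total_overlap = sum(overlap(a, b) for a, b in combinations(boxes, 2))
    return alpha * total_area + beta * total_overlap

# Four unit squares forming two natural pairs.
objects = [(0, 0, 1, 1), (1, 1, 2, 2), (5, 5, 6, 6), (6, 6, 7, 7)]
good = fitness([1, 1, 2, 2], objects)  # pairs kept together
bad = fitness([1, 2, 1, 2], objects)   # pairs split across MBRs
```

Since the fitness is minimized, the split that keeps nearby objects in the same MBR scores lower than the one that mixes them.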
[Figure: average fitness (vertical axis) versus maximum size of the rectangles (horizontal axis, 50 to 300) for ESRA and the R-tree algorithm]
The fitness function is to be minimized. The fitness value obtained with the R-Tree algorithm is deterministic, while the fitness value for ESRA is the average over 100 executions. The minimum usage of an R-Tree node is m = 16.
The proposed algorithm optimizes the cluster centers and determines their number
The population contains the cluster prototypes
The quality of a chromosome is computed by taking into account the distances between the prototypes and all the other objects that need to be grouped
Individuals that are very close to each other are merged, so the population size decreases
The final population contains only the cluster centers
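The merging step can be illustrated with a short sketch. The source does not say how "very close" is decided or how two individuals are combined, so the distance threshold, the midpoint rule, and the name `merge_close` are all assumptions made for illustration only.

```python
import math

def merge_close(prototypes, threshold):
    """Repeatedly merge any two prototypes closer than `threshold`.

    Assumption: a merged pair is replaced by its midpoint. Each merge
    shrinks the population by one, so the loop always terminates.
    """
    pts = [tuple(p) for p in prototypes]
    merged = True
    while merged:
        merged = False
        for i in range(len(pts)):
            for j in range(i + 1, len(pts)):
                if math.dist(pts[i], pts[j]) < threshold:
                    mid = tuple((a + b) / 2 for a, b in zip(pts[i], pts[j]))
                    pts = [p for k, p in enumerate(pts) if k not in (i, j)]
                    pts.append(mid)
                    merged = True
                    break
            if merged:
                break
    return pts

# Two nearby prototypes collapse into one; the distant one survives.
centers = merge_close([(0, 0), (0.5, 0), (10, 10)], threshold=1.0)
```

With this rule, the number of surviving prototypes is what the algorithm reports as the number of detected clusters.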
Evolutionary Dynamic Clustering for Spatial Databases
P – prototype
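n – total number of objects
x_i – the i-th object
[Equation: fitness f(P) of a prototype; the expression is garbled in the transcript. The recoverable symbols are a sum over i = 1, ..., n of the distances d(x_i, P), a term C, and a constant α.]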
Evolutionary Dynamic Clustering for Spatial Databases
Existing clusters | Quadratic Cost R-Tree Algorithm | R*-Tree Algorithm | ESRA | EDCA
3 | 63156 | 63156 | 22672 (3 detected clusters) | 22672 (3 detected clusters)
3 | 58118 | 58118 | 23919 (3 detected clusters) | 22427 (4 detected clusters)
4 | 89736 | 89736 | 45151 (4 detected clusters) | 45151 (4 detected clusters)
4 | 87943 | 87943 | 31459 (4 detected clusters) | 28166 (5 detected clusters)
5 | 112485 | 112485 | 43111 (5 detected clusters) | 43111 (5 detected clusters)
5 | 61846 | 61846 | 29684 (5 detected clusters) | 29684 (5 detected clusters)
6 | 113598 | 127792 | 59896 (5 detected clusters) | 42425 (6 detected clusters)
6 | 105338 | 105338 | 65716 (5 detected clusters) | 48245 (6 detected clusters)
Quality of the solutions obtained with the Quadratic Cost Algorithm, R*-Tree, ESRA and EDCA for 8 sets of grouped spatial objects
Evolutionary Optimization in Distributed Databases
The design relies on a graph representation, and system management is improved through the use of intelligent agents
Each edge has an initial estimated cost that can be assigned by the system designer; this cost is estimated based on the network transfer rate, data access time and computing power on a site
The up-to-date cost is computed by agents that gather statistics on query frequency
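A potential solution for the problem (a chromosome) is a string of constant length $\{x_1, x_2, \ldots, x_s\}$, $x_i \in \{1, 2, \ldots, n\}$, where the value of the gene $x_i$ indicates to which node of the graph the group of tuples i belongs (n – the number of nodes, s – the number of tuple groups)
Fitness function $F : X \to \mathbb{R}$
[Equation: F(x) is a double sum with upper limits n and s; the summand is garbled in the transcript. The recoverable symbols are nof, c and N with indices i and j.]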
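The allocation encoding can be sketched as below. The decoding follows the chromosome description (gene i holds the node that stores tuple group i). The cost function is only a hypothetical stand-in, since the exact fitness expression is not recoverable here: it assumes each request is a triple (origin node, tuple group, request count) and that serving it costs the request count times the edge cost between the origin and the node holding the group. The names `decode`, `total_cost`, `requests` and `edge_cost` are illustrative.

```python
def decode(chromosome):
    """Map each node to the tuple groups assigned to it.

    chromosome[i - 1] is the node (1..n) that holds tuple group i (1..s).
    """
    allocation = {}
    for group, node in enumerate(chromosome, start=1):
        allocation.setdefault(node, []).append(group)
    return allocation

def total_cost(chromosome, requests, edge_cost):
    """Hypothetical cost model (an assumption, not the source's F).

    Each request (origin, group, count) pays
    count * edge_cost[origin][node holding the group].
    """
    return sum(count * edge_cost[origin][chromosome[group - 1]]
               for origin, group, count in requests)

# Two nodes, three tuple groups: groups 1 and 3 on node 1, group 2 on node 2.
chromosome = [1, 2, 1]
edge_cost = {1: {1: 0, 2: 5}, 2: {1: 5, 2: 0}}
requests = [(1, 2, 10), (2, 2, 3), (2, 1, 4)]
allocation = decode(chromosome)
cost = total_cost(chromosome, requests, edge_cost)
```

Under this model, moving a frequently requested group onto the node that requests it most lowers the total, which is the kind of reallocation the evolutionary search is meant to find.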
Evolutionary Fragmentation and Allocation in DDBS
Experimental results
The graph and the updated costs associated with the edges between its nodes are given
Experimental results - contd
TABLE I. INITIAL DATASET FRAGMENTATION AND DISTRIBUTION
Node | Number of tuples | Tuples
n_1 | 170,000 | t_1 - t_170,000
n_2 | 210,000 | t_170,001 - t_380,000
n_3 | 50,000 | t_380,001 - t_430,000
n_4 | 160,000 | t_430,001 - t_590,000
n_5 | 210,000 | t_590,001 - t_800,000
TABLE II. DATASET RE-FRAGMENTATION AND RE-DISTRIBUTION
Node | Number of tuples | Tuples
n_1 | 180,000 | t_1 - t_100,000, t_260,001 - t_330,000, t_480,001 - t_490,000
n_2 | 110,000 | t_100,001 - t_110,000, t_170,001 - t_230,000, t_330,001 - t_340,000, t_600,001 - t_630,000
n_3 | 270,000 | t_110,001 - t_140,000, t_230,001 - t_250,000, t_380,001 - t_470,000, t_490,001 - t_520,000, t_630,001 - t_730,000
n_4 | 160,000 | t_140,001 - t_170,000, t_470,001 - t_480,000, t_520,001 - t_590,000, t_740,001 - t_790,000
n_5 | 80,000 | t_250,001 - t_260,000, t_340,001 - t_380,000, t_590,001 - t_600,000, t_730,001 - t_740,000, t_790,001 - t_800,000
Experimental results - contd
We compare the total cost of the requests under the initial fragmentation and distribution with the total cost of the requests after the reallocation of tuples in the graph.
For the given example, the total cost of the requests in the graph with the initial distribution is 38.5 mil., while the total cost of the requests after the reallocation made by the proposed algorithm is 22.2 mil.
(reported as a 47.5% improvement; with the rounded totals 38.5 mil. and 22.2 mil., (38.5 - 22.2) / 38.5 ≈ 42.3%)
Conclusions
Evolutionary technique for splitting R-Tree nodes for Spatial Databases
Evolutionary Dynamic Clustering for Spatial Databases
Evolutionary re-fragmentation and re-allocation within a distributed system
Adaptive Goal Guided Recombination (AGGX)
Two new features:
an information-sharing mechanism between the individuals of a population (each individual knows the value of the best individual obtained so far, the so-called global best)
memory for each individual (each individual knows the value of its best related individual, the so-called local best)
Control of the amount of relevant genetic information transferred from the global best and local best to the offspring:
$NoKept = NoEdges \cdot e^{-\frac{NoGen - NoCrt}{NoGen}}$
NoKept - the number of genes kept in the offspring
NoEdges - the total number of edges involved in the common sequences of the global and the local best
NoCrt - the number of the current generation
NoGen - the total number of generations of the algorithm
A randomly chosen sequence of one parent is always kept in the offspring (in order to increase the population diversity)
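The schedule for the number of kept genes can be evaluated numerically. The sketch below assumes the formula reads NoKept = NoEdges * e^(-(NoGen - NoCrt) / NoGen). It returns a real value, and whether and how the result is rounded to a whole number of genes is not stated in the source. The function name `no_kept` is hypothetical.

```python
import math

def no_kept(no_edges, no_crt, no_gen):
    """NoKept = NoEdges * exp(-(NoGen - NoCrt) / NoGen), as assumed above."""
    return no_edges * math.exp(-(no_gen - no_crt) / no_gen)

# With 10 shared edges and 100 generations in total:
start = no_kept(10, 0, 100)    # first generation: 10 / e
half = no_kept(10, 50, 100)    # midway: 10 / sqrt(e)
end = no_kept(10, 100, 100)    # last generation: all 10 edges
```

Under this reading, the share of genes inherited from the global and local best grows from about 37% of the common edges at the start to all of them in the final generation, so the search shifts from exploration toward exploitation.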
Adaptive Goal Guided Recombination (AGGX)
Evolutionary Approach of TSP
k cities
A potential solution for the problem is a string of length k that contains a permutation of the set {1, ..., k}
Fitness function f to be minimized:
$f : S \to \mathbb{R},\quad f(\pi) = \sum_{i=1}^{k} d(c_{\pi(i)}, c_{\pi(i+1)}),\quad (k + 1 \equiv 1)$
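The tour-length fitness can be sketched in a few lines. The sketch assumes the fitness is the length of the closed tour (the last city connects back to the first) and that d is the Euclidean distance between city coordinates. The source only names a distance d, so the metric, the coordinate dictionary and the name `tour_length` are illustrative choices.

```python
import math

def tour_length(perm, coords):
    """Length of the closed tour that visits the cities in the order `perm`.

    perm   : a permutation of the city labels 1..k
    coords : dict mapping each city label to an (x, y) position
    """
    k = len(perm)
    # (i + 1) % k wraps the last position back to the first city.
    return sum(math.dist(coords[perm[i]], coords[perm[(i + 1) % k]])
               for i in range(k))

# Four cities on the corners of a unit square.
coords = {1: (0, 0), 2: (1, 0), 3: (1, 1), 4: (0, 1)}
around = tour_length([1, 2, 3, 4], coords)   # follows the perimeter
crossed = tour_length([1, 3, 2, 4], coords)  # uses both diagonals
```

The perimeter tour is shorter than the one that crosses the diagonals, so a minimizing genetic algorithm would prefer it.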
Experimental Results
Standard Genetic Algorithm (SGA) for:
TSP instance with 130 cities
TSP instance with 76 cities
Conclusions and future work
AGGX is based on two new features:
the social behavior of individuals within a population
the memory of each individual
and on control of the amount of relevant genetic information transferred from the global best and local best to the offspring
A randomly chosen sequence of one parent is always kept in the offspring
A way to introduce diversity in the population will be pursued