genetic algorithms

Genetic Algorithms (76 slides)

Authors: Aleksandra Popovic, [email protected]; Aleksandra Jankovic, [email protected]; Prof. Dr. Dusan Tosic, [email protected]; Prof. Dr. Veljko Milutinovic, [email protected]

Upload: tamira

Post on 20-Jan-2016


Page 1: Genetic Algorithms

Genetic Algorithms

Authors:

Aleksandra Popovic, [email protected]

Aleksandra Jankovic, [email protected]

Prof. Dr. Dusan Tosic, [email protected]

Prof. Dr. Veljko Milutinovic, [email protected]

Page 2: Genetic Algorithms

Summary Slide

What Will You Learn From This Tutorial?

Page 3: Genetic Algorithms

What Will You Learn From This Tutorial?

Part I
What is a genetic algorithm?
Principles of genetic algorithms.
How to design an algorithm?
Comparison of GAs and conventional algorithms.

Part II
Mathematics behind GAs

Part III
Applications of GA
– GA and the Internet
– Genetic search based on multiple mutation approaches

Page 4: Genetic Algorithms

Part I: GA Theory

What are genetic algorithms?

How to design a genetic algorithm?

Page 5: Genetic Algorithms

Genetic Algorithm Is Not...

...Gene coding

Page 6: Genetic Algorithms

Genetic Algorithm Is...

… a computer algorithm

that rests on the principles of genetics and evolution

Page 7: Genetic Algorithms

Instead of Introduction...

Hill climbing

[Figure: hill-climbing landscape with the labels "local" and "global"]

Page 8: Genetic Algorithms

Instead of Introduction… (2)

Multi-climbers

Page 9: Genetic Algorithms

Instead of Introduction… (3)

Genetic algorithm

[Figure: climbers exchanging information. Speech bubbles: "I am not at the top. My height is better!", "I am at the top", "Height is ...", "I will continue"]

Page 10: Genetic Algorithms

Instead of Introduction… (3)

Genetic algorithm - a few microseconds later

Page 11: Genetic Algorithms

GA Concept

A genetic algorithm (GA) introduces the principles of evolution and genetics into the search among possible solutions to a given problem.

The idea is to simulate the process found in natural systems. This is done by creating, within a machine, a population of individuals represented by chromosomes, in essence a set of character strings that are analogous to the DNA in our own chromosomes.

Page 12: Genetic Algorithms

Survival of the Fittest

The main principle of evolution used in GA is “survival of the fittest”.

The good solutions survive, while the bad ones die.

Page 13: Genetic Algorithms

Nature and GA...

Nature        Genetic algorithm
Chromosome    String
Gene          Character
Locus         String position
Genotype      Population
Phenotype     Decoded structure

Page 14: Genetic Algorithms

The History of GA

Cellular automata – John Holland, University of Michigan, 1975.

Until the early 80s, the concept was studied theoretically.
In the 80s, the first “real world” GAs were designed.

Page 15: Genetic Algorithms

Algorithmic Phases

Initialize the population
Select individuals for the mating pool
Perform crossover
Perform mutation
Insert offspring into the population
Stop?
– no: repeat from the selection step
– yes: The End

Page 16: Genetic Algorithms

Designing GA...

How to represent genomes?
How to define the crossover operator?
How to define the mutation operator?
How to define the fitness function?
How to generate the next generation?
How to define the stopping criteria?

Page 17: Genetic Algorithms

Representing Genomes...

Representation                Example
string                        1 0 1 1 1 0 0 1
array of strings              http avala yubc net ~apopovic
tree - genetic programming    [Figure: expression tree with the operator nodes >, xor, or and the leaves a, b, b, c]

Page 18: Genetic Algorithms

Crossover

Crossover is a concept from genetics.
Crossover is sexual reproduction.
Crossover combines genetic material from two parents in order to produce superior offspring.
A few types of crossover:
– One-point
– Multiple-point

Page 19: Genetic Algorithms

One-point Crossover

[Figure: Parent #1 and Parent #2 drawn as chromosomes built from the gene values 0 to 7]

Page 20: Genetic Algorithms

One-point Crossover

[Figure: Parent #1 and Parent #2 drawn as chromosomes built from the gene values 0 to 7]
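A one-point crossover can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the parent gene orders below are assumed for the example (the figure's exact layout is not recoverable), and the function name `one_point_crossover` is hypothetical.

```python
import random

def one_point_crossover(parent1, parent2, point=None, rng=random):
    """Swap the tails of two equal-length chromosomes at a single cut point."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    if len(parent1) < 2:
        raise ValueError("need at least two genes to cut between")
    if point is None:
        # Cut strictly inside the string so both children mix material.
        point = rng.randint(1, len(parent1) - 1)
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2

# Assumed parents for illustration only.
p1 = [0, 1, 2, 3, 4, 5, 6, 7]
p2 = [7, 6, 5, 4, 3, 2, 1, 0]
c1, c2 = one_point_crossover(p1, p2, point=3)
print(c1)  # [0, 1, 2, 4, 3, 2, 1, 0]
print(c2)  # [7, 6, 5, 3, 4, 5, 6, 7]
```

A multiple-point variant would repeat the tail swap at several cut points; the single cut keeps the sketch closest to the slide title.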

Page 21: Genetic Algorithms

Mutation

Mutation introduces randomness into the population.
Mutation is asexual reproduction.
The idea of mutation is to reintroduce divergence into a converging population.
Mutation is performed on a small part of the population, in order to avoid entering an unstable state.

Page 22: Genetic Algorithms

Mutation...

Parent    1 1 0 1 0 1 0 0
Child     0 1 0 1 0 1 0 1

The first bit mutates from 1 to 0 and the last bit from 0 to 1.
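A bit-flip mutation for binary strings might look as follows. It is a sketch under assumptions: the per-bit rate of 0.02 is only a placeholder in the 1-2% range, and both function names are hypothetical. The deterministic helper reproduces a parent 1 1 0 1 0 1 0 0 turning into the child 0 1 0 1 0 1 0 1.

```python
import random

def mutate(bits, rate=0.02, rng=random):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

def flip_positions(bits, positions):
    """Deterministic variant: flip exactly the listed (0-based) positions."""
    return [1 - b if i in positions else b for i, b in enumerate(bits)]

parent = [1, 1, 0, 1, 0, 1, 0, 0]
child = flip_positions(parent, {0, 7})   # first and last bit mutate
print(child)  # [0, 1, 0, 1, 0, 1, 0, 1]
```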

Page 23: Genetic Algorithms

About Probabilities...

The average probability for an individual to undergo crossover is, in most cases, about 80%.

The average probability for an individual to mutate is about 1-2%.

The probabilities of the genetic operators follow the probabilities in natural systems.

The better solutions reproduce more often.

Page 24: Genetic Algorithms

Fitness Function

The fitness function is an evaluation function that determines which solutions are better than others.

Fitness is computed for each individual.
The fitness function is application dependent.

Page 25: Genetic Algorithms

Selection

The selection operation copies a single individual, probabilistically selected based on fitness, into the next generation of the population.

There are a few possible ways to implement selection:

– “Only the strongest survive”
• Choose the individuals with the highest fitness for the next generation

– “Some weak solutions survive”
• Assign a probability that a particular individual will be selected for the next generation
• More diversity
• Some bad solutions might have good parts!

Page 26: Genetic Algorithms

Selection - Survival of The Strongest

Previous generation:    0.93  0.51  0.72  0.31  0.12  0.64

Next generation:        0.93  0.72  0.64
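The "only the strongest survive" rule amounts to sorting by fitness and keeping the top of the list. A minimal sketch, assuming higher fitness is better and using the hypothetical name `select_strongest`:

```python
def select_strongest(fitnesses, keep):
    """Return the `keep` highest fitness values (higher is better here)."""
    return sorted(fitnesses, reverse=True)[:keep]

previous = [0.93, 0.51, 0.72, 0.31, 0.12, 0.64]
next_generation = select_strongest(previous, keep=3)
print(next_generation)  # [0.93, 0.72, 0.64]
```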

Page 27: Genetic Algorithms

Selection - Some Weak Solutions Survive

Previous generation:    0.93  0.51  0.72  0.31  0.12  0.64

Next generation:        0.93  0.72  0.64  0.12
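One common way to let some weak solutions survive is fitness-proportionate ("roulette wheel") selection. The slides do not name the exact scheme, so this is an assumed choice, and `roulette_select` is a hypothetical name. Individuals are drawn with replacement, so a strong individual may be picked more than once.

```python
import random

def roulette_select(fitnesses, k, rng=random):
    """Draw k individuals with probability proportional to their fitness."""
    return rng.choices(fitnesses, weights=fitnesses, k=k)

previous = [0.93, 0.51, 0.72, 0.31, 0.12, 0.64]
next_generation = roulette_select(previous, k=4, rng=random.Random(42))
print(next_generation)
```

With these values the weakest individual (0.12) is chosen on roughly 4% of the draws, so it occasionally reaches the next generation while the strong ones still dominate.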

Page 28: Genetic Algorithms

Mutation and Selection...

[Figure: three plots of the solution distribution D over the phenotype, showing the effects of selection and mutation]

Page 29: Genetic Algorithms

Stopping Criteria

The final problem is to decide when to stop the execution of the algorithm.

There are two possible solutions to this problem:

– First approach:
• Stop after producing a fixed number of generations

– Second approach:
• Stop when the improvement in average fitness over two generations is below a threshold
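Both stopping rules can be combined in one check. The sketch below assumes the caller records the average fitness of every generation in a list; the function name and parameters are hypothetical.

```python
def should_stop(avg_fitness_history, max_generations, threshold):
    """Return True when either stopping rule fires."""
    generations = len(avg_fitness_history)
    if generations >= max_generations:
        return True  # first approach: fixed generation budget
    if generations >= 2:
        improvement = abs(avg_fitness_history[-1] - avg_fitness_history[-2])
        return improvement < threshold  # second approach: stagnation
    return False

print(should_stop([0.10, 0.50], max_generations=100, threshold=0.01))    # False
print(should_stop([0.500, 0.505], max_generations=100, threshold=0.01))  # True
```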

Page 30: Genetic Algorithms

GA Vs. Ad-hoc Algorithms

                 Genetic Algorithm    Ad-hoc Algorithms
Speed            Slow *               Generally fast
Human work       Minimal              Long and exhaustive
Applicability    General              There are problems that cannot be solved analytically
Performance      Excellent            Depends

* Not necessarily!

Page 31: Genetic Algorithms

Problems With GAs

Sometimes a GA is extremely slow, much slower than the usual algorithms

Page 32: Genetic Algorithms

Advantages of GAs

The concept is easy to understand.
Minimum human involvement.
The computer is not taught how to use an existing solution, but how to find a new one!
Modular, separate from the application
Supports multi-objective optimization
Always an answer; the answer gets better with time!
Inherently parallel; easily distributed
Many ways to speed up and improve a GA-based application as knowledge about the problem domain is gained
Easy to exploit previous or alternate solutions

Page 33: Genetic Algorithms

GA: An Example - Diophantine Equations

Diophantine equation (n=4):

a*x + b*y + c*z + d*q = s

For given a, b, c, d, and s, find x, y, z, q

Genome:

(x, y, z, q) = [ x | y | z | q ]

Page 34: Genetic Algorithms

GA: An Example - Diophantine Equations (2)

Crossover

( 1, 2, 3, 4 ) and ( 5, 6, 7, 8 )  →  ( 1, 6, 3, 4 ) and ( 5, 2, 7, 8 )

Mutation

( 1, 2, 3, 4 )  →  ( 1, 2, 3, 9 )

Page 35: Genetic Algorithms

GA: An Example - Diophantine Equations (3)

The first generation is randomly generated from numbers lower than the sum (s).

Fitness is defined as the absolute value of the difference between the total and the given sum:

Fitness = abs ( total - sum ),

so a lower value is better and a fitness of 0 marks an exact solution.

The algorithm enters a loop in which the operators are performed on genomes: crossover, mutation, selection.

After a number of generations, a solution is reached.
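The whole loop for the Diophantine example can be sketched as below. It is an illustrative reading of the slides, not the authors' program: population size, generation budget, the mutation rate of 0.1 (higher than the 1-2% quoted earlier, to keep a small population diverse), and the function name `solve_diophantine` are all assumptions. Selection keeps the better half, crossover is one-point, and mutation replaces one gene with a new random number below the sum.

```python
import random

def solve_diophantine(coeffs, target, pop_size=60, generations=2000,
                      p_cross=0.8, p_mut=0.1, seed=0):
    """Search for positive integers g with sum(c * g) == target; may return None."""
    rng = random.Random(seed)
    n = len(coeffs)

    def fitness(genome):
        total = sum(c * g for c, g in zip(coeffs, genome))
        return abs(total - target)  # 0 means an exact solution

    # First generation: random numbers lower than the sum.
    population = [[rng.randint(1, target - 1) for _ in range(n)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        if fitness(population[0]) == 0:
            return population[0]
        survivors = population[:pop_size // 2]  # only the strongest survive
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            child = list(mother)
            if rng.random() < p_cross:  # one-point crossover
                cut = rng.randint(1, n - 1)
                child = mother[:cut] + father[cut:]
            if rng.random() < p_mut:  # mutate a single gene
                child[rng.randrange(n)] = rng.randint(1, target - 1)
            children.append(child)
        population = survivors + children
    best = min(population, key=fitness)
    return best if fitness(best) == 0 else None

solution = solve_diophantine([1, 2, 3, 4], 30)
print(solution)  # a genome (x, y, z, q) with x + 2y + 3z + 4q = 30, or None
```

Because a GA is a heuristic, the function returns `None` when no exact solution appears within the generation budget; callers should handle that case.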

Page 36: Genetic Algorithms

Part II: Mathematics Behind GAs

Two methods for analyzing genetic algorithms:

Schema analysis

Mathematical modeling

Page 37: Genetic Algorithms

Schema Analysis

Weaknesses:
In determining some characteristics of the population, schema analysis makes some approximations that weaken it

Advantages:
A simple way to view the standard GA
It has made possible the proofs of some interesting theorems
It provides a nice introduction to algorithmic analysis

Page 38: Genetic Algorithms

Schema Analysis…

Schema: a template made up of a string of 1s, 0s, and *s, where * is used as a wild card that can be either 1 or 0

For example, H = 1 * * 0 * 0 is a schema.
It has eight instances (one of which is 101010)
Order, $o(H)$: the number of non-*, or defined, bits (3 in the example)
Defining length, $d(H)$: the greatest distance between two defined bits (in the example the defined bits sit at positions 1, 4, and 6, so H has a defining length of 5)

Let S be the set of all strings of length $l$.
There are $3^l$ possible schemas on S, but $2^{2^l}$ different subsets of S
Schemas cannot be used to represent every possible population within S, but they form a representative subset of the set of all subsets of S
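The schema quantities are easy to compute directly. The helpers below are a sketch with hypothetical names; the defining length uses the usual definition (distance between the outermost defined positions), which gives 5 for H = 1 * * 0 * 0.

```python
from itertools import product

def order(schema):
    """Number of defined (non-*) positions."""
    return sum(ch != "*" for ch in schema)

def defining_length(schema):
    """Distance between the first and the last defined position."""
    defined = [i for i, ch in enumerate(schema) if ch != "*"]
    return defined[-1] - defined[0] if defined else 0

def instances(schema):
    """All bit strings that match the schema."""
    result = []
    for fill in product("01", repeat=schema.count("*")):
        bits = iter(fill)
        result.append("".join(next(bits) if ch == "*" else ch for ch in schema))
    return result

H = "1**0*0"
print(order(H), defining_length(H), len(instances(H)))  # 3 5 8
print("101010" in instances(H))                         # True

l = len(H)
print(3 ** l, 2 ** (2 ** l))  # number of schemas vs. number of subsets of S
```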

Page 39: Genetic Algorithms

Schema Analysis…

The end-of-iterations conditions:

expected number of instances of schema H as we iterate the GA
$m(H, t)$: the number of instances of H at time t
$f(x)$: fitness of chromosome x
$\bar{f}(t)$: average fitness at time t:

$$\bar{f}(t) = \frac{\sum_{x \in S} f(x)}{n}, \qquad n = |S|$$

$\hat{u}(H, t)$: average fitness of instances of H at time t:

$$\hat{u}(H, t) = \frac{\sum_{x \in H} f(x)}{m(H, t)}$$

Page 40: Genetic Algorithms

Schema Analysis…

- If we completely ignore the effects of crossover and mutation, we get the expected value:

$$E(m(H, t+1)) = n \, \frac{\sum_{x \in H} f(x)}{\sum_{x \in S} f(x)} = \frac{\sum_{x \in H} f(x)}{\bar{f}(t)} = \frac{\hat{u}(H, t)}{\bar{f}(t)} \, m(H, t)$$

- Now we consider only the effects of crossover and mutation, which lower the number of instances of H in the population. Then we will get a good lower bound on $E(m(H, t+1))$

$S_c(H)$: probability that H survives crossover; H can be destroyed only when a random crossover point falls between the defining bits of H
$p_c$: probability of crossover occurring

$$S_c(H) \ge 1 - p_c \, \frac{d(H)}{l - 1}$$

Page 41: Genetic Algorithms

Schema Analysis…

$S_m(H)$: probability of an instance of H remaining the same after mutation; it depends on the order of H
$p_m$: probability of mutation

$$S_m(H) = (1 - p_m)^{o(H)}$$

- With the above notation, we have:

$$E(m(H, t+1)) \ge \frac{\hat{u}(H, t)}{\bar{f}(t)} \, m(H, t) \left(1 - p_c \, \frac{d(H)}{l - 1}\right) (1 - p_m)^{o(H)}$$

Schema Theorem, provided by John Holland
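To get a feel for the bound, it can be evaluated for concrete numbers. The values below are made up for illustration (a short, low-order schema whose instances are 20% fitter than average), and `schema_lower_bound` is a hypothetical helper name.

```python
def schema_lower_bound(m, u_hat, f_bar, d, o, l, p_c, p_m):
    """Lower bound on E(m(H, t+1)) given by the Schema Theorem."""
    survive_crossover = 1 - p_c * d / (l - 1)
    survive_mutation = (1 - p_m) ** o
    return (u_hat / f_bar) * m * survive_crossover * survive_mutation

# Assumed example: 10 instances, d(H) = 1, o(H) = 2, string length 6.
bound = schema_lower_bound(m=10, u_hat=1.2, f_bar=1.0,
                           d=1, o=2, l=6, p_c=0.8, p_m=0.01)
print(round(bound, 4))  # 9.8794
```

Without the disruptive operators the schema would be expected to grow from 10 to 12 instances; crossover and mutation pull the guaranteed figure down to about 9.88, which shows why only short, low-order, above-average schemas are assured of growth.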

Page 42: Genetic Algorithms

Schema Analysis…

The Schema Theorem only shows how schemas dynamically change, and how short, low-order schemas whose fitness remains above the average receive an exponentially growing number of samples.

It cannot make more direct predictions about the population composition, the distribution of fitness, and other statistics more directly related to the GA itself.

Page 43: Genetic Algorithms

Mathematical Model

One change is that one of the new individuals will be immediately deleted (thus the loop is iterated n times, not n/2)

S: set of all strings of length $l$
N: size of S, or $2^l$
$p(t)$: column vector with N rows such that the i-th component $p_i(t)$ is equal to the proportion of the population P at time t that has chromosome i
$s(t)$: column vector with N rows such that the i-th component $s_i(t)$ is equal to the probability that chromosome i will be selected as a parent

Page 44: Genetic Algorithms

Mathematical Model…

Example: $l = 2$, 3 individuals in the population, two with chromosome 10 and one with chromosome 11, then

$$p(t) = \left(0, 0, \tfrac{2}{3}, \tfrac{1}{3}\right)^T$$

If the fitness is equal to the number of 1s in the string, then

$$s(t) = \left(0, 0, \tfrac{1}{2}, \tfrac{1}{2}\right)^T$$
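The example vectors can be checked with exact fractions. The sketch below indexes chromosomes by their binary value (00, 01, 10, 11) and uses the number of 1s as fitness; both function names are hypothetical. The selection vector is the fitness-weighted proportion vector, normalized so that its components sum to 1.

```python
from collections import Counter
from fractions import Fraction

def proportion_vector(population, l):
    """p(t): share of the population holding each chromosome 0 .. 2^l - 1."""
    counts = Counter(population)
    return [Fraction(counts[format(i, f"0{l}b")], len(population))
            for i in range(2 ** l)]

def selection_vector(p, l):
    """s(t): fitness-proportionate selection probabilities (fitness = number of 1s)."""
    fitness = [format(i, f"0{l}b").count("1") for i in range(2 ** l)]
    weighted = [f * pi for f, pi in zip(fitness, p)]
    total = sum(weighted)  # assumes at least one individual has nonzero fitness
    return [w / total for w in weighted]

p = proportion_vector(["10", "10", "11"], l=2)
s = selection_vector(p, l=2)
print([str(x) for x in p])  # ['0', '0', '2/3', '1/3']
print([str(x) for x in s])  # ['0', '0', '1/2', '1/2']
```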

Page 45: Genetic Algorithms

Mathematical Model…

$F = (F_{i,j})$: $N \times N$ diagonal matrix with $F_{i,i} = f(i)$

Relation between $p$ and $s$:

$$s(t) = \frac{F \, p(t)}{|F \, p(t)|}$$

where $|F \, p(t)|$ is the sum of the components of $F \, p(t)$

Goal: given a column vector $x$, to construct a column-vector-valued function $M(x)$ such that

$$M(s(t)) = p(t+1)$$

$M$ represents recombination: the composition of crossover and mutation
$i \oplus j$: component-wise sum of i and j mod 2
$i \otimes j$: component-wise product of i and j
$M = (M_{i,j})$: matrix whose i,j-th entry $M_{i,j}$ is the probability that 0 results from the recombination of i and j

Page 46: Genetic Algorithms

Mathematical Model…

$\sigma_j$: permutation operator on $R^N$

$$\sigma_j \left( (y_0, \ldots, y_{N-1})^T \right) = (y_{j \oplus 0}, \ldots, y_{j \oplus (N-1)})^T$$

Finally, $M(s)$ is given by the following expression

$$M(s) = \left( (\sigma_0 s(t))^T M \, \sigma_0 s(t), \; \ldots, \; (\sigma_{2^l - 1} s(t))^T M \, \sigma_{2^l - 1} s(t) \right)^T$$

With this expression, we can calculate explicitly the expected value of each generation from the preceding generation.

I hope you now fully understand the mathematics behind GA.

Page 47: Genetic Algorithms

Part III: Applications of GaAsPart III: Applications of GaAs

GA and the InternetGA and the Internet

Genetic search based on multiple mutation approachesGenetic search based on multiple mutation approaches

Page 48: Genetic Algorithms

Some Applications of GAs

[Figure: GA at the center, surrounded by its application areas]

Internet search
Data mining
Software guided circuit design
Control systems design
Stock price prediction
Path finding
Mobile robots
Search
Optimization
Trend spotting

Page 49: Genetic Algorithms

Genetic Algorithm and the Internet

The system designed by EBI Group, Faculty for Electrical Engineering, University of Belgrade

Page 50: Genetic Algorithms

Algorithm Phases

Process the set of URLs given by the user
Select all links from the input set
Evaluate the fitness function for all genomes
Perform crossover, mutation, and reproduction
Satisfactory solution obtained?
The End

Page 51: Genetic Algorithms

Introduction

GA can be used for intelligent Internet search.
GA is used in cases when the search space is relatively large.
GA is an adaptive search.
GA is a heuristic search method.

Page 52: Genetic Algorithms

System for GA Internet Search

Designed at the Faculty for Electrical Engineering, University of Belgrade

[Figure: system architecture. A CONTROL PROGRAM coordinates the modules Generator, Agent, Spider, Topic, Space, and Time; the data stores are Input set, Current set, Output set, Top data, and Net data]

Page 53: Genetic Algorithms

Spider

Spider is a software package that picks up Internet documents from user-supplied input, with the depth specified by the user.

Spider takes one URL and fetches all links and the documents they contain, up to a predefined depth.

The fetched documents are stored on the local hard disk with the same structure as at the original location.

Spider’s task is to produce the first generation.
Spider is used during crossover and mutation.

Page 54: Genetic Algorithms

Agent

Agent takes a set of URLs as input and calls Spider for every one of them, with depth 1.

Then, Agent extracts keywords from each document and stores them on the local hard disk.

Page 55: Genetic Algorithms

Generator

Generator generates a set of URLs from given keywords, using some conventional search engine.

It takes the desired topic as input, calls the Yahoo search engine, and submits a query looking for all documents covering the specific topic.

Generator stores the URL and topic of a given web page in a database called topdata.

Page 56: Genetic Algorithms

Topic

It uses the topdata DB in order to insert random URLs from the database into the current set.

Topic performs mutation.

Page 57: Genetic Algorithms

Space

Space takes as input the current set from the Agent application and injects into it those URLs from the netdata database that appeared with the greatest frequency in the output sets of previous searches.

Page 58: Genetic Algorithms

Time

Time takes a set of URLs from Agent and inserts the ones with the greatest frequency into the netdata DB.

The netdata DB consists of three fields: URL, topic, and count number.

The DB is updated in each iteration of the algorithm.

Page 59: Genetic Algorithms

How Does The System Work?

[Figure: system architecture with command flow and data flow marked. A CONTROL PROGRAM coordinates the modules Generator, Agent, Spider, Topic, Space, and Time; the data stores are Input set, Current set, Output set, Top data, and Net data]

Page 60: Genetic Algorithms

GA and the Internet: Conclusion

GA for Internet search, contrary to other GAs, is much faster and more efficient than conventional solutions, such as standard Internet search engines.

[Figure: the Internet]

Page 61: Genetic Algorithms

Genetic Search Based on Multiple Mutation Approaches

Concept and its improvements adapted to specific applications in e-business, and a concrete software package

Main problems in finding information on the Internet:
How to quickly find and efficiently retrieve potentially useful information, given the fast growth in the quantity and variety of Internet sites
Searches with indexing engines return a huge number of documents, many of which are completely unrelated to what the user originally attempted to find
Documents placed at the top of the result list are often less acceptable than the lower ones
The indexing process may take days, weeks, or even longer, because of the volume of new information being created daily

Page 62: Genetic Algorithms

Links Based Approach

The question is:
How to locate and retrieve the needed information before it gets indexed?

The efficient way to locate the new, not-yet-indexed information:
Using links-based approaches
– genetic search
– simulated annealing

Best result:
indexing-based approaches + links-based approaches

Page 63: Genetic Algorithms

6363 / 76 / 76

Genetic Search Algorithm

GENETIC ALGORITHM OF ZERO ORDER, with no mutation

Start: a model Web presentation that contains all the needed types of information (fitness function is evaluated).

It is assumed that the model includes URL pointers to other similar Web presentations, and these are downloaded.

The Web presentations that “survived” the fitness function are assumed to include additional URL pointers, and their related Web presentations are downloaded next.

After the end-of-search condition is met, the Web presentations are ranked according to their fitness value.
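The zero-order search can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the in-memory `toy_web` dictionary stands in for real downloads, and the names `keyword_fitness`, `zero_order_search`, `threshold`, and `max_pages` are hypothetical (the slides specify neither the fitness function nor the end-of-search condition).

```python
def keyword_fitness(page_keywords, model_keywords):
    """Hypothetical fitness: share of the model's keywords found on the page."""
    if not model_keywords:
        return 0.0
    return len(page_keywords & model_keywords) / len(model_keywords)


def zero_order_search(web, start_url, model_keywords, threshold=0.5, max_pages=100):
    """Zero-order genetic search (no mutation) over a toy in-memory web.

    web maps url -> (set of keywords, list of outgoing links).
    Only pages whose fitness reaches `threshold` survive and have
    their links followed in the next generation.
    """
    visited = set()
    survivors = {}                      # url -> fitness of surviving pages
    generation = [start_url]
    while generation and len(visited) < max_pages:
        next_generation = []
        for url in generation:
            if url in visited or url not in web:
                continue
            if len(visited) >= max_pages:   # assumed end-of-search condition
                break
            visited.add(url)
            keywords, links = web[url]
            score = keyword_fitness(keywords, model_keywords)
            if score >= threshold:          # the page "survived"
                survivors[url] = score
                next_generation.extend(links)
        generation = next_generation
    # Rank the surviving presentations by fitness, best first.
    return sorted(survivors.items(), key=lambda item: item[1], reverse=True)


model = {"genetic", "algorithm", "search", "internet"}
toy_web = {
    "a": ({"genetic", "algorithm", "search", "internet"}, ["b", "c"]),
    "b": ({"genetic", "algorithm", "search"}, ["d"]),
    "c": ({"cooking"}, ["e"]),            # fails the fitness test
    "d": ({"genetic", "search"}, []),
    "e": ({"genetic", "algorithm"}, []),  # reachable only through "c"
}
ranked = zero_order_search(toy_web, "a", model)
print(ranked)
```

In this toy run page `e` is never downloaded, because its only referrer `c` did not survive. That is the weakness that mutation is meant to address.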


Genetic Search Algorithm…

Types of mutation:

Topic-oriented database mutation

Semantic mutations

- based on the principles of spatial locality

- based on the principles of temporal locality

Logical reasoning and semantic considerations are involved in picking out URLs for mutation.
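The slides name topic-oriented database mutation without defining it. One possible reading is sketched below, with everything in it an assumption: a database maps a topic to known URLs, and with probability `rate` each URL of the current generation is swapped for a URL drawn from the database entry for the search topic. The names `topic_mutation`, `topic_db`, and `rate` are hypothetical.

```python
import random


def topic_mutation(generation, topic_db, topic, rate, rng):
    """Return a copy of `generation` with some URLs replaced from topic_db[topic].

    rate is the per-URL mutation probability; rng is a random.Random instance,
    passed in so that runs can be reproduced.
    """
    candidates = topic_db.get(topic, [])
    mutated = []
    for url in generation:
        if candidates and rng.random() < rate:
            mutated.append(rng.choice(candidates))   # mutated individual
        else:
            mutated.append(url)                      # kept unchanged
    return mutated


# Hypothetical topic database and current generation of URLs.
topic_db = {"genetic algorithms": ["http://db.example/ga1", "http://db.example/ga2"]}
generation = ["http://site.example/p1", "http://site.example/p2"]
print(topic_mutation(generation, topic_db, "genetic algorithms", 0.3, random.Random(0)))
```

Passing the random generator explicitly keeps the type and rate of mutation easy to vary between runs.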


Innovations Required by Domain Area

APPLICATION LEVEL

LEVEL OF THE GENERAL PROJECT APPROACH AND PRODUCT ARCHITECTURE

ALGORITHMIC LEVEL

IMPLEMENTATION LEVEL


Application Level

Statistical analysis and data mining have to be performed in order to figure out the common and typical patterns of behavior and need

The state of the art of mutual referencing has to be determined

The trends and asymptotic situations foreseen for the time of project finalization have to be determined


Level of the General Project Approach and Product Architecture

Decisions have to be made about the most important goals to be achieved:

Maximizing the speed of search

Maximizing the sophistication of search

Maximizing specific effects of interest for a given institution or a customer

Maximizing a combination of the above

Decisions on this level affect the applicability of the final product / tool.


Algorithmic Level

Develop an efficient mutation algorithm of interest for the application

- in the direction of database architecture and design

- by introducing elements of semantics-based mutation

Semantics-based mutations are especially of interest for chaotic markets, typical of new markets in developed countries or traditional markets in under-developed countries.


Semantics-based Mutation

Mutation based on spatial localities

After a “fruitful” Web presentation is reached (using a traditional algorithm with mutation), the site of the same Internet service provider is searched for other presentations on the same or similar topic

Explanation:

In chaotic markets, it is very unlikely that service/product offers from the same small geographic area reference each other on their Web presentations

After a successful “side trip” based on spatial mutation, one continues with the traditional database mutation.
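A minimal sketch of the spatial "side trip" follows. It assumes that "the site of the same Internet service provider" can be approximated by an identical host name, that a list of candidate URLs is already available (for example from a directory listing), and that topic similarity is decided by a caller-supplied predicate. The names `spatial_mutation`, `candidate_urls`, and `is_on_topic` are hypothetical.

```python
from urllib.parse import urlparse


def spatial_mutation(fruitful_url, candidate_urls, is_on_topic=lambda url: True):
    """Return candidates hosted on the same server as a fruitful presentation."""
    host = urlparse(fruitful_url).netloc.lower()
    return [
        url
        for url in candidate_urls
        if url != fruitful_url
        and urlparse(url).netloc.lower() == host   # same provider (assumed: same host)
        and is_on_topic(url)                       # same or similar topic
    ]


candidates = [
    "http://isp.example/~ana/flowers.html",
    "http://isp.example/~marko/florist.html",
    "http://other.example/shop.html",
]
side_trip = spatial_mutation("http://isp.example/~ana/flowers.html", candidates)
print(side_trip)
```

The URLs returned here would be evaluated with the fitness function like any other page before the search returns to database mutation.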


Semantics-based Mutation…

Mutation based on temporal localities

One comes back periodically to a Web presentation which was “fruitful” in the past

One comes back periodically to other Web presentations developed by the author who created some “fruitful” Web presentations in the past

Temporal mutation can use direct revisits or a number of indirect forms of revisit.
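The two revisit rules can be sketched as a small scheduling function. The data layout is an assumption made for illustration: `history` records when each fruitful page was last visited and who wrote it, `pages_by_author` lists each author's known pages, and `period` is a hypothetical revisit interval in the same time unit as `now`.

```python
def temporal_mutation(history, pages_by_author, now, period):
    """Return URLs that are due for a revisit.

    history maps url -> (last_visit_time, author) for pages that were fruitful.
    A page is due once `period` has elapsed since its last visit; other pages
    by the same author are then revisited as well.
    """
    due = []
    for url, (last_visit, author) in history.items():
        if now - last_visit < period:
            continue                       # visited recently, not due yet
        for candidate in [url] + pages_by_author.get(author, []):
            if candidate not in due:       # avoid scheduling a page twice
                due.append(candidate)
    return due


history = {"u1": (0, "ana"), "u2": (90, "bob")}
pages_by_author = {"ana": ["u1", "u3"], "bob": ["u2", "u4"]}
print(temporal_mutation(history, pages_by_author, now=100, period=50))
```

Here only `u1` is old enough to be revisited, which also schedules `u3` by the same author. The indirect forms of revisit mentioned on the slide are not modeled.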


Implementation Level

Utilization of novel technologies, for maximal performance and minimal implementation complexity

Important for:

- good flexibility

- extendibility

- reliability

- availability

Utilization of mobile platforms and mobile agents


Implementation Level…

Static agents

- one has to download megabytes of information

- treat that information with a decision-making code of size measured in kilobytes

- derive the final business-related decision, which is binary in size (one bit: yes or no)

A huge amount of data is transferred through the network in vain, because only a small percentage of fetched documents will turn out to be useful

Mobile agents

- they would browse through the network and perform the search locally, on the remote servers, transferring only the needed documents and data

- they load the network only with kilobytes and a single bit
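The difference in network load can be made concrete with a back-of-the-envelope calculation. The sizes below are invented for illustration, the one-bit decision is rounded up to one byte, and the function names are hypothetical.

```python
def static_agent_traffic(document_sizes):
    """Bytes moved when every candidate document is downloaded for local analysis."""
    return sum(document_sizes)


def mobile_agent_traffic(agent_code_size, num_servers, needed_document_sizes=()):
    """Bytes moved when the agent code travels to each server and decides there."""
    decision_size = 1   # the one-bit yes/no answer, rounded up to a byte
    return num_servers * (agent_code_size + decision_size) + sum(needed_document_sizes)


# Assumed scenario: five servers, one 2 MB document each, only one document useful,
# and a 4 KB decision-making agent.
documents = [2_000_000] * 5
static_bytes = static_agent_traffic(documents)
mobile_bytes = mobile_agent_traffic(4096, num_servers=5, needed_document_sizes=[2_000_000])
print(static_bytes, mobile_bytes)
```

Under these assumed numbers the static approach moves 10,000,000 bytes and the mobile approach 2,020,485 bytes, almost all of which is the one useful document.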


Simulation Result

Links-based approach in the static domain

How various mutation strategies can affect the search efficiency

A set of software packages has been developed to perform Internet search using genetic algorithms (by Veljko Milutinovic, Dragana Cvetkovic, and Jelena Mirkovic)

As the fitness function, they measured the average Jaccard’s score for the output documents, while changing the type and rate of mutation
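The Jaccard score of two sets is the size of their intersection divided by the size of their union. The sketch below computes it and its average over a list of output documents. How the authors turned documents into sets is not stated on the slide; representing each document as a set of keywords and comparing it with a reference keyword set is an assumption, as are the function names.

```python
def jaccard(a, b):
    """Jaccard score |A ∩ B| / |A ∪ B| of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0          # convention chosen here for two empty sets
    return len(a & b) / len(union)


def average_jaccard(reference, documents):
    """Mean Jaccard score of each document against the reference set."""
    if not documents:
        return 0.0
    return sum(jaccard(reference, doc) for doc in documents) / len(documents)


reference = {"genetic", "algorithm", "search"}
outputs = [
    {"genetic", "algorithm", "search"},        # identical keywords
    {"genetic", "search", "internet"},         # two of four distinct keywords shared
]
print(average_jaccard(reference, outputs))
```

A higher average means that the retrieved documents share more keywords with the reference, which is how this measure can be used to compare mutation types and rates.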


Simulation Result…

The simulation result for topic mutation

The simulation result for temporal and spatial mutation combined with topic mutation


Simulation Result…

The simulation result for topic, spatial and temporal mutation combined.

Constant increase in the quality of pages found.


Conclusion: Evolution

Tutorial download: galeb.etf.bg.ac.yu/~vm Option: Tutorials