tpam: a simulation-based model for quantitatively ...introduction adaptive des tpam experimental...
TRANSCRIPT
Introduction Adaptive DEs TPAM Experimental settings Experiment using TPAM Conclusion
TPAM:A Simulation-Based Model for QuantitativelyAnalyzing Parameter Adaptation Methods
Ryoji Tanabe (SUSTech, CHN) and Alex Fukunaga (U. Tokyo, JP)
The first quantitative analysis of PAMs in isolationParameter Adaptation Methods (PAMs) are poorly understoodWe propose a Target function-based PAM simulation (TPAM)
framework for analyzing PAMs in adaptive DETPAM measures the ability of PAMs to track predefined target
parameters and enables quantitative analysis of PAMs
E.g., PAM-JADE tracks this particular target behavior
1.4 times better than PAM-SHADE
1 / 21
Introduction Adaptive DEs TPAM Experimental settings Experiment using TPAM Conclusion
Differential Evolution (DE) [Storn 97]
The two main control parameters of DE are:
1. Scale factor F ∈ (0, 1]:
F controls the magnitude of the differential mutation
2. Crossover rate C ∈ [0, 1]:
C controls the number of inherited variables from x
1 t← 1, initialize the population P t = {x1,t, ...,xN,t};2 while The termination criteria are not met do3 for i ∈ {1, ..., N} do4 vi,t ← differentialMutation(P t, Fi,t);5 ui,t ← crossover(xi,t,vi,t, Ci,t);
6 for i ∈ {1, ..., N} do7 if f(ui,t) ≤ f(xi,t) then xi,t+1 ← ui,t ;8 else xi,t+1 ← xi,t ;
9 t← t+ 1;
2 / 21
Introduction Adaptive DEs TPAM Experimental settings Experiment using TPAM Conclusion
The performance of DE depends on the setting of F and C
Number of function evaluations (FEvals)
Rastrigin (separable) Rosenbrock (nonseparable)
C = 0.1 is suitable
C = 0.9 is unsuitable
C = 0.9 is suitable
C = 0.1 is unsuitable
E.g., An appropriate setting of C depends on separability
The classical DE is not so efficient for black-box optimization
DE needs automated parameter control methods for F and C
A large number of adaptive DE algorithms have been proposed
jDE [Brest 06], JADE [Zhang 09], SaDE [Qin 09],EPSDE [Mallipeddi 11], MDE [Islam 12], SHADE [Tanabe 13], ...
3 / 21
Introduction Adaptive DEs TPAM Experimental settings Experiment using TPAM Conclusion
Parameter Adaptation Methods (PAMs) are poorly understood
We are interested in PAMs in adaptive DE, not adaptive DE
PAM of jDE rand/1 binomial
Mutation strategy
PAM of JADE current-to-pbest/1 binomial
jDE =
JADE =
PAMs (our interest) Cross.
Several previous works have tried to analyze PAMs in adaptive DE
Their results/analysis limited to plots of changes in F and C
Rosenbrock
Rastrigin
Number of iterations
The meta-parameter of C quickly drops down
The meta-parameter of C gradually increases
Analysis limited toqualitative descriptions
Such qualitative results do notprovide sufficient insight
E.g., Adaptation process of the meta-parameter of JADE for C
4 / 21
Introduction Adaptive DEs TPAM Experimental settings Experiment using TPAM Conclusion
Proposal: Target function-based PAM simulation (TPAM)
Traditional analyses based on plotting parameter values:
Limited to qualitative descriptionsMay be useful for analyzing overall behavior of adaptive DEs,but cannot analyze the behavior of PAMs in isolation
Our proposed TPAM:
Quantitatively evaluates the tracking performance of PAMsEnables analysis for PAMs in isolation
Independent of other mechanisms in adaptive DE
Measures the ability of PAMs to track predefined targetparameters and enables quantitative analysis of PAMs
E.g., PAM-JADE tracks this particular target behavior 1.4
times better than PAM-SHADE
5 / 21
Introduction Adaptive DEs TPAM Experimental settings Experiment using TPAM Conclusion
A generalized Parameter Adaptation Method (PAM) in adaptive DE
1. At the beginning of each iteration t, generate Fi,t and Ci,t foreach individual xi,t using meta-parameters
2. Decide whether {Fi,t, Ci,t} is a success or a failure
Success: if the child ui,t is better than the parent xi,t
I.e., if f(ui,t) ≤ f(xi,t)
Failure: Otherwise
3. At the end of each iteration t, update the meta-parametersbased on the success/failure decisions
Strictly speaking:
Some PAMs do not use meta-parameters
E.g., PAM-jDE, PAM-EPSDE,...
But, such PAMs can also be generalized into the aboveframework
6 / 21
Introduction Adaptive DEs TPAM Experimental settings Experiment using TPAM Conclusion
Example: PAM-JADE [Zhang 09]
PAM-JADE uses two meta-parameters µF and µC forparameter adaptation of F and C, respectively
1 t← 1, initialize the population P t, µF , µC ← 0.5;2 while The termination criteria are not met do
3 SF ← ∅, SC ← ∅;4 for i ∈ {1, ..., N} do5 Fi,t ← CauchyRand(µF , 0.1), Ci,t ← NomalRand(µC , 0.1);
6 vi,t ← differentialMutation(P t, Fi,t);7 ui,t ← crossover(xi,t,vi,t, Ci,t);
8 for i ∈ {1, ..., N} do9 if f(ui,t) ≤ f(xi,t) then
10 xi,t+1 ← ui,t, SF ← Fi,t, SC ← Ci,t
11 else xi,t+1 ← xi,t ;
12 µF ← (1− c)µF + c meanA(SF ), µC ← (1− c)µC + c meanL(SC);
13 t← t+ 1;
7 / 21
Introduction Adaptive DEs TPAM Experimental settings Experiment using TPAM Conclusion
Example: PAM-SHADE [Tanabe 13]
PAM-SHADE uses MF = (MF1 , ...,MF
H)T and MC = (MC1 , ...,MC
H )T
1 t← 1, initialize the population P t, MF ,MC ← 0.5, k ← 1;2 while The termination criteria are not met do
3 SF ← ∅, SC ← ∅;4 for i ∈ {1, ..., N} do5 ri,t ← Randi{1, ..., N}, Fi,t ← CauchyRand(MF
ri,t, 0.1), Ci,t ←
NomalRand(MCri,t
, 0.1);
6 vi,t ← differentialMutation(P t, Fi,t);7 ui,t ← crossover(xi,t,vi,t, Ci,t);
8 for i ∈ {1, ..., N} do9 if f(ui,t) ≤ f(xi,t) then
10 xi,t+1 ← ui,t, SF ← Fi,t, SC ← Ci,t
11 else xi,t+1 ← xi,t ;
12 MFk ← meanA(SF ), MC
k ← meanL(SC), k ← mod(k + 1,M);
13 t← t+ 1;8 / 21
Introduction Adaptive DEs TPAM Experimental settings Experiment using TPAM Conclusion
Basic idea of our proposed TPAM: Space reduction
A traditional approach must consider three complex spaces
Control parameter space Solution space Objective function space
ALL previous work used the objective function value of thesolution f(x) to evaluate the generated control parametersSince x is also affected by variation operators, suchapproaches cannot evaluate the control parameters in isolation
PROPOSAL:
Let’s remove the solution space from optimization problems!
The TPAM approach considers only the two simplified spaces
Control parameter space TPAM success/failure space
The solution space is eliminated, and the objective functionspace is replaced by the TPAM success/failure spaceControl parameters can be directly evaluated in isolation 9 / 21
Introduction Adaptive DEs TPAM Experimental settings Experiment using TPAM Conclusion
Some notes on this presentation
Recall: We are NOT interested in “the whole adaptive DE”
We want to focus only on Parameter Adaptation Methods
We are interested in PAM-jDE, not jDE
TPAM is NOT a class of benchmark function for optimization
The TPAM is a simulation framework to analyze PAMs
TPAM does NOT seek to optimize a static objective function f
TPAM measures the target-tracking behavior of a PAMIndividuals in TPAM only have F , C values but no base levelgenome, so variation operators (e.g., mutation) are irrelevant
TPAM can deal with both F and C simultaneously
We investigated (i) C, (ii) F , and (iii) {F,C}But, the tendency of their results is not different
For simplicity, we focus on C in this talk
10 / 21
Introduction Adaptive DEs TPAM Experimental settings Experiment using TPAM Conclusion
The process of PAMs in adaptive DE only depend onwhether each child generation is a success or a failure
Analyzing the PAM behavior does not require modeling the
absolute objective function values of the individualsA simulation model of the success/failure decisions is sufficientParameter adaptation of PAMs can be simulated by using a
surrogate model deciding whether {Fi,t, Ci,t} is a success or not
1 t← 1;2 while The termination criteria are not met do3 SF ← ∅, SC ← ∅;4 for i ∈ {1, ..., N} do5 Generate Fi,t and Ci,t according to meta-parameters;
6 for i ∈ {1, ..., N} do
7 if The pair of Fi,t and Ci,t is successful then8 SF ← Fi,t, S
C ← Ci,t
9 Update the meta-parameters based on SF and SC ;10 t← t+ 1;
11 / 21
Introduction Adaptive DEs TPAM Experimental settings Experiment using TPAM Conclusion
A model of the success/failure decisions in the proposed TPAM
Target parameters θtarget1 , θtarget2 , ... are introduced in TPAM
The dicision is made based on the distance from θtarget
The closer θ is from θtarget, the higher its success probability
Successful parametersFailed parameters
Target parameters
The number of iterations
Success probability depends on two parameters: pmax and d ∈ [0, 1]
A smaller pmax value makes a simulation model difficult
0d: Distance between a generated parameter and the target parameter
p max
Max. probability
Success probability = -d + p max
1
0.5
0.1
12 / 21
Introduction Adaptive DEs TPAM Experimental settings Experiment using TPAM Conclusion
Target parameters θtarget1 , θtarget2 , ... are given by a target function g
1. Two types of the linear functions glin/inc and glin/dec
0 10000 20000 30000 40000 500000.0
0.2
0.4
0.6
0.8
1.0
Target ParametersPAM-JADE’s µ
0 10000 20000 30000 40000 500000.0
0.2
0.4
0.6
0.8
1.0
2. The sinusoidal function gsin (left: ω = 10, right: ω = 40)
0 10000 20000 30000 40000 500000.0
0.2
0.4
0.6
0.8
1.0
0 10000 20000 30000 40000 500000.0
0.2
0.4
0.6
0.8
1.0
3. The random-walk function gran (left: s = 0.04, right: s = 0.1)
0 10000 20000 30000 40000 500000.0
0.2
0.4
0.6
0.8
1.0
0 10000 20000 30000 40000 500000.0
0.2
0.4
0.6
0.8
1.0
13 / 21
Introduction Adaptive DEs TPAM Experimental settings Experiment using TPAM Conclusion
The overall TPAM framework
1 t← 1, initialize a meta-parameter;2 while The termination criteria are not met do3 θtargett ← g(t);
4 Sθ ← ∅;5 for i ∈ {1, ..., N} do6 Generate θi,t according to the meta-parameter;
7 for i ∈ {1, ..., N} do8 if isParametersSuccessful(θi,t, θ
targett ) = successful then
9 Sθ ← θi,t;
10 Update the meta-parameter based on Sθ;11 t← t+ 1;
A parameter θ is (i) C, (ii) F , or (iii) {F,C}
14 / 21
Introduction Adaptive DEs TPAM Experimental settings Experiment using TPAM Conclusion
Experimental settings
Settings for Parameter Adaptation Methods (PAMs)
PAM-jDE, PAM-EPSDE, PAM-JADE, PAM-MDE, PAM-SHADE
For each PAM, the hyperparameters recommended bythe original authors were used
Population size = 50
Settings for the proposed TPAM
Number of parameter sampling steps = 50, 000
To evaluate PAMs over a large window of activityThis does not correspond to any specific number ofsearch steps executed by a DE
101 independent runs. pmax ∈ {0.1, 0.2, ..., 1.0}
Evaluation metric for TPAM:The percentage of successful parameters rsucc ∈ [0, 1]
A higher rsucc represents that the PAM is able to track giventarget parameters θtarget1 , θtarget2 , ... 15 / 21
Introduction Adaptive DEs TPAM Experimental settings Experiment using TPAM Conclusion
The tracking performance of the five PAMs on the linear functions
PAM-JADE has best tracking for all the pmax valuesFor the smaller pmax values, PAM-EPSDE cannot track θtarget
PAM-MDE and PAM-SHADE have a hidden bias
Max. success probability pmax (Smaller is more difficult)
Linearly increasing function
The tracking performance of PAM-JADE is 1.4 times better
than that of PAM-SHADE
Linearly decreasing function
16 / 21
Introduction Adaptive DEs TPAM Experimental settings Experiment using TPAM Conclusion
The tracking performance of the five PAMs on the sinusoidal function
PAM-EPSDE has best tracking for the larger pmax valuesPAM-SHADE has best tracking for the smaller pmax valuesThe tracking performance of PAM-SHADE on hard TPAMinstances is better than that of the other PAMs
2
Max. success probability pmax (Smaller is more difficult)
Angular frequency = 10 (easy) Angular frequency = 40 (hard)
PAM-EPSDE has 1.2 times better tracking than PAM-SHADE
PAM-SHADE has 1.04 times the tracking
performance of PAM-JADE
PAM-SHADE has 1.3 times the tracking performance
of PAM-JADE
17 / 21
Introduction Adaptive DEs TPAM Experimental settings Experiment using TPAM Conclusion
Why does PAM-SHADE track the target better than PAM-JADE?
The diversity of values in M enables PAM-SHADE to be muchmore robust than PAM-JADE on hard TPAM instances
μ of PAM-JADE
M of PAM-SHADE
PAM-JADE clearly fails to track the target parameters
PAM-SHADE succeeds in tracking the target
ω=10
ω=10
ω=40
ω=40
μ of PAM-JADE mostly overlaps the target
ω=10
ω=40
PAM-SHADE tracks the target with broadly distributed values
ω=10
ω=40
18 / 21
Introduction Adaptive DEs TPAM Experimental settings Experiment using TPAM Conclusion
The tracking performance of the PAMs on the random-walk function
Easy instances: PAM-JADE and PAM-EPSDE track wellHard instances: PAM-SHADE has the best tracking performance
Step size of random walk s [001, 0.1]∈
pmax = 1 (easy) pmax = 0.1 (hard)
The tracking performance of PAM-SHADE is 1.18 times
worse than that of PAM-JADE
Larger s harder to track
PAM-SHADE has 1.13 times better tracking
than PAM-JADE
19 / 21
Introduction Adaptive DEs TPAM Experimental settings Experiment using TPAM Conclusion
How relevant are the target tracking accuracy of PAMs to the search
performance of adaptive DEs? Results on the BBOB functions [Hansen 09]
The rand/1/bin operators are used for all PAMsAn adaptive DE using PAM-SHADE performs wellThe tracking performance of a PAM in the TPAM model iscorrelated with search performance of DE using that PAM
0 1 2 3 4 5log10 of (# f-evals / dimension)
0.0
0.2
0.4
0.6
0.8
1.0
Pro
port
ion
offu
nctio
n+ta
rget
pairs
PAM-JADE
PAM-EPSDE
PAM-MDE
PAM-jDE
PAM-SHADE
best 2009f1-24,10-D
20 / 21
Introduction Adaptive DEs TPAM Experimental settings Experiment using TPAM Conclusion
Conclusion:
The first quantitative investigation of the adaptation ability of PAMs
We proposed a Target function-based PAM simulation (TPAM)
TPAM evaluates the tracking performance of PAMs andenables quantitative comparison of PAMs
E.g., PAM-SHADE tracks this particular target behavior 1.3
times better than PAM-JADE
TPAM can provide important insights on PAMs,
E.g., Why an adaptive DE using PAM-SHADE performs well
TPAM is a promising simulation framework for analyzing PAMs
Future work
Apply TPAM to analysis of PAMs in other EAs (e.g., ESs)Design an unified, systematic simulation framework
TPAM evaluates only the tracking performance of PAMsOther simulation models that evaluate other importantaspects of PAMs (e.g., diversity) are necessary
21 / 21
Appendix
Is our definition of success probability appropriate?
All of the parameter values generated by an adaptive DE using thecurrent-to-pbest/1/bin and PAM-JADE on the Rosenbrock
0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5
Number of function evaluations ×104
0.0
0.2
0.4
0.6
0.8
1.0F
Failed FSuccessful FSpline curve
0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5
Number of function evaluations ×104
0.0
0.2
0.4
0.6
0.8
1.0C
Failed CSuccessful CSpline curve
Success probability as a function of the distance of the F (left)and C (right) from the spline curve
0.0 0.1 0.2 0.3 0.4 0.5 0.60.000
0.005
0.010
0.015
0.020
0.025
0.030
0.035
0.040
0.045F
0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.450.00
0.01
0.02
0.03
0.04
0.05
0.06C
Distance
Suc
cess
prob
abili
ty
The assumption that the probability ofgenerating successful trial vectors ishighly correlated with the ability togenerate control parameters valueswhich accurately track a targetparameter is justifiable
22 / 21