

High Dimensional Model Representation for solving Expensive Multi-objective Optimization Problems*

Proteek Chandan Roy¹ and Kalyanmoy Deb²

¹Department of Computer Science and Engineering, Michigan State University
²Department of Electrical and Computer Engineering, Michigan State University

COIN Report Number 2016012

Metamodel based evolutionary algorithms have been used for solving expensive single and multi-objective optimization problems in which the evaluation of functions consumes a major portion of the running time. The underlying system can be complex, high dimensional, multi-objective and available only as a black-box function. In this paper, we propose a framework for solving expensive multi-objective optimization problems that uses high dimensional model representation (HDMR) as its basic model. The proposed method first explores the regions of interest and then exploits them by narrowing the search space. It uses Kriging to interpolate the subcomponents of HDMR and NSGA-II to search the model space. It is compared with basic NSGA-II and a multi-objective efficient global optimization (EGO) method on the ZDT, DTLZ and CEC09 test problem suites. The results show that this framework is able to find a good distribution of solutions which are sufficiently converged to the Pareto-optimal fronts within a limited number of solution evaluations.

1 Introduction

Evolutionary algorithms have been used extensively for solving multi-objective optimization problems. Calculation of objective functions can be carried out by an exact functional form, a computational simulation or a physical experiment. Due to the high computational cost of simulations or the time consuming nature of experiments, evolutionary algorithms are combined with surrogate or computationally cheap models to reduce the total

* Accepted for presentation at WCCI-2016 (25-29 July 2016), Vancouver, Canada.
¹ [email protected]
² [email protected]


time of evaluations. For example, surrogate models have been applied to aerodynamic design [1] or drug design problems [2] to reduce the overall time for optimization.

In the literature, different surrogate models have been proposed to solve single and multi-objective optimization problems (MOPs). Popular surrogate models include the response surface model (RSF) [3], radial basis functions (RBF) [4], multivariate adaptive regression splines (MARS) [5], the multi-layer perceptron (MLP) [6], Kriging models [7], etc. Jones et al. [8] proposed a Kriging based method called efficient global optimization (EGO) for single objective problems. The Kriging method models the target function and predicts its value along with an uncertainty measure. This value is then used to compute a metric called expected improvement (EI), which estimates how much improvement a solution is expected to make. Emmerich et al. introduced Kriging into multi-objective problems with expected improvement (EI) and expected hypervolume improvement (EHVI) as selection criteria [7]. The solutions having maximum EI or EHVI are selected for the next generation and evaluated by the real fitness function or a high-fidelity evaluation. ParEGO [9] solves multi-objective problems by converting them into single objective ones with a parameterized scalarizing weight vector. S-metric selection based EGO (SMS-EGO) [10] uses a hypervolume based metric, or S-metric, to select an individual for exact solution evaluation; it uses the covariance matrix adaptation evolution strategy (CMA-ES) for solving the problems. The decomposition based multi-objective evolutionary algorithm (MOEA/D) [11] has been combined with EGO to evaluate multiple solutions in parallel. The EGO method has also been combined with NSGA-II [12] to solve problems having more than 2 objectives [13]. However, all the Kriging based methods have difficulty in solving problems with more than 12 variables: due to the curse of dimensionality, the number of training samples needed for modeling the objective function becomes very high as the number of input dimensions grows. Therefore, we introduce the model called high dimensional model representation, which has some advantages in complex and high dimensional problems. Other surrogate based evolutionary algorithms can be found in [14].

High dimensional model representation (HDMR) was introduced by Sobol [15] and Rabitz [16]. This mathematical framework is known for capturing complex input-output relationships with a set of orthogonal basis functions [17]. Several model fitting approaches, e.g., Kriging [18], radial basis functions (RBF) [19] and moving least squares [17], have been used to model the components of HDMR. In [19], RBF is used to predict the HDMR components by generating samples until the optimum value is found. After building the model, an optimization algorithm finds the best predicted solutions or Pareto front in the model space. A more comprehensive survey of HDMR based metamodels is given in [20]. Recently, HDMR has been combined with a cooperative co-evolution based algorithm for large scale problems [21], where HDMR is used to extract separable and non-separable subcomponents. In these cases, the optimization is done based on the constructed model without any explicit use of evolutionary control. This paper introduces HDMR model based evolutionary control (HDMR-EC) for solving expensive, complex and high dimensional multi-objective problems.

Model management is an important issue when it comes to expensive multi-objective problems. Since the fitness landscape is not known to the algorithm beforehand, it is very common to over-fit or under-fit the landscape. Over-fitting might lead to false


optima introduced by the surrogate, whereas under-fitting might overlook some optimal regions. Within a limited number of function evaluations, the algorithm should also focus on the parts of the search space that are more likely to optimize the objectives. Different methods for combining high-fidelity evaluations with the model are discussed in [14]. The main contribution of this paper is to combine an evolution control or model management framework with the HDMR model to solve complex and expensive MOPs.

The remaining sections are organized as follows. Section 2 describes the mathematical framework of HDMR and discusses the possible places where model management can be introduced. Section 3 introduces our proposed method HDMR-EC and discusses its advantages. Experimental settings and results for 34 test instances are presented in Section 4. Section 5 discusses the results and possible reasons for the effectiveness of the proposed method. We then conclude the paper with overall remarks, shortcomings of the method and future work.

2 Background

In this section we briefly discuss the theory of the high dimensional model representation framework, which models a complicated high dimensional function.

2.1 High Dimensional Model Representation

High dimensional model representation is a mathematical framework that approximates a system response without considering the complicated underlying principles. It captures non-linear system outputs as a summation of variable correlations in a hierarchical order. In a full model expansion, it considers all possible variable interactions and their contributions to the original function. The first term describes the average value of the fitness landscape near a reference point called the cut-center. The second set of terms expresses the independent effect of each variable when the decision variables deviate from the cut-center. Higher order terms denote the residual group effects of variables. The terms are independent, or orthogonal, to each other and form a mean convergent series. The function f(x) can be written as a series expansion in the following manner:

f(\mathbf{x}) = g_0 + \sum_{i=1}^{n} g_i(x_i) + \sum_{1 \le i < j \le n} g_{ij}(x_i, x_j) + \cdots + g_{12 \cdots n}(x_1, x_2, \ldots, x_n)    (1)

Here g_0 is the mean response in the neighborhood of the cut-center x̄ in decision variable space, and it is constant. g_i(x_i) gives the independent effect of variable x_i on the output f(x) when acting alone. g_{ij}(x_i, x_j) contributes the pair-wise interaction of variables x_i and x_j; note that this term does not contain any effect of the previous terms, i.e., the constant effect and the effects of single variables. The series proceeds in this hierarchical manner up to the interaction of all variables, g_{12...n}(x_1, x_2, ..., x_n), which gives the residual correlated behavior over all decision variables. In order to compute the component functions, we need a reference point x̄ as the cut-center and a set of sample points in the neighborhood


of x̄ to find all the effects. These terms are later used to predict the function values of unknown points. The first three terms can be computed by the following equations:

g_0 = f(\bar{x}) = f(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)

g_i(x_i) = f(\bar{x}_1, \ldots, \bar{x}_{i-1}, x_i, \bar{x}_{i+1}, \ldots, \bar{x}_n) - g_0

g_{ij}(x_i, x_j) = f(\bar{x}_1, \ldots, x_i, \ldots, x_j, \ldots, \bar{x}_n) - g_i(x_i) - g_j(x_j) - g_0    (2)

Here f(\bar{x}_1, \ldots, x_i, \ldots, \bar{x}_n) denotes the value of the function f at the point where all decision variables are at the reference point x̄ except that x̄_i is replaced by x_i, a point close to x̄_i. By subtracting the lower order terms we isolate the unique effect of the corresponding variable interaction. The first order approximation is given by the equation below:

f(\mathbf{x}) = g_0 + \sum_{1 \le i \le n} g_i(x_i) + R    (3)

Here R is the error term produced by truncating the series after the second term. Given the reference point x̄, regularly spaced points are sampled along the decision space coordinates. The effect at an arbitrary point x' can then be found from all the sampled effects and an interpolation method, and function values are predicted by the component (g) functions. In this study we have used Kriging to interpolate the g functions (Fig. 1).

Figure 1: Interpolating the linear effect g_1(x_1) of regularly spaced sample points for variable x_1. x̄_1 is the first coordinate of the n-dimensional reference point x̄ = (x̄_1, x̄_2, ..., x̄_n). Six points are sampled around x̄_1 and their effects are interpolated with Kriging.
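To make the sampling scheme of Eq. (2) and Fig. 1 concrete, the following minimal sketch (ours, not the authors' code; Python, with illustrative names) computes g_0 and the first-order component tables from n × s axis-aligned evaluations around the cut-center:

    import numpy as np

    def hdmr_first_order_samples(f, cut_center, lb, ub, s):
        """Compute g0 and the first-order component tables of Eq. (2)."""
        cut_center = np.asarray(cut_center, dtype=float)
        n = len(cut_center)
        g0 = f(cut_center)                     # mean response at the cut-center
        grids, g_tables = [], []
        for i in range(n):
            xi = np.linspace(lb[i], ub[i], s)  # s regularly spaced samples; d_i = (ub_i - lb_i)/(s - 1)
            gi = np.empty(s)
            for k, v in enumerate(xi):
                x = cut_center.copy()
                x[i] = v                       # vary only variable i, keep the rest at the cut-center
                gi[k] = f(x) - g0              # independent effect of x_i
            grids.append(xi)
            g_tables.append(gi)
        return g0, grids, g_tables

    # Worked example: f(x) = x1 + x2^2 is separable, so the first-order
    # decomposition is exact: g0 = 0.75, g_1(x1) = x1 - 0.5, g_2(x2) = x2^2 - 0.25.
    f = lambda x: x[0] + x[1] ** 2
    g0, grids, g_tables = hdmr_first_order_samples(f, [0.5, 0.5], [0, 0], [1, 1], s=5)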

3 The Proposed Algorithm

In the previous section, we considered how to predict the function value of an unknown solution x'. One important criterion for evaluating x' is that it needs to be within the sampling region (bounded by the samples). If x' is outside the sampling region, the model would extrapolate the function value and the prediction would be


worse in general. Therefore, special care needs to be taken if we want to use HDMR in an evolutionary algorithm. Here we propose a method that optimizes the model within the search region bounded by the samples.

Many surrogate based methods convert a multi-objective problem into a single objective one, as discussed previously. Here we propose a method, namely HDMR-EC, that retains the multi-objective nature of the problem and uses a multi-objective solver on the constructed model. Since the method is intended for expensive problems, we use only the first two terms (Eq. 3) of HDMR to reduce the number of samples needed for modeling. We use Kriging to model each first-order g function. The uncertainty measure of the Kriging model is simply added (assuming a minimization problem) to the predicted function value to discourage over-fitting; this means that the optimization algorithm will not over-emphasize uncertain regions.

The proposed scheme has two main parts: 1) initial exploration and 2) exploitation. Each of them consists of more than one sub-step, discussed below. Here we assume that the optimization problem has n decision variables, m objectives and a budget of FE_max solution evaluations.

3.0.1 Initial Exploration Step

In this step the proposed algorithm builds the HDMR model over the full range of the input variables. The initial model is very important in our method because it guides the search towards the optimal region. The step starts by sampling the total space, building the model and optimizing based on the model.

(a) Sampling: The algorithm starts by taking a cut-center or reference point whose coordinates are the middle values of each decision variable domain. For example, if x = (x_1, x_2) ∈ [0, 1]², then the center point will be (0.5, 0.5). This initialization is done without any knowledge of the problem; it is possible to initialize the cut-center with any point other than the midpoint. For each input variable x_i, the algorithm varies the value of x_i with an interval vector d from the lower bound to the upper bound of x_i, while the other variables remain the same. For example, if we take 5 samples for variable x_1, the samples would be {(0.0, 0.5), (0.25, 0.5), (0.5, 0.5), (0.75, 0.5), (1.0, 0.5)}. Here the difference between samples in x_1 is d_1 = 0.25. For s samples, the interval vector is d = (ub − lb)/(s − 1), where ub and lb are the upper and lower bounds of the decision variables. The exploration parameter e of our algorithm specifies the proportion of initial exploration; for instance, e = 0.5 denotes that 50% of the solution evaluations will be used for exploration. The parameter s and the initial value of d can be derived from e and the number of variables n.

(b) Model Building: In this step, we calculate the g functions for each objective. The total number of first-order g functions is n (see Eq. 3) for each objective. Simple arithmetic is sufficient to obtain the g functions from the objective values. We then build a Kriging model for each of them. Since the variables are sampled by changing only one variable at a time, there will be n models, each of


them involving only one variable. Other interpolation techniques, e.g., splines, radial basis function networks or moving least squares, can also be used as the model.

(c) Optimization: Using a multi-objective evolutionary algorithm, we can find the Pareto front predicted by the model, since the model provides the function value of any unknown point within the range. For our experiments, we have chosen the NSGA-II algorithm for fair comparison among the three methodologies; NSGA-II is quite successful on two-objective optimization problems [22]. However, one can also choose other, more sophisticated algorithms to improve performance. Here we assume that the other components of the optimization process take negligible time compared to the fitness evaluations. Standard parameter settings can be used for the optimization algorithm. The uncertainty measure obtained from Kriging is added to the function value during minimization to reduce model over-fitting (see the sketch below).
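A minimal sketch of this uncertainty-penalized model, continuing the snippet of Section 2.1: one 1-D Gaussian process is fitted per variable per objective, and the predicted objective handed to NSGA-II is g_0 + Σ_i (μ_i(x_i) + σ_i(x_i)). Here scikit-learn's GaussianProcessRegressor stands in for the DACE Kriging toolbox [26] used in our experiments, and the kernel and its length scale are illustrative assumptions:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def build_penalized_objective(g0, grids, g_tables):
        """Fit one 1-D GP per variable to the sampled g_i values and return
        f_hat(x) = g0 + sum_i (mu_i(x_i) + sigma_i(x_i)); adding the standard
        deviation discourages over-fitting when f_hat is minimized."""
        gps = []
        for xi, gi in zip(grids, g_tables):
            gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2))
            gp.fit(xi.reshape(-1, 1), gi)      # 1-D training data per variable
            gps.append(gp)

        def f_hat(x):
            total = g0
            for i, gp in enumerate(gps):
                mu, sigma = gp.predict(np.array([[x[i]]]), return_std=True)
                total += mu[0] + sigma[0]      # uncertainty-penalized prediction
            return total

        return f_hat

    # One such f_hat is built per objective; NSGA-II then minimizes the m
    # predictors inside the box spanned by the samples.
    f_hat = build_penalized_objective(g0, grids, g_tables)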

At the end of this step, we have identified the regions that form the predicted Pareto front. The initial model might not provide the optimum point, but optima become easier to find once we narrow down the search space. However, narrowing might also exclude some potential search regions. As we are dealing with expensive problems, the number of solution evaluations is limited. The number of search regions can be controlled by the number of clusters, which is discussed in the next step.

3.0.2 Exploitation Step

In this step, the algorithm continues to focus on the search regions to facilitate furtherexploitation.

(a) Model validation: At the end of the first step, the algorithm has a predicted Pareto front. Our goal is to find a well-distributed set of solutions, so the algorithm takes K points by clustering those solutions. It uses a fuzzy K-means clustering method in the variable space (a sketch follows this list): it starts with K points placed randomly in the search space and assigns the predicted points to each cluster with random weights, or degrees of belonging; it then computes the centroids of the clusters and updates the degrees of belonging until convergence or a stopping criterion is met. We estimate the fitness values of the cluster centers by expensive evaluations. It is possible that some of them are dominated by the others; dominated points are not considered for further exploitation. This check is denoted as model validation, and its purpose is to avoid false optima. Consequently, the algorithm may exploit fewer than K regions in an iteration.

(b) Sampling, Model Building and Optimization: Valid, or seemingly optimal, points from the previous step are used as cut-centers of HDMR and sampled again with a reduced interval d (see Fig. 2). This reduction is called the learning rate l. The parameter denotes the proportional change in the intervals and is given by the user; l = 0.2 indicates that the range between the upper and lower bound is shrunk by 20% of the previous


range, i.e., d ← 0.8 × d for l = 0.2. We build the models for each objective and run the NSGA-II algorithm separately for each cut-center to find a combined predicted Pareto front. Steps (a) and (b) are repeated until the maximum number of evaluations FE_max is exhausted. In the exploitation step we fix the sampling size at s = 5; the sampling size can be increased if the total number of solution evaluations is increased. The authors have tested the performance of this algorithm with s = 3, 5, 7, 9 and 11, which is not reported in this paper. In general, a larger sampling size increases the model prediction accuracy.

(c) Output: To provide a well distributed Pareto front to the user, the algorithm can evaluate a few more points from the last predicted front; the number of evaluations reserved for this final step can be fixed in advance. These solutions are chosen by the same clustering method in order to obtain a diverse solution set. Only the Pareto-optimal set of solutions is given as output.
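The clustering used in steps (a) and (c) can be sketched as follows: a plain NumPy fuzzy K-means (fuzzy c-means), standing in for MATLAB's fcm used in our experiments, with the common fuzzifier m = 2 as an assumed setting:

    import numpy as np

    def fuzzy_kmeans(X, K, m=2.0, max_iter=100, tol=1e-5, seed=None):
        """Fuzzy K-means in variable space: alternately update the membership
        matrix U (degrees of belonging) and the K centroids until convergence."""
        rng = np.random.default_rng(seed)
        U = rng.random((X.shape[0], K))
        U /= U.sum(axis=1, keepdims=True)             # random initial memberships
        for _ in range(max_iter):
            W = U ** m
            C = (W.T @ X) / W.sum(axis=0)[:, None]    # membership-weighted centroids
            D = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2) + 1e-12
            U_new = 1.0 / D ** (2.0 / (m - 1.0))
            U_new /= U_new.sum(axis=1, keepdims=True) # standard membership update
            if np.abs(U_new - U).max() < tol:
                return C
            U = U_new
        return C

    # The K centroids are evaluated exactly; the non-dominated ones become the
    # new cut-centers (model validation), sampled with the shrunken interval
    # d <- (1 - l) * d.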

Figure 2: After the exploration step, two good regions are identified (K = 2). The search domains of those regions are narrowed, each spanning a reduced interval d_1.

In the exploration step, the HDMR model approximates the whole fitness space by smoothing the original function; it thus finds multiple good regions in terms of function values, and these regions are exploited in the successive steps. The complete procedure is listed in Algorithm 1, and some properties of the HDMR-EC method are described below.


7

Page 8: High Dimensional Model Representation for solving ...kdeb/papers/c2016012.pdfcomplex, high dimensional, multi-objective and black box function. In this paper, we have proposed a framework

Algorithm 1: HDMR-EC

Data: Number of variables n and objectives m, total solution evaluations FE_max, exploration parameter e, number of clusters K, learning rate l, upper and lower bounds of the variables ub and lb, a multi-objective evolutionary algorithm A
Result: Solution set P

 1  s1 = FE_max × e                          // initial exploration budget
 2  s2 = ⌊s1 / n⌋                            // samples per variable
 3  c ← (ub + lb)/2                          // cut-center (midpoint of the domain)
 4  Evaluate c and calculate the mean response g_0^j at c for each objective j ∈ {1, 2, ..., m}
 5  d ← (ub − lb)/(s2 − 1)                   // sampling interval
 6  for i = 1 to n do                        // for each variable
 7      P ← P ∪ {samples around x̄_i with interval d}
 8      Calculate subcomponent g_i^j(x_i) for each objective j ∈ {1, 2, ..., m}
 9      Build surrogate models with the g_i^j
10  end
11  F = |P|                                  // solution evaluations used so far
12  Q1 ← Pareto front predicted by A
13  Q1 ← K cluster centers of Q1, evaluated exactly
14  P ← P ∪ Q1
15  while F < FE_max do
16      Q1 ← ∅                               // predicted Pareto front
17      d ← (1 − l) × d                      // shrink the sampling interval
18      s3 ← ⌊(FE_max − F)/(n × K)⌋          // samples per variable per cluster
19      if s3 > 1 then                       // more than one sample per variable
20          for k = 1 to K do                // for each cluster (cut-center)
21              for i = 1 to n do            // for each variable
22                  F ← F + s3               // update the evaluation count
23                  Sample around x̄_i with interval d
24                  Calculate subcomponent g_i^j(x_i) for all j ∈ {1, 2, ..., m}
25                  Build surrogate models with the g_i^j
26              end
27              Q2 ← Pareto front predicted by A
28              Q1 ← Q1 ∪ Q2
29          end
30          Q1 ← Pareto-optimal set of Q1
31          Q1 ← K cluster centers of Q1, evaluated exactly
32          P ← P ∪ Q1
33      else
34          s4 ← number of function evaluations left
35          F ← F + s4
36          if s4 > 0 and Q1 ≠ ∅ then
37              Q1 ← s4 cluster centers of Q1, evaluated exactly
38              P ← P ∪ Q1
39          end
40      end
41  end
42  P ← Pareto-optimal set of P
43  return P


• The metamodeling part of HDMR-EC is very efficient. Building several Kriging models with only one variable each is less time consuming compared to building one model with several variables; for the same reason, the model parameters can be optimized very efficiently. The fuzzy K-means clustering method we use also converges fast. Space complexity increases linearly with the number of input dimensions because of the linear increase in model parameters.

• The first order HDMR method can predict both separable and non-separable functions. The interpolated g functions (Fig. 1) can be non-linear and the sum of their effects might be highly non-linear, so the model can handle complex functions. We rely on an optimization algorithm to predict the Pareto-optimal front in this highly non-linear multi-dimensional model space, thus reducing the computational cost of evaluating the real fitness functions.

• If we fix the order of HDMR, the number of samples needed to represent a function increases only polynomially with the number of input dimensions [17]. However, the number of terms in the full HDMR expansion is exponential in the number of input variables, which limits the efficacy of the full model. In practice, higher order variable interactions are rare [20].
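As a rough illustration of this last point (our arithmetic, assuming the cut-center is reused as one of the s samples on each axis):

    # Full HDMR of n variables has one component per variable subset, i.e. 2^n
    # terms, while the first-order model needs only about 1 + n*(s - 1)
    # evaluations for s samples per variable.
    n, s = 30, 5
    full_terms = 2 ** n                    # 1,073,741,824 component functions
    first_order_evals = 1 + n * (s - 1)    # 121 exact evaluations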

4 Experimental Results

The proposed framework is compared with basic NSGA-II [12] and a Kriging based NSGA-II algorithm [13] on 17 test problems, namely ZDT{1,2,3,4,6} [23], DTLZ{1,2,3,4,7} [24] and CEC09{1,2,3,4,5,6,7} [25], each with 15 and 30 variables and two objectives. The Kriging based NSGA-II, denoted M-KRIG, is used because it models each function separately (no scalarization) and uses the NSGA-II algorithm to optimize the model space, similar to the proposed method. It is also necessary to see how a basic algorithm, i.e., NSGA-II, performs on the same problem set. Further analysis and comparison with other methods are left as future work.

A maximum of FE_max = 500 solution evaluations is allowed for each algorithm. The exploration parameter is set to e = 0.3 for all test instances, which means that at most 0.3 × 500 = 150 solution evaluations are allowed for initial exploration; the same number is used for the initialization of algorithm M-KRIG. The number of samples for exploitation is kept at s = 5 and the number of clusters at K = 2. For fuzzy K-means clustering, we have used MATLAB's built-in function fcm with a stopping criterion of 100 iterations. The learning parameter l is kept at 0.2. These parameters were chosen in an ad-hoc manner. For the final output, 10 samples are evaluated from the last predicted front for both the HDMR-EC and M-KRIG algorithms. For the DTLZ problems, the cut-centers are shifted to x_i = 0.3 in variable space for fair comparison.

The parameter settings of the Kriging based EGO (M-KRIG) are taken from [13]. The estimation (EST) predicted by the Kriging model is used as the selection criterion of the NSGA-II algorithm. Initially, 150 solutions are generated by Latin hypercube sampling, and in each generation one solution that maximizes EST is picked. Model parameters are optimized while building the model. The source code of Kriging is taken from [26]. For the basic NSGA-II algorithm, we use a population size of 25 with 20 generations; the population size is kept small to maximize the evolution effect. Simulated binary crossover and polynomial mutation [22] are used with probabilities 0.9 and 1/(number of variables), respectively, and the crossover and mutation distribution indices are kept at 10 and 20, respectively. These settings are summarized in the sketch below.
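For reference, the settings just described can be collected in one configuration sketch (the names are illustrative, not taken from the authors' code):

    # Experimental settings of Section 4, gathered for reference.
    settings = {
        "FE_max": 500,        # total solution evaluations per run
        "e": 0.3,             # exploration share: at most 150 evaluations
        "s": 5,               # samples per variable during exploitation
        "K": 2,               # clusters; fuzzy K-means stopped after 100 iterations
        "l": 0.2,             # learning rate: d <- (1 - l) * d per iteration
        "final_samples": 10,  # evaluations spent on the last predicted front
        "nsga2_baseline": {   # basic NSGA-II run; SBX and polynomial mutation
            "pop_size": 25, "generations": 20,
            "p_crossover": 0.9, "eta_c": 10,
            "p_mutation": "1/n", "eta_m": 20,
        },
    }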



Figure 3: IGD values of the HDMR-EC (H), multi-objective Kriging (K) and NSGA-II (N) algorithms on different problems containing 15 and 30 variables.

Inverted generational distance (IGD) and the hypervolume metric (HV) [22] are used for performance comparison. HV is calculated by taking the nadir (worst) point over all datasets for that test problem. Each test problem is run 10 times on an Intel Xeon E5-2697 v3 processor in a 64-bit Windows machine.
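As a reference for the first metric, a common form of IGD is sketched below (the exact variant used in [22] may differ in detail):

    import numpy as np

    def igd(obtained, reference):
        """Inverted generational distance: mean Euclidean distance from each
        point of a sampled true Pareto front to its nearest obtained solution."""
        A = np.asarray(obtained)     # obtained non-dominated set, shape (N, m)
        Z = np.asarray(reference)    # sampled Pareto-optimal front, shape (R, m)
        d = np.linalg.norm(Z[:, None, :] - A[None, :, :], axis=2)
        return d.min(axis=1).mean()  # lower is better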

5 Discussion

The test problems considered here have different properties, such as non-separable variables, discrete or disconnected Pareto fronts, or complicated variable spaces. The IGD metric in Fig. 3 shows that HDMR-EC performs better than multi-objective Kriging (M-KRIG) and NSGA-II on most of the ZDT and DTLZ problems with 15 and 30 variables. HDMR-EC tends to perform much better than the other methods on problems with complicated decision



Figure 4: Hypervolume (HV) values of the HDMR-EC (H), multi-objective Kriging (K) and NSGA-II (N) algorithms on different problems containing 15 and 30 variables.


variable domains, as we assumed before. In each case, HDMR-EC has the lowest standard deviation of IGD values and NSGA-II the highest. HDMR-EC shows a larger deviation from the mean when applied to DTLZ4, DTLZ7, UF2 and UF4. There is no significant difference between the 15 and 30 variable problem instances for the HDMR-EC method; in the latter case, more function evaluations are needed. On the other hand, the performance of the M-KRIG method deteriorates when the dimensionality of the input variables increases. Excluding some cases of the DTLZ7 and UF5 problems, HDMR-EC gives the highest hypervolume (Fig. 4) on most of the problems. M-KRIG and NSGA-II perform similarly to HDMR-EC on ZDT1, DTLZ2, UF1 and UF6. M-KRIG does worse than the NSGA-II algorithm on ZDT6, DTLZ1, UF2, UF4, UF5 and UF6.

Fig. 5 shows the distribution of solutions over 10 runs; we have computed the Pareto set by combining the data of all runs. It is clear from the figures that M-KRIG could not converge to the optimal front within the limited evaluations, while HDMR-EC finds part of the optimal front even in the 30 variable case. As the number of clusters is K = 2, HDMR-EC finds only two parts of the disconnected Pareto-optimal regions in ZDT3 and DTLZ7. For the complicated UF{3,4,5,6,7} test problems, the number of solution evaluations would have to be increased for HDMR-EC to converge to the optimal front.

Fig. 6 shows the parameter sensitivity of the proposed method. The algorithm is run 10 times for each of three different settings of K and e: H1 (K = 1, e = 0.3), H2 (K = 2, e = 0.4) and H3 (K = 2, e = 0.5). We see from the figure that there is no significant difference in terms of IGD values. In some cases the 1-cluster approach tends to perform better because it allocates more solution evaluations to exploitation, while the 2-cluster approach tries to improve IGD by maintaining a good distribution.

6 Conclusion

In this paper, we have successfully integrated high dimensional model representation (HDMR) as the basic model of an evolutionary algorithm for solving expensive multi-objective optimization problems. HDMR-EC gives a good prediction of a complex function with a reasonable number of training samples. We have applied our model to 34 test problem instances with a maximum of 500 high-fidelity solution evaluations. As seen from the results, the number of evaluations needs to be increased for 30 variable and complicated problems. A cluster based technique is incorporated into HDMR-EC to find a good distribution in two-objective space; for many-objective problems, reference point based approaches [27] can be a better choice. Kriging based prediction gives us the uncertainty measure of the subcomponents, and this uncertainty measure reduces model over-fitting, which is essential for avoiding false optima. For complicated problems, it might not be suitable to keep only the first order components; however, the number of subcomponents of the full model increases exponentially with the input variables, which is a limitation of the approach. In the exploitation step, it is also possible that the cluster centroids and their ranges overlap with one another. The exploration parameter and the number of clusters can be made adaptive and problem dependent, and other criteria such as expected improvement or expected hypervolume improvement can be used in the Kriging model. In the future, we would like to extend this model to expensive single and multi-objective constrained optimization problems.


[Figure 5 panels: (a) ZDT1, 15 variables; (b) ZDT1, 30 variables; (c) ZDT2, 15 variables; (d) ZDT2, 30 variables; (e) ZDT3, 15 variables; (f) ZDT3, 30 variables; (g) ZDT6, 15 variables; (h) ZDT6, 30 variables; (i) DTLZ1, 15 variables; (j) DTLZ1, 30 variables; (k) DTLZ2, 15 variables; (l) DTLZ2, 30 variables; (m) DTLZ7, 15 variables; (n) DTLZ7, 30 variables; (o) UF1, 15 variables; (p) UF1, 30 variables; (q) UF2, 30 variables; (r) UF4, 30 variables; (s) UF5, 30 variables; (t) UF6, 30 variables. Each panel plots objective space (Function 1 vs. Function 2) with the Pareto-optimal front and the solutions obtained by HDMR-EC, M-KRIG and NSGA-II.]

Figure 5: Distribution of solutions on different test instances after 500 solution evaluations by the HDMR-EC, multi-objective Kriging (M-KRIG) and NSGA-II algorithms.



Figure 6: IGD values of three different settings, H1 (K = 1, e = 0.3), H2 (K = 2, e = 0.4) and H3 (K = 2, e = 0.5), of the HDMR-EC method.



Acknowledgment

This material is based in part upon work supported by the National Science Foundation under Cooperative Agreement No. DBI-0939454. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

References

[1] Y. Jin and B. Sendhoff, "A systems approach to evolutionary multiobjective structural optimization and beyond," Computational Intelligence Magazine, IEEE, vol. 4, no. 3, pp. 62–76, Aug 2009.

[2] D. Douguet, "e-LEA3D: a computational-aided drug design web server," Nucleic Acids Research, vol. 38, no. suppl 2, pp. W615–W621, 2010.

[3] Y. Lian and M.-S. Liou, "Multiobjective optimization using coupled response surface model and evolutionary algorithm," AIAA Journal, vol. 43, no. 6, pp. 1316–1325, Jun. 2005.

[4] H. Fang and M. Rais-Rohani, "Multiobjective crashworthiness optimization with radial basis functions," in 10th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, August 2004.

[5] J. H. Friedman, "Multivariate adaptive regression splines," Annals of Statistics, vol. 19, pp. 1–67, 1991.

[6] Y. Jin, M. Olhofer, and B. Sendhoff, "A framework for evolutionary optimization with approximate fitness functions," Evolutionary Computation, IEEE Transactions on, vol. 6, no. 5, pp. 481–494, Oct 2002.

[7] M. Emmerich, K. Giannakoglou, and B. Naujoks, "Single- and multiobjective evolutionary optimization assisted by Gaussian random field metamodels," Evolutionary Computation, IEEE Transactions on, vol. 10, no. 4, pp. 421–439, Aug 2006.

[8] D. R. Jones, M. Schonlau, and W. J. Welch, "Efficient global optimization of expensive black-box functions," J. of Global Optimization, vol. 13, no. 4, pp. 455–492, Dec. 1998.

[9] J. Knowles, "ParEGO: a hybrid algorithm with on-line landscape approximation for expensive multiobjective optimization problems," Evolutionary Computation, IEEE Transactions on, vol. 10, no. 1, pp. 50–66, Feb 2006.


[10] W. Ponweiser, T. Wagner, D. Biermann, and M. Vincze, "Multiobjective optimization on a limited budget of evaluations using model-assisted S-metric selection," in Proceedings of the 10th International Conference on Parallel Problem Solving from Nature: PPSN X. Berlin, Heidelberg: Springer-Verlag, 2008, pp. 784–794.

[11] Q. Zhang, W. Liu, E. Tsang, and B. Virginas, "Expensive multiobjective optimization by MOEA/D with Gaussian process model," Evolutionary Computation, IEEE Transactions on, vol. 14, no. 3, pp. 456–474, June 2010.

[12] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, "A fast and elitist multiobjective genetic algorithm: NSGA-II," Evolutionary Computation, IEEE Transactions on, vol. 6, no. 2, pp. 182–197, Apr 2002.

[13] C. Luo, K. Shimoyama, and S. Obayashi, "Kriging model based many-objective optimization with efficient calculation of expected hypervolume improvement," in Evolutionary Computation (CEC), 2014 IEEE Congress on, July 2014, pp. 1187–1194.

[14] Y. Jin, "Surrogate-assisted evolutionary computation: Recent advances and future challenges," Swarm and Evolutionary Computation, vol. 1, no. 2, pp. 61–70, 2011.

[15] I. M. Sobol', "Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates," Math. Comput. Simul., vol. 55, no. 1-3, pp. 271–280, Feb. 2001.

[16] H. Rabitz and Ö. F. Alış, "General foundations of high-dimensional model representations," Journal of Mathematical Chemistry, vol. 25, pp. 197–233, 1999.

[17] R. Chowdhury and S. Adhikari, "High dimensional model representation for stochastic finite element analysis," Applied Mathematical Modelling, vol. 34, no. 12, pp. 3917–3932, 2010.

[18] S. Ulaganathan, I. Couckuyt, T. Dhaene, J. Degroote, and E. Laermans, "High dimensional Kriging metamodelling utilising gradient information," Applied Mathematical Modelling, 2015.

[19] S. Shan and G. G. Wang, "Metamodeling for high dimensional simulation-based design problems," Journal of Mechanical Design, vol. 132, p. 051009, 2010.

[20] S. Shan and G. Wang, "Survey of modeling and optimization strategies to solve high-dimensional design problems with computationally-expensive black-box functions," Structural and Multidisciplinary Optimization, vol. 41, no. 2, pp. 219–241, 2009.

[21] S. Mahdavi, M. Shiri, and S. Rahnamayan, "Cooperative co-evolution with a new decomposition method for large-scale optimization," in Evolutionary Computation (CEC), 2014 IEEE Congress on, July 2014, pp. 1285–1292.

[22] K. Deb, Multi-Objective Optimization Using Evolutionary Algorithms. New York, NY, USA: John Wiley & Sons, Inc., 2001.


[23] E. Zitzler, K. Deb, and L. Thiele, "Comparison of multiobjective evolutionary algorithms: Empirical results," Evolutionary Computation, vol. 8, no. 2, pp. 173–195, 2000.

[24] K. Deb, L. Thiele, M. Laumanns, and E. Zitzler, "Scalable multi-objective optimization test problems," in Evolutionary Computation, 2002. CEC '02. Proceedings of the 2002 Congress on, vol. 1, May 2002, pp. 825–830.

[25] Q. Zhang, A. Zhou, S. Zhao, P. N. Suganthan, W. Liu, and S. Tiwari, "Multiobjective optimization test instances for the CEC 2009 special session and competition," University of Essex, Tech. Rep., Apr. 2009.

[26] S. N. Lophaven, H. B. Nielsen, and J. Søndergaard, "Aspects of the MATLAB toolbox DACE," Technical University of Denmark, Copenhagen, Technical Report IMM-TR-2002-13, 2002.

[27] K. Deb and H. Jain, "An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part I: Solving problems with box constraints," Evolutionary Computation, IEEE Transactions on, vol. 18, no. 4, pp. 577–601, Aug 2014.
