
Multi-criteria Optimization Using the AMALGAM Software Package: Theory, Concepts, and MATLAB Implementation

Jasper A. Vrugt a,b

a Department of Civil and Environmental Engineering, University of California Irvine, 4130 Engineering Gateway, Irvine, CA 92697-2175
b Department of Earth System Science, University of California Irvine, Irvine, CA

Abstract

The evolutionary algorithm AMALGAM implements the novel concept of adaptive multimethod search to ensure a fast, reliable and computationally efficient solution to multiobjective optimization problems. The method finds a well-distributed set of Pareto solutions within a single optimization run, and achieves an excellent performance compared to commonly used methods such as SPEA2, NSGA-II and MOEA/D. In this paper, I review the basic elements of AMALGAM, provide a pseudo-code of the algorithm, and introduce a MATLAB toolbox which provides scientists and engineers with an arsenal of options and utilities to solve multiobjective optimization problems involving (among others) multimodality, high-dimensionality, bounded parameter spaces, dynamic simulation models, and distributed multi-core computation. The AMALGAM toolbox supports parallel computing to permit inference of CPU-intensive system models, and provides convergence diagnostics and graphical output. Four different case studies are used to illustrate the main capabilities and functionalities of the AMALGAM toolbox.

Keywords: Evolutionary search, Multiple objectives, Global optimization, Pareto front, Convergence analysis, Prior distribution, Residual analysis, Environmental modeling, Multi-processor distributed computation

Email address: [email protected] (Jasper A. Vrugt)
URL: http://faculty.sites.uci.edu/jasper (Jasper A. Vrugt), http://scholar.google.com/citations?user=zkNXecUAAAAJ&hl=en (Jasper A. Vrugt)

Preprint submitted to Manual, March 9, 2015


1. Introduction and Scope

Evolutionary optimization is a subject of intense interest in many fields of study, including computational chemistry, biology, bioinformatics, economics, computational science, geophysics, and environmental science (Holland, 1975; Bounds, 1987; Barhen et al., 1997; Wales and Scheraga, 1999; Lemmon and Milinkovitch, 2002; Glick et al., 2002; Nowak and Sigmund, 2004; Schoups et al., 2005). The goal is to determine values for model parameters or state variables that provide the best possible solution to a predefined cost or objective function, or a set of optimal tradeoff values in the case of two or more conflicting objectives. However, locating optimal solutions often turns out to be painstakingly tedious, or even completely beyond current or projected computational capacity (Achlioptas et al., 2005).

Here, we consider multiobjective optimization problems with d decision variables (parameters) and m (m > 1) objectives: F = {f_1(x), ..., f_m(x)}, where x = {x_1, ..., x_d} denotes the decision vector, and F is the objective space. We restrict attention to optimization problems in which the parameter search space X, although perhaps quite large, is bounded: x ∈ X ⊂ R^d. The presence of multiple objectives in an optimization problem gives rise to a set of Pareto-optimal solutions, instead of a single optimal solution. A Pareto-optimal solution is one in which one objective cannot be further improved without causing a simultaneous degradation in at least one other objective. As such, these solutions represent globally optimal solutions to the tradeoff problem.

In mathematical terms, a multiobjective optimization problem can be formulated as

\arg \min_{x \in X} F(x) = \{ f_1(x), \ldots, f_m(x) \}    (1)

where f(·) is a function (numerical model) that computes m (m ≥ 2) different objective function values, x denotes a d-vector with decision variables (parameters), and X represents the feasible search space. (The notation in Equation 1 assumes minimization problems; for maximization problems, please use −f(x).) The solution to this problem will, in general, no longer be a single 'best' parameter value but will consist of a Pareto set P(x) of solutions corresponding to various trade-offs among the m objectives. The Pareto set of solutions defines the minimum uncertainty in the parameters that can be achieved without stating a subjective relative preference for minimizing one specific component of F(·)


at the expense of another. Without additional subjective preference information, all Pareto-optimal solutions are considered equally good (as vectors cannot be ordered completely).

To illustrate Pareto optimality, consider Figure 1, which displays the sampled parameter (left) and objective function (right) space of a simple toy problem involving m = 2 conflicting objectives

\arg \min_{x \in X} F(x) = \begin{cases} f_1(x) = x_1 + (x_2 - 1)^2 \\ f_2(x) = x_2 + (x_1 - 1)^2, \end{cases}    (2)

where x ∈ [0, 1]^2. The aim is to simultaneously minimize the two objectives {f_1, f_2} with respect to the two parameters {x_1, x_2}. The individual points A and B minimize objectives f_1 and f_2, respectively, whereas the dotted blue line joining A and B represents the Pareto-optimal front. Moving along the dotted line from A to B results in the improvement of f_2 while simultaneously causing deterioration in f_1. The points falling on the dotted AB line represent the trade-offs between the objectives and are called nondominated, noninferior, or efficient solutions. All the other points of the search domain are hence called inferior, dominated, or inefficient.
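Pareto dominance, the relation that underlies these definitions, is easily checked numerically. The short MATLAB snippet below (a hypothetical helper, not part of the toolbox) evaluates the two objectives of Equation 2 and tests whether one solution dominates another.

% Objectives of the toy problem of Equation 2
F = @(x) [ x(1) + (x(2) - 1)^2 ; x(2) + (x(1) - 1)^2 ];
% u dominates v if it is no worse in all objectives and strictly
% better in at least one (minimization assumed)
dominates = @(u,v) all(u <= v) && any(u < v);
Fa = F([0 1]);                  % Point A of Figure 1: minimizes f_1
Fb = F([1 0]);                  % Point B of Figure 1: minimizes f_2
Fc = F([1 1]);                  % An interior point with F = [1;1]
dominates(F([0.5 0.5]),Fc)      % true: [0.75;0.75] dominates [1;1]
dominates(Fa,Fb)                % false: A and B are mutually nondominated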

Figure 1: Illustration of the concept of Pareto optimality for the toy problem with two parameters {x_1, x_2} and two criteria {f_1, f_2} in the (A) parameter and (B) objective function space. The points A and B indicate the solutions that minimize each of the individual criteria f_1 and f_2. The dotted blue line joining A and B corresponds to the Pareto set of solutions. The point β is an element of the solution set, and is superior in the multicriteria sense to any other point not on this line.


Numerous approaches have been proposed to efficiently find Pareto-optimal solutions for multiobjective optimization problems (Zitzler and Thiele, 1999; Zitzler et al., 2000; Deb et al., 2002; Knowles and Corne, 1999). In particular, evolutionary algorithms have emerged as the most powerful approach for solving search and optimization problems involving multiple conflicting objectives. Beyond their ability to search intractably large spaces for multiple Pareto-optimal solutions, these algorithms are able to maintain a diverse set of solutions and exploit similarities of solutions by recombination. These attributes lead to efficient convergence to the Pareto-optimal front in a single optimization run (Zitzler et al., 2000). Of these, the SPEA/SPEA2 (Zitzler and Thiele, 1999; Zitzler et al., 2001), nondominated sorted genetic algorithm II (NSGA-II) (Deb et al., 2002), and MOEA/D (Zhang and Li, 2007; Li and Zhang, 2009) algorithms have received most attention because of their demonstrated ability to solve difficult benchmark problems.

Although the multiobjective optimization problem has been studied quite extensively, currently available evolutionary algorithms typically implement a single algorithm for population evolution. Reliance on a single biological model of natural selection and adaptation presumes that a single method exists that efficiently evolves a population of potential solutions through the parameter space. However, existing theory and numerical experiments have demonstrated that it is impossible to develop a single algorithm for population evolution that is always efficient for a diverse set of optimization problems (Wolpert and Macready, 1997).

In the past decade, memetic algorithms (also called hybrid genetic algorithms) have been proposed to increase the search efficiency of population-based optimization algorithms (Hart et al., 2005). These methods are inspired by models of adaptation in natural systems, and use a genetic algorithm for global exploration of the search space, combined with a local search heuristic for exploitation. Memetic algorithms have been shown to significantly speed up the evolution toward the globally optimal solution for a variety of real-world optimization problems. However, our conjecture is that a search procedure that adaptively changes the way it generates offspring, based on the shape and local peculiarities of the fitness landscape, will further improve the efficiency of evolutionary search. This approach is likely to be productive because the nature of the fitness landscape (the objective functions mapped out in the parameter space, also called the response surface) often varies considerably between different optimization problems, and dynamically changes en route to the global optimal solutions.


Drawing inspiration from the field of ensemble weather forecasting (Gneiting and Raftery, 2005), we have presented a novel multimethod evolutionary search approach (Vrugt and Robinson, 2007a; Vrugt et al., 2009a). The method combines two concepts, simultaneous multimethod search and self-adaptive offspring creation, to ensure a fast, reliable, and computationally efficient solution to multiobjective optimization problems. We called this approach a multi-algorithm, genetically adaptive multiobjective, or AMALGAM, method, to evoke the image of a procedure that blends the attributes of the best available individual optimization algorithms. Benchmark results using a set of well-known multiobjective test problems show that AMALGAM approaches a factor of 10 improvement over current optimization algorithms for the more complex, higher-dimensional problems (Vrugt and Robinson, 2007a). AMALGAM scales well with increasing number of dimensions, converges in close proximity of the global minimum for functions with noise-induced multimodality, and is designed to take full advantage of the power of distributed computer networks (Vrugt et al., 2009a).

In this paper, I introduce a MATLAB toolbox of the AMALGAM multi-criteria optimization algorithm. This MATLAB toolbox provides scientists and engineers with a comprehensive set of utilities for application of the AMALGAM algorithm to multiobjective optimization. The AMALGAM toolbox supports parallel computing and includes tools for convergence analysis and post-processing of the results. Some of the built-in options and utilities are demonstrated using four different case studies involving (for instance) multimodality, numerous local optima, bounded parameter spaces, dynamic simulation models, Bayesian model averaging, and distributed multi-core computation. These example studies are easy to run and adapt, and serve as templates for other inference problems. The present contribution follows closely the DREAM toolbox (Vrugt, 2015) developed for Bayesian inference of complex system models.

The remainder of this paper is organized as follows. Section 2 discusses the various building blocks of AMALGAM. This is followed in section 3 with a pseudo-code of the algorithm. Section 4 then presents a MATLAB toolbox of AMALGAM. In this section we are especially concerned with the input and output arguments of AMALGAM and the various utilities and options available to the user. Section 5 considers four different case studies that illustrate how to use the AMALGAM toolbox. The penultimate section of this paper (section 6) highlights recent research efforts aimed at further improving the efficiency of multimethod multiple objective optimization. Finally,


section 7 concludes the paper with a summary of the main findings.

2. Inference of the Pareto distribution

A key task in multiple criteria optimization is to summarize the so-called Pareto distribution. When this task cannot be carried out by analytical means nor by analytical approximation, evolutionary algorithms can be used to generate a sample from the Pareto distribution. The desired summary of the Pareto distribution is then obtained from this sample. The Pareto distribution, also referred to as the target or limiting distribution, is often high-dimensional. A large number of iterative methods have been developed to generate samples from the Pareto distribution. All these methods rely in some way on evolutionary principles. The next section discusses the AMALGAM multimethod search algorithm.

2.1. AMALGAM: in words

Whereas classical evolutionary algorithms employ a single recombination method to create offspring, the AMALGAM method uses q different optimization algorithms concurrently to evolve a population of particles (also called parents) through a multidimensional search space in pursuit of the Pareto distribution.

The algorithm is initiated with an initial population X_0 of size N × d, drawn randomly from some prior ranges using, for instance, Latin hypercube sampling. Then, each parent is assigned a rank using the fast nondominated sorting (FNS) algorithm of Deb et al. (2002) (see Figure 3). A population of offspring Y_1, of size N × d, is subsequently created by using the multimethod search concept that lies at the heart of the AMALGAM method. Instead of implementing a single operator for reproduction, we simultaneously use q different recombination methods to generate the offspring, Y_1 = {y_1, ..., y_N}. These algorithms differ in the genetic operators they use for reproduction, and the number of offspring they contribute to Y_1 depends on their selection probability, p_0 = {p_0^1, ..., p_0^q}, which varies per generation and depends on immediate past reproductive success. After the offspring has been created, the parents and children are combined, R_1 = X_0 ∪ Y_1, and the objective functions of this family population of size 2N × d are ranked using FNS. By comparing the current offspring with their previous generation, elitism is ensured because all previous nondominated members will be included in R (Zitzler and Thiele, 1999; Zitzler et al., 2000; Deb et al., 2002). Finally,


the N members of the next population X_1 are chosen from subsequent nondominated fronts of R_1 based on their rank and crowding distance (Deb et al., 2002). The new population X_1 is then used to create offspring, and the aforementioned algorithmic steps are repeated until convergence is achieved.

The core of the FNS algorithm can be coded in just a few lines (see Figure 2) and requires as input argument the objective function values, Q (size 2N × m), of the combined parent and offspring population.

function [ rank ] = FNS ( Q )
% Fast nondominated sorting
[N,m] = size(Q);        % Number of individuals and objective functions
Fr = {[]};              % Pareto-optimal fronts
rank = zeros(N,1);      % Initialize vector with ranks
Sp = cell(N,1);         % Set of individuals a particular individual dominates
np = zeros(N,1);        % Number of individuals by which an individual is dominated
%% Now loop over all elements of Q
for p = 1 : N
    idx = find( ( sum( bsxfun(@le,Q(p,1:m),Q) , 2 ) == m ) & ...
                ( sum( bsxfun(@lt,Q(p,1:m),Q) , 2 ) > 0 ) );    % Points of Q that p dominates
    if numel(idx), Sp{p} = idx; end                             % Store all these points in Sp
    idx = find( ( sum( bsxfun(@le,Q,Q(p,1:m)) , 2 ) == m ) & ...
                ( sum( bsxfun(@lt,Q,Q(p,1:m)) , 2 ) > 0 ) );    % Points of Q that dominate p
    np(p) = np(p) + numel(idx);                                 % Store number of points in np
    if np(p) == 0                                               % p is member of first front
        Fr{1}(end+1) = p;
    end
end
i = 1;
while ~isempty(Fr{i})
    NextFr = [];                        % Next front is empty
    for p = 1 : numel(Fr{i})            % For each member p of Fr{i}
        q = Sp{Fr{i}(p)};               % Visit each member of the set Sp
        np(q) = np(q) - 1;              % Decrement np(q) by one
        idx = find( np(q) == 0 );       % If np(q) is zero, q is member of next front
        NextFr = [NextFr ; q(idx)];
    end
    i = i + 1;                          % Go to next front
    Fr{i} = NextFr;                     % Current front becomes the members of NextFr
end
%% Now extract N-vector with ranks from cell structure Fr
for j = 1 : i, rank(Fr{j}) = j; end

Figure 2: MATLAB function of the FNS algorithm. The matrix Q of size 2N × m with objective function values is ranked. zeros() creates a vector (matrix) of zeros, cell() creates a cell array in which we store for each point of Q the solutions it dominates, and bsxfun(@fun,A,B) applies an element-by-element binary operation to the arrays A and B, with singleton expansion enabled; the operator @fun is either less than, lt, or less than or equal, le. We refer to introductory textbooks and/or the MATLAB "help" utility for the remaining functions size(), find(), sum(), numel(), and isempty().


First, for each solution we calculate two entities: (i) n_p, the number of solutions which dominate the solution p, and (ii) S_p, the set of solutions which the solution p dominates. The calculation of these two entities requires O(mN^2) operations (comparisons). We identify all those points which have n_p = 0 and put them in a list Fr_1. We call Fr_1 the current front. Now, for each solution p in the current front we visit each member q in its set S_p and reduce its n_p count by one. In doing so, if for any member q the count becomes zero, we put it in a separate list NextFr. When all members of the current front have been checked, we declare the members of the list NextFr as members of the next front. We then continue this process using the newly identified front NextFr as our current front.
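As a quick check of the FNS function of Figure 2, the snippet below ranks a hypothetical six-point example of m = 2 objective function values for a minimization problem.

Q = [ 0.1 0.9 ; 0.9 0.1 ; 0.5 0.5 ; 0.6 0.6 ; 0.2 0.8 ; 0.95 0.95 ];
rank = FNS ( Q )    % Returns [1 1 1 2 1 3]': the fourth point is dominated
                    % only by the third, and the last point by five others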

To illustrate the FNS algorithm, consider Figure 3, which plots the ranks of the different solutions (points) for four different multicriteria optimization problems.

Figure 3: Two-dimensional scatter plots of the parent objective function values for a (A) min-min, (B) min-max, (C) max-min, and (D) max-max optimization problem. Color coding is used to denote the Pareto ranks (1 through 5) of the solutions.


In the case of conventional Pareto ranking, points having an identical rank number are not distinguishable, even though the solutions at the extreme ends are in some sense much more unique than other solutions having the same rank. Hence, another criterion is needed that penalizes individuals with many neighbors in their niche. The crowding distance is such a metric and preserves the diversity of the population, thereby avoiding a collapse of the Pareto solutions to the most compromised region of the solution space.

The crowding distance provides an estimate of the density of solutions surrounding each point and is computed using the script of Figure 4. This function has two input arguments: Q, the objective function values of the combined parent and offspring population, and rank, a 2N × 1 vector with corresponding ranks derived from FNS.

function [ Crowding ] = Crowding_distance ( Q , rank )
% Calculates the crowding distance of the solutions of Q
m = size(Q,2);                          % How many objectives?
for j = 1 : max(rank)
    idx = find(rank == j);              % First find points with rank j
    R_sel = Q(idx,1:m);                 % Select those points
    N = numel(idx);                     % How many points with rank j
    C_d = zeros(N,1);                   % (Re)-initialize the crowding distance
    for i = 1 : m
        [R_sort,sort_idx] = sort(R_sel(:,i));           % Sort ith objective in ascending order
        C_d(sort_idx(1),1) = C_d(sort_idx(1),1) + inf;  % Extreme value for boundary solutions
        C_d(sort_idx(N),1) = C_d(sort_idx(N),1) + inf;  % Extreme value for boundary solutions
        for z = 2 : (N - 1)             % Now determine crowding distance of other individuals
            C_d(sort_idx(z),1) = C_d(sort_idx(z),1) + ( R_sort(z+1) - R_sort(z-1) );
        end
    end
    Crowding(idx,1) = C_d;              % Store crowding distance of solutions idx of Q
end

Figure 4: MATLAB function that calculates the crowding distance of the combined parent and offspring population. This function has as input the matrix Q of size 2N × m with objective function values, and the 2N × 1 vector rank with associated ranks. max() computes the maximum value of a vector, sort() sorts the elements of a vector in ascending order, and inf stands for infinity.

Thus, to get an estimate of the density of solutions surrounding a particular point in the population, we take the average distance of the two points on either side of this point along each of the objectives. This quantity, referred to as the crowding distance, serves as an estimate of the size of the largest cuboid


enclosing the point i without including any other point in the population. In Figure 5, the crowding distance of the ith solution in its front (marked with solid circles) is the average side-length of the cuboid (shown with a dashed box).

Figure 5: Schematic example of the computation of the crowding distance of the ith point of the population. The red dots are Pareto solutions (rank 1), whereas the blue solutions are dominated and hence belong to the next front (rank 2). The dashed cuboid of point i is spanned by its neighbors i−1 and i+1.
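In practice, FNS and the crowding distance are used in tandem. A minimal usage sketch, assuming Q holds the 2N × m objective function values of the combined parent and offspring population:

rank = FNS ( Q );                               % Pareto ranks (Figure 2)
Crowding = Crowding_distance ( Q , rank );      % Density of each solution (Figure 4)
% Selection of the next population favors low rank first, then large
% crowding distance to preserve diversity within each front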


An alternative strategy to differentiate among solutions with equal rank is the strength Pareto approach of Zitzler and Thiele (1999), used in the MOSCEM-UA algorithm (Vrugt et al., 2003), a predecessor of AMALGAM. This option is implemented in the MATLAB toolbox of AMALGAM, of which more in the next section.

function [ strength ] = Strength_pareto ( Q )
% Compute strength of each solution according to Zitzler and Thiele (1999)

% How many individuals and objectives?
[N,m] = size ( Q );
% Initialize number of points each solution dominates
n_dominate = zeros(N,1);
% Loop over individuals
for p = 1 : N
    % Which points does the pth point dominate?
    n_dominate ( p , 1 ) = sum ( sum ( bsxfun(@lt,Q(p,1:m),Q) , 2 ) == m );
end
% Now calculate strength --> the more points a solution dominates, the lower
% the strength of this solution --> so we need to invert n_dominate
strength = N ./ n_dominate;

Figure 6: MATLAB function that calculates the Pareto strength of the combined parent and offspring population. This function has as input the matrix Q of size 2N × m with objective function values.

The strength Pareto approach was introduced in SPEA/SPEA-2 (Zitzler and Thiele, 1999; Zitzler et al., 2001) and details of how to compute the strength of each solution of the population can be found in the cited references. In words, the strength of a solution is equivalent to the reciprocal of the number of solutions it dominates. As solutions in the compromise region will have many more points in their niche than the solutions at the extreme ends, selection based on rank and strength will ensure that unique individuals continue to exist after each generation. This is a necessity for rapid evolution of the initial sample to the Pareto solution set. Benchmark experiments on known target distributions suggest that the crowding distance operator is preferred, but because this might not always be the case (e.g. real-world applications), the user is free to experiment with the density operator.

Another strategy is to use differential weighting of the individual objective functions, a methodology introduced by Zhang and Li (2007) in MOEA/D and implemented in AMALGAM(D) (Vrugt, 2016). This approach generates a perfectly uniform approximation of the Pareto front, and achieves particularly excellent performance on problems involving more than m = 2 objectives.

2.2. Multimethod adaptation

Now that we have discussed the computational heart of AMALGAM, we are left with the question of how to adapt the selection probability of each of the recombination methods. This adaptation is designed specifically to favor recombination methods with higher reproductive success. We update the q-vector of selection probabilities, p_t, after each generation, t, using the following equation

p_{t+1}^{j} = \left( p_{t}^{j} \right)^{-1} \frac{M_{t}^{j}}{\sum_{j=1}^{q} M_{t}^{j}}    (3)

where M_t = {M_t^1, ..., M_t^q} is a q-vector with the number of offspring each recombination method, j = {1, ..., q}, contributes to X_t. The first term on the right-hand side takes into consideration the number of offspring a given operator has contributed to the offspring population, whereas the second term normalizes the reproductive success of each algorithm. In the absence of prior information about the expected performance of the individual recombination methods, we assume that they each create an equal number of offspring during the first generation.

The number of offspring each algorithm has contributed to the new parent population thus serves as a guiding principle to determine the selection


probability of each recombination method during the next generation. This ensures that the most productive genetic operators in AMALGAM are rewarded in the next generation by allowing them to generate more offspring. To avoid inactivation of a particular recombination method during the course of the optimization, the minimum selection probability is set larger than some nominal threshold, p_min (Vrugt and Robinson, 2007a). Prior information about the expected performance of the individual recombination methods is readily incorporated in AMALGAM by using a q-vector of different p_min values.
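A minimal MATLAB sketch of this update (not the toolbox source) is given below, assuming p is the q-vector of current selection probabilities, M the q-vector with the number of offspring each method contributed to the new population, and p_min the minimum selection probability; the normalization and flooring follow the description above.

% Update selection probabilities after a generation (Equation 3)
p_new = ( 1 ./ p ) .* ( M / sum(M) );   % Reward methods with high reproductive success
p_new = p_new / sum(p_new);             % Normalize so the probabilities sum to one
p_new = max ( p_new , p_min );          % Enforce the minimum selection probability
p_new = p_new / sum(p_new);             % Renormalize after flooring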

2.3. Recombination methods: Selection and implementation

Now that we have discussed the main algorithmic building blocks of AMALGAM, what we are left with is the selection of the recombination methods used to generate offspring. A host of different evolutionary algorithms could in principle be used, yet it would seem most productive to consider recombination methods that complement each other and, by definition, thus exhibit dissimilar searching behavior. Here, we combine differential evolution (Storn and Price, 1997), particle swarm optimization (Kennedy et al., 2001), the adaptive Metropolis algorithm (Haario et al., 2001), and the NSGA-II genetic algorithm (Deb et al., 2002). Preliminary runs with a host of different recombination methods have shown that this group of recombination methods provides adequate performance across a range of different benchmark problems (Vrugt and Robinson, 2007a). We now briefly discuss the implementation of each of the four different recombination operators.

2.3.1. Differential Evolution

While traditional evolutionary algorithms are well suited to solve many difficult optimization problems, interactions among decision variables (parameters) introduce another level of difficulty in the evolution. Previous work has demonstrated the poor performance of a number of multiobjective evolutionary optimization algorithms, including the NSGA-II, in finding Pareto solutions for rotated problems exhibiting strong interdependencies between parameters (Deb et al., 2002). Rotated problems typically require correlated, self-adapting mutation step sizes to make timely progress in the optimization.

Differential evolution (DE) has been demonstrated to be able to cope with strong correlation among decision variables, and exhibits rotationally invariant behavior (Storn and Price, 1997). Unlike genetic algorithms and other evolutionary strategies, DE generates offspring using a fixed multiple


of the difference between two or more randomly chosen parents of the population. We use variant DE/rand/1/bin of Storn and Price (1997) and create offspring as follows

y_{i,t} = X_{a,t-1} + \beta_1 ( X_{b,t-1} - X_{a,t-1} ) + \beta_2 ( X_{c,t-1} - X_{d,t-1} ),    (4)

where a, b, c, d are selected without replacement from the integers {1, ..., N}, and β_1 ∈ (0, 1] and β_2 ∈ (0, 1] are control parameters that determine the diversity of the offspring.
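The MATLAB fragment below is a minimal sketch of this offspring rule (not the toolbox source), assuming X is the N × d parent population (N ≥ 4) and beta_1 and beta_2 are scalar control parameters.

% Create N children with the DE rule of Equation 4
[N,d] = size(X); Y = nan(N,d);
for i = 1 : N
    r = randperm(N,4);      % Indices a, b, c, d drawn without replacement
    Y(i,:) = X(r(1),:) + beta_1 * ( X(r(2),:) - X(r(1),:) ) ...
                       + beta_2 * ( X(r(3),:) - X(r(4),:) );
end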

2.3.2. Adaptive Metropolis sampler

Evolutionary algorithms are prone to genetic drift, in which the majority of the population is inclined to converge toward a single solution, thereby relinquishing occupation of other parts of the search space. The adaptive Metropolis sampler (AMS) is a Markov chain Monte Carlo (MCMC) simulation method that actively prevents the search from becoming mired in the relatively small region of a single best solution by adopting an evolutionary strategy that allows replacing parents with offspring of lower fitness (Haario et al., 2001; Vrugt et al., 2008a, 2009b; Laloy and Vrugt, 2012a). While this is a much appreciated strength, the AMS algorithm has another desirable property that is of more interest in the current study: the sampler is very efficient in sampling from high-dimensional target distributions. So, if our multimethod evolutionary optimization has progressed toward the Pareto-optimal front, then the AMS algorithm is able to rapidly explore the entire Pareto distribution, successively visiting and generating a large number of solutions.

The AMS recombination method creates offspring as follows

y_{i,t} = X_{i,t-1} + \mathcal{N}_d ( 0, \gamma \Sigma_{t-1} ),    (5)

where \mathcal{N}_d(a, b) is the d-variate normal distribution with mean a and covariance matrix b, and γ is the jump rate which controls the spread (diversity) of the offspring. The covariance matrix is derived from the solutions of X_{t-1} with Pareto rank equivalent to one.
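A minimal sketch of this rule, assuming X is the N × d parent population, rank holds the Pareto ranks of X, gamma is the jump rate, and the Statistics Toolbox function mvnrnd is available:

% Create offspring by jumping from each parent (Equation 5)
R1 = X ( rank == 1 , : );       % Nondominated (rank 1) solutions
Sigma = cov ( R1 );             % Covariance of current Pareto approximation
[N,d] = size(X);
Y = X + mvnrnd ( zeros(1,d) , gamma * Sigma , N );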

2.3.3. Particle Swarm Optimizer

Particle swarm optimization (PSO) is a population-based stochastic optimization method whose development was inspired by the flocking and swarming behavior of birds and insects. After its introduction by Kennedy et al. (2001), the method gained rapid popularity in many fields of study. The method works with a group of potential solutions, called particles, and searches for optimal solutions by continuously modifying this population in subsequent generations. To start, the particles are assigned a random location and velocity in the d-dimensional search space. After initialization, each particle iteratively adjusts its position according to its own flying experience and according to the flying experience of all other particles, making use of the best position encountered by itself, x_i^best, and by the entire population, X^best. In contrast to standard genetic algorithms, PSO combines principles from local and global search to evolve a population of points toward the Pareto-optimal front. The reproductive operator for creating offspring from an existing population is (Kennedy et al., 2001)

v_{i,t} = \varphi v_{i,t-1} + c_1 r_1 \left( x_{i,t-1}^{best} - X_{i,t-1} \right) + c_2 r_2 \left( X_{t-1}^{best} - X_{i,t-1} \right)
y_{i,t} = X_{i,t-1} + v_{i,t},    (6)

where v_{i,t} represents the new velocity of the ith member of X_{t-1}, φ is the inertia factor, c_1 and c_2 signify the cognitive and social factors of each particle, respectively, and r_1 and r_2 are drawn randomly from U[0,1], a standard uniform distribution.

To enhance search efficiency on multimodal problems we follow Parsopoulos and Vrahatis (2004) and perturb the offspring

y_{i,t} = (1 + \xi) y_{i,t},    (7)

with a turbulence factor ξ ∼ U[−1, 1].
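A minimal sketch of Equations 6 and 7, assuming X is the N × d population, V the N × d matrix of velocities, Pbest the N × d personal-best positions, gbest the 1 × d population best, and varphi, c_1 and c_2 the PSO settings (Pbest and gbest are hypothetical names used here for illustration only):

% PSO velocity and position update (Equation 6)
[N,d] = size(X);
r1 = rand(N,d); r2 = rand(N,d);                     % Draws from U[0,1]
V = varphi * V + c_1 * r1 .* ( Pbest - X ) ...
              + c_2 * r2 .* ( bsxfun(@minus,gbest,X) );
Y = X + V;
% Perturb the offspring with a turbulence factor (Equation 7)
xi = 2 * rand(N,1) - 1;                             % xi ~ U[-1,1], one per child
Y = bsxfun ( @times , 1 + xi , Y );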

2.3.4. Nondominated Sorted Genetic Algorithm, NSGA-II

The NSGA-II algorithm developed by Deb et al. (2002) has received the most attention of all evolutionary optimization algorithms because of its simplicity and demonstrated superiority over other existing methods. The algorithm uses the well-known genetic operators of selection, crossover, and mutation to create a new population of points Y_t from an existing population, X_{t-1}. We use simulated binary crossover (SBX) and polynomial mutation (Deb and Agrawal, 1995) to create offspring.

3. AMALGAM: Pseudo-code

We now provide a pseudo-code of AMALGAM. The variable Φ(·|p) signifies the discrete multinomial distribution on · with selection probability p = (p_1, ..., p_q); p_j ≥ 0 and \sum_{j=1}^{q} p_j = 1.


1:  Algorithm: AMALGAM
2:  Step 0: Problem formulation
3:    Define F(x) which returns m values, m ≥ 2
4:    Define d, the dimensionality of x, d ≥ 1
5:    N ← population size, N ≥ 10
6:    T ← maximum number of generations, T ≥ 2
7:    m ← number of objectives, m ≥ 2
8:    Define X, the domain ranges, x ∈ X ⊂ R^d
9:  Step 1: Set main algorithmic variables
10:   rec_methods ← define constituent search methods
11:   p_cr ← NSGA crossover probability, p_cr ∈ [0, 1]
12:   p_m ← NSGA mutation probability, p_m ∈ [0, 1]
13:   η_C ← NSGA crossover distribution index, η_C ∈ [1, 250]
14:   η_M ← NSGA mutation distribution index, η_M ∈ [1, 250]
15:   γ ← AMS jump rate, γ ≥ 0
16:   β_1 ← DE scaling factor, β_1 ∈ [0, 2]
17:   β_2 ← DE scaling factor, β_2 ∈ [0, 2]
18:   c_1 ← PSO social factor, c_1 ∈ [1, 2]
19:   c_2 ← PSO cognitive factor, c_2 ∈ [1, 2]
20:   φ ← PSO inertia factor, φ ∈ [0, 1]
21:   K ← thinning rate, K ≥ 1
22:   p_min ← minimum selection probability of the q recombination methods, p_min ∈ [0, 1/q]
23: Step 2: Initialization
24:   Define p_0 = {p_0^1, ..., p_0^q}
25: Step 3: At generation t = 0
26:   for i = 1, ..., N do
27:     Sample X_i ∈ X using the initial sampling distribution
28:     Evaluate model (function), H_i = F(X_i)
29:   end for
30: Step 4: At generation 1 ≤ t ≤ T
31:   for i = 1, ..., N do
32:     Sample j ~ Φ({1, ..., q} | p_{t-1})
33:     Create offspring, Y_{i,t}, from X_{t-1} using the jth recombination method
34:     Repair infeasible offspring (or not)
35:     Evaluate model (function), G_{i,t} = F(Y_{i,t})
36:   end for
37: Step 5: Update population
38:   Combine parents and offspring, R_t = X_{t-1} ∪ Y_t
39:   Calculate rank and crowding distance of R_t
40:   Store in X_t the N members of R_t based on rank and crowding distance
41: Step 6: Search adaptation
42:   Calculate p_t based on reproductive success (Equation 3)
43: Step 7: Convergence check
44:   If criteria are satisfied, stop; otherwise go back to Step 4

The pseudo-code of AMALGAM assumes serial evaluation of the N different children of the offspring population. Fortunately, AMALGAM is embarrassingly parallel and each of the N offspring is easily evaluated on a different core, thereby permitting inference of CPU-intensive forward models. The software package of AMALGAM that I describe in the next section includes an option for multi-core evaluation of the offspring population using the MATLAB Parallel Computing Toolbox.
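A minimal sketch of such a multi-core evaluation (not the toolbox source) is given below; it assumes Y is the N × d offspring population, m the number of objectives, and Func_name the name of the user-written model function of Equation 8, here called with one d × 1 parameter vector at a time.

f = str2func ( Func_name );     % Handle to the user-supplied model function
N = size ( Y , 1 );             % Number of children
G = nan ( N , m );              % Objective function values of the children
parfor i = 1 : N
    G(i,:) = f ( Y(i,:)' )';    % Each child can be evaluated on its own core
end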

4. AMALGAM: MATLAB implementation

The basic code of AMALGAM was written in 2006, but many new functionalities and options have been added to the source code in recent years to support the needs of users. The AMALGAM code can be executed from the MATLAB prompt by the command

[X,F,output,Z] = AMALGAM(Func_name,AMALGAMPar,Par_info)

where Func_name (string), AMALGAMPar (structure array), and Par_info (structure array) are input arguments defined by the user, and X (matrix), F (matrix), output (structure array) and Z (matrix) are output variables computed by AMALGAM and returned to the user. An additional output argument, sim, contains the simulations of the forward model, and is available if the user activates memory storage of the model output; details will be discussed later. To minimize the number of input and output arguments in the AMALGAM function call and the related primary and secondary functions called by this program, we use MATLAB structure arrays and group related variables in one main element using data containers called fields, more of which later. Two optional input arguments that the user can pass to AMALGAM are Fpareto and options; their content and usage will be discussed below.

The AMALGAM function uses more than twenty other functions to help evolve the initial population to the Pareto distribution. All these functions are summarized briefly in Appendix A. In the subsequent sections I will


discuss the MATLAB implementation of AMALGAM. This, along with the prototype case studies presented herein and the template examples listed in runAMALGAM, should help users apply multi-criteria inference to their own functions, numerical models and data.

4.1. Input argument 1: Func_name

The variable Func_name defines the name (enclosed in quotes) of the MATLAB function (.m file) used to calculate the objective functions of each parameter vector, x. The use of an m-file rather than an anonymous function permits AMALGAM to solve multi-criteria inference problems involving, for example, dynamic simulation models. If we conveniently assume Func_name to be equivalent to 'model', then the call to this function becomes

Y = model(X)    (8)

where X (input argument) is a d × N matrix of parameter vectors (column-wise), and Y is a return argument of size m × N with objective function values. The content of the function model needs to be written by the user; the syntax definition is universal. Appendix B provides four different templates of the function model which are used in the case studies presented in section 5.

4.2. Input argument 2: AMALGAMPar

The structure AMALGAMPar defines the computational settings of AMALGAM. Table 1 lists the different fields of AMALGAMPar, their default values, and the corresponding variable names used in the mathematical description of AMALGAM in section 3.


Table 1: Main algorithmic variables of AMALGAM: mathematical symbols, corresponding fields of AMALGAMPar and default settings.

Symbol   Description                          Field of AMALGAMPar   Default
Problem dependent:
d        number of parameters                 d
N        population size                      N                     > 10
T        number of generations                T
m        number of objective functions        m                     > 1
Default values:
         recombination methods                rec_methods           {'GA','PSO','AMS','DE'}
p_cr     NSGA crossover probability           p_cr                  0.9
p_m      NSGA mutation probability            p_m                   1/d
η_C      NSGA crossover distribution index    eta_C                 10
η_M      NSGA mutation distribution index     eta_M                 50
γ        AMS jump rate                        gamma                 (2.38/√d)^2
β_1      DE scaling factor: variant-DE-I      beta_1                U_d(0.6, 1.0)
β_2      DE scaling factor: variant-DE-II     beta_2                U_d(0.2, 0.6)
c_1      PSO social factor                    c_1                   1.5
c_2      PSO cognitive factor                 c_2                   1.5
φ        PSO inertia factor                   varphi                U(0.5, 1.0)
K        thinning rate                        K                     1
p_min    minimum selection probability        p_min                 0.05

The names of the different fields of AMALGAMPar are equivalent to the symbols (letters) used in the (mathematical) description of AMALGAM. The values of the fields d, N, T and m depend on the dimensionality of the Pareto distribution, and hence should be defined by the user. Default settings are assumed for the remaining fields of AMALGAMPar. These default settings are easily modified by the user, by simply specifying individual fields of AMALGAMPar explicitly and their respective values.

The field K of AMALGAMPar allows the user to specify the thinning rate of the historical archive Z, to reduce memory requirements for high-dimensional Pareto distributions involving large parameter dimensionalities and many different generations to converge adequately. For instance, for a d = 100 dimensional Pareto distribution of m = 2 objectives with N = 100 and T = 1,000, MATLAB would need to store a staggering 10.2 million values (about 82 MB in double precision) for all the T populations and their corresponding objective function values. Thinning stores only every Kth population. This option reduces memory storage by a factor of T/K, and also decreases the autocorrelation between parents of successively stored populations. A default


value of K = 1 (no thinning) is assumed in AMALGAM. Note that large values of K (K >> 10) can be rather wasteful because many parents are not used in the computation of the Pareto moments and/or the plotting of marginal/bivariate parameter distributions.

4.3. Input argument 3: Par_info

The structure Par_info stores all necessary information about the parameters of the target distribution, for instance their prior uncertainty ranges (for bounded search problems), initial values (how to draw the initial population), prior distribution (if prior information is available), and boundary handling (what to do if out of the feasible space). Table 2 lists the different fields of Par_info and summarizes their content, default values and variable types.

Table 2: AMALGAM input argument Par_info: different fields, description, options, default settings and variable types.

Field      Description               Options                       Default   Type
initial    Initial sample            uniform/latin/normal/prior              string
min        Minimum values                                          -Inf_d    1 × d-vector
max        Maximum values                                          Inf_d     1 × d-vector
boundary   Boundary handling         reflect/bound/fold/none       'none'    string
mu         Mean of 'normal'                                                  1 × d-vector
cov        Covariance of 'normal'                                            d × d-matrix
prior      Prior distribution                                                cell array †

† Data type with containers called cells. Each cell can contain any type of data.

The field initial of Par_info specifies with a string enclosed between quotes how to sample the initial state of each of the N different chains. The user can select from (1) 'uniform' random, (2) 'latin' hypercube, (3) multivariate 'normal', and (4) sampling from a user-defined 'prior' distribution. Options (1) and (2) require specification of the fields min and max of Par_info, which define with 1 × d vectors the lower and upper bound values of each of the parameters, respectively. Option (3) 'normal' necessitates definition of the fields mu (1 × d vector) and cov (d × d matrix) of Par_info, which store the mean and covariance matrix of the multivariate normal distribution, respectively. Finally, for 'prior' (option 4) the user needs to specify as cell array the field prior of Par_info. For example,

Par_info.prior = {'normrnd(-2,0.1)','trnd(10)','unifrnd(-2,4)'}    (9)

uses a normal distribution with mean of -2 and standard deviation of 0.1 for the first parameter, a Student's t distribution with ν = 10 degrees of freedom for the second dimension, and a uniform distribution between -2 and 4 for the last parameter of the target distribution, respectively. The first three options assume the prior distribution to be noninformative (uniform/flat), and consequently do not favor any parameter values prior to assimilation of the data. On the contrary, if an explicit 'prior' distribution is used, then the samples of each dimension will follow the marginal distribution of prior.

samples of each dimension will follow the marginal distribution of prior.The fields min and max of the structure Par_info serve two purposes.470

First, they define the feasible parameter space from which the initial state ofeach of the chains is drawn if ’uniform’ random or ’latin’ hypercube sampling472

is used. Second, they can define a bounded search domain for problemsinvolving one or more parameters with known physical/conceptual ranges.474

This does however require the bound to be actively enforced during chainevolution. Indeed, offspring generated with the recombination methods can476

fall outside the hypercube defined by min and max even if the initial stateof each chain are well within the feasible search space. The field boundary of478

Par_info provides several options what to do if the parameters are outsidetheir respective ranges. The four different options that are available are (1)480

’bound’, (2) ’reflect’, (3) ’fold’, and (4) ’none’ (default). These methods areillustrated graphically in Figure 7 and act on one parameter at a time.482

Figure 7: Different options for parameter boundary handling in the AMALGAM package: (a) set to bound, (b) reflection, and (c) folding.


The option 'bound' is the simplest and resets each dimension of the d-vector of parameters that is outside the bound to its respective bound. The option 'reflect' is somewhat more refined and views the boundary of the search space as a mirror through which individual dimensions are reflected back into the parameter space. The size of this reflection is set equal to the "amount" of boundary violation. The 'bound' and 'reflect' boundary treatment options are used often in the field of optimization concerned with finding the global optimum of a given cost or objective function. The third option 'fold' connects the upper and lower bound of the parameter space so as to create a continuum representation. Note that this approach can lead to "bad" children if at least one of the dimensions of the Pareto distribution is located at the edges of the search domain. For those dimensions the parameter values can transition directly from low to high values, and vice versa.

Practical experience suggests that a reflection approach works well in practice. The option 'bound' is least recommended as it can collapse the parameter values of a given dimension to a single point, thereby not only losing diversity for sampling but also inflating the density of solutions at the bound. Most evolutionary algorithms apply the 'bound' option. For some test functions, with the optimum for at least some of the dimensions on the bound, this can artificially enhance the convergence rate to the nondominated solution set.
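The three repair rules are one-liners. A minimal sketch, assuming x is a 1 × d parameter vector and min_d and max_d are the 1 × d lower and upper bounds:

% 'reflect': mirror boundary violations back into the search space
y = x;
lo = x < min_d; hi = x > max_d;
y(lo) = 2 * min_d(lo) - x(lo);
y(hi) = 2 * max_d(hi) - x(hi);
% 'bound' would instead clip:  y = min ( max ( x , min_d ) , max_d );
% 'fold' wraps around:         y = min_d + mod ( x - min_d , max_d - min_d );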

4.4. (Optional) input argument 4: Fpareto

The fourth input argument Fpareto of the AMALGAM function is optional and useful for case studies involving benchmark functions with a known Pareto solution set. The matrix Fpareto is of size n × m and stores n different Pareto solutions (in objective space). This input argument allows AMALGAM to compute convergence metrics such as the hypervolume (While et al., 2006; Beume et al., 2009) and the widely used inverse generational distance. The inverted generational distance, or IGD metric (Veldhuizen, 1998; Li and Zhang, 2009), measures the distance of the solutions, A = {a_1, ..., a_N}, to a reference set, P = {p_1, ..., p_n}, of n uniformly distributed Pareto solutions,

\text{IGD}(A, P) = \frac{1}{n} \sum_{l=1}^{n} \min_{1 \leq i \leq N} \| p_l - a_i \|,    (10)

where ||·|| denotes the Euclidean distance. If the reference set is sufficiently large, say n > 250, the IGD metric measures not only the distance of the sampled solutions to the true Pareto front, but also their spread (diversity). The IGD values are strictly positive. The closer its value to zero, the better the approximation of the true Pareto front.
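Equation 10 translates into a few lines of MATLAB. A minimal sketch, assuming A is the N × m matrix of sampled objective function values and P the n × m reference set:

% Inverted generational distance of Equation 10
n = size(P,1); d_min = nan(n,1);
for l = 1 : n
    % Euclidean distance of the lth reference point to all sampled solutions
    d_min(l,1) = min ( sqrt ( sum ( bsxfun(@minus,A,P(l,:)).^2 , 2 ) ) );
end
IGD = mean ( d_min );   % Mean distance to the nearest sampled solution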

4.5. (Optional) input argument 5: options

The structure options is optional and passed as fifth input argument to AMALGAM. The fields of this structure can activate (among others) file writing, workspace saving, storage of model output, and distributed multi-core calculation. Table 3 summarizes the different fields of options and their default settings.

Table 3: Content of (optional) input structure options. This fifth input argument of AMALGAM is required to activate built-in functionalities such as distributed multi-processor calculation, workspace saving, and file writing.

Field of options   Description                              Options             Default
parallel           Distributed multi-core calculation?      no/yes              'no'
IO                 If parallel, IO writing of model?        no/yes              'no'
modout             Store output of model?                   no/yes              'no'
save               Save AMALGAM workspace?                  no/yes              'no'
restart            Restart run? ('save' required)           no/yes              'no'
density            Approach to compute density of points?   crowding/strength   'crowding'

Multi-core calculation takes advantage of the MATLAB Parallel Computing Toolbox and evaluates the N different children created with AMALGAM on different processors. This is especially useful if the forward model, model, is CPU-demanding and requires at least a few seconds (often more) to run. The field IO of options determines the setup of the distributed computing environment. If file writing is used to communicate between AMALGAM and an external executable called from model using the built-in dos or unix command, then each processor is set up automatically to work in a different directory. This is a simple solution to the file overwriting and corruption that occur if multiple different children are evaluated in the same directory. If model consists of MATLAB code only, with or without shared libraries linked through the built-in MEX-compiler, then the parameter values x can be passed directly to this .mex or .dll function and a single common directory for all workers suffices.


For CPU-intensive forward models it is convenient to not only store the parameter samples but also keep in memory the corresponding model simulations returned by model and used to calculate the objective function values of each child. This avoids having to rerun the model script after AMALGAM has terminated to assess Pareto simulation uncertainty. The field modout of options determines whether to store the output of the model script. If model output writing is activated, then the N simulations of each population X are stored in memory after each generation. These simulations are then returned to the user as fifth output argument, sim, of AMALGAM. If thinning is activated, then this applies to the simulations stored in sim as well.

To help evaluate the progress of AMALGAM, it can be useful to periodically store the MATLAB workspace to a file. If the field save of options is set to 'yes', then the workspace is saved to the file 'AMALGAM.mat' after each successive iteration. This temporary file is necessary input for a restart run (field restart) if convergence has not been achieved with the assumed computational budget.

Finally, the field density of options determines which method is used to calculate the density of each solution of the population. The default option is the crowding distance operator, explicated in Figure 5 and implemented in NSGA-II (Deb et al., 2002). As an alternative, the user can select the strength Pareto approach introduced in the SPEA/SPEA-2 algorithms (Zitzler and Thiele, 1999; Zitzler et al., 2001). The MATLAB script of Figure 6 details how to compute the Pareto strength of each solution of the population. Practical experience suggests that the crowding distance operator is preferred and leads to the best approximation of the Pareto front; yet, a synergy of both methods might yield the best performance, an idea that deserves further investigation.

4.6. Output arguments

We now briefly discuss the output (return) arguments of AMALGAM: X, F, output and Z. These four variables summarize the results of the AMALGAM algorithm and are used for plotting of the Pareto front, analysis of the nondominated solution set, convergence assessment, and investigation of the selection probabilities of the individual recombination methods.

The variable X is a matrix of size N × d with the parameter vectors (as rows) of the final population. The corresponding objective function values


are stored in the N × m matrix F.

The following MATLAB command

plot(F(1:AMALGAMPar.N,1),F(1:AMALGAMPar.N,2),'r+')    (11)

plots the objective function space of the N members of the final population. If the analysis involves three different objective functions, then the statement

plot3(F(:,1),F(:,2),F(:,3),'r+')    (12)

can be used to create a three-dimensional plot of the sampled objective function values of the final population.

The structure output contains important (diagnostic) information about the progress of the AMALGAM algorithm. The field RunTime (scalar) stores the wall-time (seconds), p_alg (matrix) the selection probability of each of the recombination methods, and IGD (matrix) the value of the IGD convergence diagnostic, after each successive generation.

The MATLAB command

plot(output.IGD(1:end,1),output.IGD(1:end,2),'b')    (13)

generates a trace plot of the IGD convergence diagnostic.

The matrix Z, of size NT × (d + m), stores the populations and their corresponding objective function values after each successive generation. If thinning is applied, then the number of rows of Z is reduced to NT/K + 1; K ≥ 2.

Finally, a fifth output argument, the matrix sim, is available if the user has activated the option modout of options. This output argument can be used to plot the simulations of each of the Pareto solutions, and hence serves to illustrate the Pareto simulation uncertainty. The matrix sim is only available for dynamic simulation models involving the simulation of some spatial/temporal entity. A second output argument of model is then required, as will be illustrated in case study 3 of the next section.

The directory '../postprocessing' (under the main directory) contains a number of different functions that can be used to visualize the different output arguments of AMALGAM. The script postproc_AMALGAM can be executed from the MATLAB prompt after the main AMALGAM function has terminated. Appendix A summarizes briefly the graphical output of the post-processing scripts.


5. Numerical examples

I now demonstrate the application of the MATLAB AMALGAM package to four different multi-criteria optimization problems. These case studies cover a diverse set of problem features and involve (among others) multimodality, local optima, and high-dimensional Pareto distributions.

5.1. Case Study I: ZDT1

We revisit the two-objective scalable test function suite of Zitzler et al. (2000) and consider ZDT1, which has a convex Pareto-optimal front and is given by

\arg \min_{x \in X} F(x) = \begin{cases} f_1(x) = x_1 \\ f_2(x) = g(x) h(x), \end{cases}    (14)

where

g(x) = 1 + \frac{9}{d-1} \sum_{i=2}^{d} x_i, \qquad h(x) = 1 - \sqrt{\frac{x_1}{g(x)}},    (15)

and x_i ∈ [0, 1]. The Pareto-optimal front is formed with g(x) = 1. Equation 14 is solved numerically in the script ZDT1 of Appendix B.

We derive the Pareto distribution of ZDT1 assuming d = 30 using the following problem setup in the AMALGAM toolbox.

%% Problem settings defined by user
AMALGAMPar.N = 100;                         % Define population size
AMALGAMPar.T = 100;                         % How many generations?
AMALGAMPar.d = 30;                          % How many parameters?
AMALGAMPar.m = 2;                           % How many objective functions?
%% Initial sampling and parameter ranges
Par_info.initial = 'latin';                 % Latin hypercube sampling
Par_info.boundhandling = 'bound';           % Explicit boundary handling
Par_info.min = zeros(1,AMALGAMPar.d);       % Minimum values of each parameter
Par_info.max = ones(1,AMALGAMPar.d);        % Maximum values of each parameter
%% Define name of function
Func_name = 'ZDT1';
% Now load Pareto front -- this is a benchmark problem
Fpareto = load('ZDT1.txt');
% Run the AMALGAM code and obtain nondominated solution set
[ X , F , output , Z ] = AMALGAM ( AMALGAMPar , Func_name , Par_info , Fpareto );
% Now postprocess the results and create figures
postprocAMALGAM ( AMALGAMPar , Par_info , X , F , output , Fpareto );

Figure 8: Case study I: Problem setup of test function ZDT1.

The initial sample is drawn using Latin hypercube sampling, and boundary handling is activated to force the parameters to stay within their prior


ranges. The ASCII file 'ZDT1.txt' contains an equidistant sample of the known Pareto front, and is used by AMALGAM to compute the evolution of the IGD convergence diagnostic.
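The IGD (inverted generational distance) diagnostic averages, over all reference points, the distance to the closest sampled solution. A minimal sketch under that standard definition; the exact implementation in the toolbox may differ:

% Hedged sketch of the IGD diagnostic: mean Euclidean distance of each
% reference point (row of Fpareto) to its closest sampled point (row of F);
% uses implicit expansion (replace with bsxfun on releases before R2016b)
np = size(Fpareto,1); d_min = nan(np,1);
for i = 1:np
    d_min(i) = sqrt(min(sum((F - Fpareto(i,:)).^2,2)));
end
IGD = mean(d_min);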

Figure 9 (left-hand side) plots the objective function values of the final population (red crosses) after 100 generations. The true Pareto-optimal front is indicated with the solid black line. The plot at the right-hand side displays the evolution of the IGD convergence metric.

Figure 9: (A) True (black line) and AMALGAM-sampled (red crosses) Pareto-optimal front. (B) Trace plot of the IGD convergence metric; the closer the value of this diagnostic is to zero, the better the population approximates the true nondominated solution set.

The samples generated with AMALGAM closely approximate the true Pareto-optimal front. About 2,000 ZDT1 function evaluations are required to converge to within 0.01 of the known Pareto front.

5.2. Case Study II: ZDT4

Our second case study involves test function ZDT4 of Zitzler et al. (2000). This problem involves $21^9$ local Pareto-optimal fronts, and is therefore a strong test of AMALGAM's ability to deal with multimodality. The ZDT4 test function is given by

\begin{equation}
\underset{\mathbf{x} \in \mathcal{X}}{\arg\min}\; F(\mathbf{x}) =
\begin{cases}
f_1(\mathbf{x}) = x_1 \\
f_2(\mathbf{x}) = g(\mathbf{x})h(\mathbf{x}),
\end{cases}
\tag{16}
\end{equation}


where

\begin{equation}
g(\mathbf{x}) = 1 + 10(d-1) + \sum_{i=2}^{d}\left(x_i^2 - 10\cos(4\pi x_i)\right), \qquad h(\mathbf{x}) = 1 - \sqrt{\frac{x_1}{g(\mathbf{x})}},
\tag{17}
\end{equation}

x_1 ∈ [0, 1] and x_2, . . . , x_d ∈ [−5, 5]. The global Pareto-optimal front is formed with g(x) = 1, the best local Pareto-optimal front with g(x) = 1.25. Note that not all local Pareto-optimal sets are distinguishable in the objective space. Equation (16) is solved numerically in the script ZDT4 of Appendix B.

We derive the Pareto distribution of ZDT4 for a d = 10 dimensional decision variable space using the following problem setup in the AMALGAM toolbox.

%% Problem settings defined by user
AMALGAMPar.N = 100;                     % Define population size
AMALGAMPar.T = 100;                     % How many generations?
AMALGAMPar.d = 10;                      % How many parameters?
AMALGAMPar.m = 2;                       % How many objective functions?

%% Initial sampling and parameter ranges
Par_info.initial = 'latin';             % Latin hypercube sampling
Par_info.boundhandling = 'bound';       % Explicit boundary handling
Par_info.min = [0 -5*ones(1,AMALGAMPar.d-1)];   % Minimum values of each parameter
Par_info.max = [1  5*ones(1,AMALGAMPar.d-1)];   % Maximum values of each parameter

%% Define name of function
Func_name = 'ZDT4';

% Now load Pareto front -- this is a benchmark problem
Fpareto = load('ZDT4.txt');

% Run the AMALGAM code and obtain nondominated solution set
[X,F,output,Z] = AMALGAM(AMALGAMPar,Func_name,Par_info,Fpareto);

% Now postprocess the results and create figures
postprocAMALGAM(AMALGAMPar,Par_info,X,F,output,Fpareto);

Figure 10: Case study II: Test function ZDT4.

The initial sample is drawn using Latin hypercube sampling, and boundary handling is used to force the parameters to stay within their respective ranges. The known Pareto-optimal front, stored in the file 'ZDT4.txt', is used by AMALGAM to calculate the value of the IGD statistic after each successive generation.

To demonstrate the advantages of multimethod optimization, consider Figure 11, which plots the nondominated fronts generated with the individual NSGA-II (squares), PSO (circles), AMS (plusses) and DE (diamonds) recombination methods, and with AMALGAM (red crosses), after 500, 2,500 and 5,000 function evaluations. The true Pareto distribution is separately indicated in each graph with a black line. After only 5,000 function evaluations, AMALGAM has progressed toward the true Pareto-optimal front,


and has generated solutions that are far more evenly distributed along the Pareto front than any of the individual algorithms.

Figure 11: Pareto-optimal fronts after 5, 25, and 50 generations (panels A-C: 500, 2,500, and 5,000 ZDT4 function evaluations, respectively) with the NSGA-II (blue squares), PSO (green circles), AMS (yellow crosses), DE (cyan plusses), and AMALGAM (red crosses) optimization algorithms for test problem ZDT4. This benchmark problem has $21^9$ different local Pareto-optimal fronts in the search space, of which only one corresponds to the global Pareto-optimal front. The true Pareto-optimal front is separately indicated with the solid black line. Combining the individual algorithms into an adaptive multimethod search algorithm ensures a faster and more reliable solution to multiobjective optimization problems.

Figure 12 presents a trace plot of the selection probabilities of the recombination methods used by AMALGAM. Initially, the NSGA-II algorithm (squares) exhibits the highest reproductive success, yet after about 20 generations, the DE (diamonds), AMS (plusses), and PSO (circles) algorithms are suddenly more favored. This combination of methods proves to be extremely effective at increasing the diversity of solutions along the Pareto front once the NSGA-II method has done its job. The performance of AMALGAM on the other benchmark problems provides further justification for this conclusion.


Figure 12: Illustration of the concept of self-adaptation in multimethod evolutionary optimization. (A) Evolution of the selection probability of each individual recombination method used in AMALGAM. (B) Trace plot of the IGD convergence diagnostic. These results illustrate the utility of the individual search algorithms during different stages of the optimization, and provide numerical evidence for the 'No Free Lunch' theorem of Wolpert and Macready (1997).

5.3. Case Study III: hmodel conceptual watershed model

We now consider a real-world problem involving a relatively simple conceptual watershed model (Schoups and Vrugt, 2010). The model transforms rainfall into runoff at the watershed outlet using explicit process descriptions of interception, throughfall, evaporation, runoff generation, percolation, and surface and subsurface routing (see Figure 13).


Figure 13: Schematic representation of the hmodel conceptual watershed model.

Runoff generation is assumed to be dominated by saturated overland flow and is simulated as a function of basin water storage without an explicit dependence on rainfall intensity. This assumption is typically valid for temperate climates but may be violated in semiarid watersheds. Snow accumulation and snowmelt are also not accounted for, yet this is not a problem if we focus on "warm" watersheds. Table 4 summarizes the seven different model parameters and their prior uncertainty ranges.

Table 4: Model parameters and their prior uncertainty ranges.

Parameter                        Symbol   Minimum   Maximum   Units
Maximum interception             Imax     1         10        mm
Soil water storage capacity      Smax     10        1000      mm
Maximum percolation rate         Qmax     0.1       100       mm/d
Evaporation parameter            αE       0.1       100       -
Runoff parameter                 αF       -10       10        -
Time constant, fast reservoir    KF       0.1       10        days
Time constant, slow reservoir    KS       0.1       150       days

We would now like to estimate the parameters of the model. We use a seven-year record with daily data of discharge (mm/day), mean areal precipitation (mm/day), and mean areal potential evapotranspiration (mm/day) from a watershed in the USA (taken from the MOPEX data set). Details of the basin, experimental data, and model can be found in various publications. We use a two-year spin-up period to reduce the sensitivity of the model to state-value initialization. In other words, only the last five years are used for our analysis.


We would like the hmodel to fit both the driven and nondriven part of the hydrograph equally well. Unfortunately, practical experience with a suite of different hydrologic models suggests that this is practically impossible. Due to epistemic errors (model structural errors) and rainfall data errors, the model is unable to accurately fit both parts of the hydrograph. Single-objective optimization (for instance, minimization of the sum of squared errors) would give a compromise model calibration and hence simulation. We would like to understand the trade-off in fitting both parts of the hydrograph, as it might help us understand the errors in our model structure.

To better understand how these objective functions are computed, consider Figure 14, which schematically illustrates how the observed hydrograph (discharge data) is partitioned into a driven (solid circles) and nondriven (open circles) part. This partitioning follows ideas presented in Boyle et al. (2000).

Figure 14: Partitioning of the observed hydrograph into a driven (blue) and nondriven (red) part. Classification is based on the rainfall data record. Days with precipitation define the driven part of the hydrograph, whereas dry days constitute the nondriven part.
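A minimal sketch of this classification rule, assuming precip stores the daily rainfall record (mm/day) aligned with the discharge data:

idx_driven    = find(precip >  0);   % days with rainfall: driven part
idx_nondriven = find(precip == 0);   % dry days: nondriven part

The same construction is used inside the hmodel script of Appendix B.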


The following setup is used in the MATLAB package of AMALGAM.


%% Problem settings defined by user
AMALGAMPar.N = 100;                     % Define population size
AMALGAMPar.T = 100;                     % How many generations?
AMALGAMPar.d = 7;                       % How many parameters?
AMALGAMPar.m = 2;                       % How many objective functions?

%% Initial sampling and parameter ranges
Par_info.initial = 'latin';             % Latin hypercube sampling
Par_info.boundhandling = 'reflect';     % Explicit boundary handling: reflection
Par_info.min = [1 10 0.1 0.1 -10 0.1 0.1];    % Minimum values of each parameter
Par_info.max = [10 1000 100 100 10 10 150];   % Maximum values of each parameter

%% Define name of function
Func_name = 'hmodel';

%% Define additional options
options.parallel = 'yes';   % Multi-core evaluation of each of the children
options.IO = 'no';          % No input/output writing: model is linked to MATLAB as MEX function
options.modout = 'yes';     % Return simulations (horizontal vectors)

% No benchmark Pareto front is available for this real-world problem
Fpareto = [];

% Run the AMALGAM code and obtain nondominated solution set
[X,F,output,Z,sim] = AMALGAM(AMALGAMPar,Func_name,Par_info,Fpareto,options);

% Now postprocess the results and create figures
postprocAMALGAM(AMALGAMPar,Par_info,X,F,output,Fpareto,sim);

Figure 15: Case study III: Modeling of the rainfall-runoff transformation.

The initial sample is drawn using Latin hypercube sampling, and a reflection step is used if the parameter values are outside their respective prior ranges specified in the structure Par_info. Distributed multi-core computing is used to evaluate the N children and calculate their objective function values. The simulations of the hmodel for each population are stored after each generation and used in the post-processor script for plotting of the Pareto simulation uncertainty ranges (of which more later). The model is defined in the script hmodel of Appendix B, and returns the values of the two objective functions for each of the parameter vectors of matrix X using the MOPEX data record as input. The actual model, crr_model, is written in the C language and linked to MATLAB as a shared library called a MEX-file. The use of such a MEX function significantly reduces the CPU-time required to approximate the Pareto distribution with AMALGAM.
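For reference, such a MEX-file can typically be built from the MATLAB prompt with the mex command; the source file name crr_model.c is an assumption:

mex crr_model.c   % compile the C source into a MEX-file callable from MATLAB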

Figure 16 presents the Pareto uncertainty for each parameter of the hmodel. The individual parameters are listed along the x-axis, while the y-axis defines their normalized ranges using the data tabulated in Table 4. Each line across the graph represents one Pareto solution from the final population sampled with AMALGAM. The solid and dashed black lines going from left to right across the plots correspond to the single-objective solutions of fD and fND obtained by separately fitting each criterion using the SCE-UA global optimization algorithm (Duan et al., 1992). The right-hand


side of Figure 16 plots the objective function values of the final population. The black crosses denote the single-criterion ends derived from the SCE-UA algorithm, and are used to benchmark the AMALGAM results.

Figure 16: Normalized parameter plots for each of the hmodel parameters using a two-criteria {fD, fND} calibration. Each line across the graph denotes a single parameter set: gray is the Pareto solution set; red and blue lines are the single-criterion solutions of fD and fND, respectively. The square plot at the right-hand side is a two-dimensional projection of the objective space of the Pareto set of solutions. The red and blue crosses signify the single-criterion solutions derived separately with the SCE-UA optimization algorithm (Duan et al., 1992). The Pareto solution set encompasses the SCE-UA solutions; this inspires trust that AMALGAM has converged adequately, as the nondominated solution set includes the extreme ends.

AMALGAM has generated a fairly uniform approximation of the Pareto front, with solutions that occupy the single-criterion solutions at the extreme ends. For most of the hmodel parameters, the Pareto solution set tends to cluster closely in the parameter space for the two objectives. However, there is considerable uncertainty associated with the recession parameters KF and KS of the hmodel, which play a major role in determining the shape of the hydrograph during recession periods.

To better understand how the Pareto parameter distribution translates into hydrograph simulation uncertainty, consider Figure 17, which plots the Pareto uncertainty intervals (gray region) of the hmodel for a selected portion of the calibration data set.


Figure 17: Time series plot of the hmodel Pareto simulation uncertainty (gray lines) and the observed data (magenta dots). The hmodel tracks the data quite nicely, but systematic deviations are visible, for instance in the nondriven part of the hydrograph around days 980-1,000 and 1,020-1,050. This demonstrates that the model is in need of further refinement.

The observed data are indicated with the magenta dots. The streamflow prediction uncertainty ranges match the medium- and high-flow events very well, but do not bracket the observations and display bias (systematic error) on the long recessions, suggesting that the model structure may be in need of further improvement. The relatively large uncertainty found during low-flow and recession periods is consistent with the relatively large uncertainty in the Smax and KS parameters. The issue of model structural errors is best addressed using diagnostic model evaluation, and interested readers are referred to Vrugt and Sadegh (2013).

5.4. Case Study IV: Bayesian Model Averaging

Ensemble Bayesian Model Averaging (BMA), proposed by Raftery et al. (2005), is a widely used method for statistical post-processing of forecasts from an ensemble of different models. The BMA predictive distribution of any future quantity of interest is a weighted average of probability density functions centered on the bias-corrected forecasts from a set of individual models. The weights are the estimated posterior model probabilities, representing each model's relative forecast skill in the training (calibration) period. Successful application of BMA requires estimates of the weights and variances of the individual competing models in the ensemble. In their seminal


paper, Raftery et al. (2005) recommend using the Expectation-Maximization (EM) algorithm (Dempster et al., 1977). This method is relatively easy to implement and computationally efficient, but does not provide uncertainty estimates of the weights and variances. Vrugt et al. (2008b) therefore introduced Bayesian inference of the BMA weights and variances using Markov chain Monte Carlo (MCMC) simulation with the DREAM algorithm (Vrugt et al., 2008a, 2009b, 2011; Laloy and Vrugt, 2012a). Yet, typical applications of BMA calibrate the forecast-specific density functions by optimizing a single measure of predictive skill. Therefore, Vrugt et al. (2006) proposed a multi-criteria formulation for postprocessing of forecast ensembles. This multi-criteria framework implements different diagnostic measures to reflect different but complementary metrics of forecast skill, and uses the AMALGAM optimization algorithm to solve for the Pareto set of parameters that have consistently good performance across multiple performance metrics. Theory, concepts, and applications of AMALGAM(BMA) have been presented by Vrugt et al. (2006), and interested readers are referred to this publication for further details.

Here we demonstrate the application of AMALGAM to BMA model training using a 36-year record of daily streamflow observations from the Leaf River basin in the USA. If we denote with D = {d_1, . . . , d_n} the n-vector of observed discharge values, and store in an n × L matrix the corresponding simulations, D_l = {d_1^l, . . . , d_n^l}, of an ensemble of L = 8 different watershed models, l = {1, . . . , L}, then our multicriteria framework implements the following three metrics

\begin{equation}
\underset{\mathbf{x} \in \mathcal{X}}{\arg\min}\; F(\mathbf{x}) =
\begin{cases}
f_1(\mathbf{x}) = \sqrt{\dfrac{1}{n}\displaystyle\sum_{t=1}^{n}\left(d_t^{\mathrm{BMA}}(\mathbf{w}) - d_t\right)^2} \\[6pt]
f_2(\mathbf{x}) = -\dfrac{1}{n}\displaystyle\sum_{t=1}^{n}\log\left\{\sum_{l=1}^{L} w_l\, g_l\!\left(d_t \,\middle|\, d_t^{\,l}\right)\right\} \\[6pt]
f_3(\mathbf{x}) = \dfrac{1}{n}\displaystyle\sum_{t=1}^{n}\int_{-\infty}^{\infty}\left(G_t(y) - \mathbb{1}\{y \ge d_t\}\right)^2 \mathrm{d}y,
\end{cases}
\tag{18}
\end{equation}

where {f_1, f_2, f_3} measure the root mean square error (RMSE, m^3/s), negative log-likelihood (NLL, -) and continuous ranked probability score (CRPS, -) of the BMA model forecast, respectively. The L-vector w = {w_1, . . . , w_L} stores the BMA weights of the L models, d_t^BMA(w) = \sum_{l=1}^{L} w_l d_t^l denotes the mean forecast of the BMA model, and G(y) and g(y) represent the cumulative distribution function (cdf) and probability density function (pdf) of the BMA predictive distribution. The Heaviside step function, 1{y ≥ d_t}, attains the value of one if the statement between braces is true, and zero otherwise. The three metrics simultaneously measure the quality of fit, sharpness, and spread of the BMA model. Smaller values are preferred for each of the metrics.
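A minimal sketch of the first metric, assuming Dsim stores the n × L matrix of bias-corrected ensemble forecasts, w the L × 1 vector of BMA weights, and d_obs the n-vector of observed discharge values (all names hypothetical):

d_BMA = Dsim * w;                           % BMA mean forecast of Equation (18)
f1    = sqrt( mean( (d_BMA - d_obs).^2 ) ); % f_1: root mean square error

The NLL and CRPS metrics require, in addition, the predictive pdf and cdf of the BMA mixture; see the function BMA_calc in Appendix B.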

The following script defines the setup of the multicriteria BMA calibration in the MATLAB package of AMALGAM.

%% Problem settings defined by user
AMALGAMPar.N = 100;         % Define population size
AMALGAMPar.T = 100;         % How many generations?
AMALGAMPar.m = 3;           % How many objective functions?

%% Initial sampling and parameter ranges
Par_info.initial = 'latin';             % Latin hypercube sampling
Par_info.boundhandling = 'reflect';     % Explicit boundary handling: reflection

%% Define name of function
Func_name = 'BMA_calc';

%% Define BMA as a global variable
global BMA

load data.txt;              % Daily streamflow simulations of eight watershed models
load Y.txt;                 % Daily streamflow observations
StartT = 1; EndT = 3000;    % Start/end day of training period
BMA.PDF = 'gamma';          % pdf of predictor: normal/heteroscedastic/gamma
BMA.VAR = 'single';         % variance pdf: single/multiple (multiple for 'normal')

%% Setup the BMA model (apply linear bias correction)
[AMALGAMPar,BMA,Par_info] = setup_BMA(AMALGAMPar,Par_info,BMA,data,Y,StartT,EndT);

%% Define additional options
options.save = 'yes';       % Save workspace to file during execution of AMALGAM
options.ranking = 'C';      % Use C source code for ranking and crowding distance calculation

% Run the AMALGAM code and obtain nondominated solution set
[X,F,output,Z] = AMALGAM(AMALGAMPar,Func_name,Par_info,[],options);

Figure 18: Case study IV: Bayesian model averaging using AMALGAM.

The predictive distribution of each constituent member of the ensemble is assumed to follow a gamma distribution with unknown heteroscedastic variance, details of which are found in the function setup_BMA. The workspace is saved after each iteration, and the FNS algorithm is run in C through a mex-compiled function to speed up the calculation of the rank and crowding distance. The BMA_calc function calculates the objective function values {RMSE, NLL, CRPS} for each BMA parameter vector and is listed in Appendix B.

Table 5 summarizes the results of AMALGAM and presents (in column "Gamma") the mean Pareto (MP) values of the BMA weights for the different


models of the ensemble. Values listed in parentheses denote the standard deviation derived from the nondominated solutions in the final population of AMALGAM. We also summarize the MP values of the weights for a Gaussian (conditional) distribution (columns "Normal") with homoscedastic (left) or heteroscedastic (right) error variance, and report the lowest values of the RMSE (m^3/s), NLL (-), and CRPS (-) in the Pareto distribution of the BMA model during the n = 3000-day calibration data period. For completeness, the RMSE (m^3/s) values of the individual models are listed as well.

Table 5: Results of AMALGAM(BMA) by application to eight different watershed models using daily discharge data from the Leaf River in Mississippi, USA. We list the individual forecast errors of the models for the training data period, the corresponding MP values of the weights for a Gamma (default) and Gaussian forecast distribution, and the minimum values of the RMSE, negative log-likelihood (NLL), and continuous ranked probability score (CRPS).

Model        RMSE    Gamma          Normal†        Normal‡
ABC          31.67   0.00 (0.000)   0.00 (0.000)   0.00 (0.000)
GR4J         19.21   0.13 (0.012)   0.13 (0.011)   0.13 (0.010)
HYMOD        19.03   0.00 (0.000)   0.00 (0.009)   0.00 (0.000)
TOPMO        17.68   0.24 (0.043)   0.25 (0.028)   0.24 (0.031)
AWBM         26.31   0.00 (0.003)   0.00 (0.010)   0.00 (0.000)
NAM          20.22   0.00 (0.000)   0.00 (0.011)   0.00 (0.008)
HBV          19.44   0.07 (0.052)   0.06 (0.040)   0.08 (0.044)
SACSMA       16.45   0.56 (0.010)   0.56 (0.014)   0.56 (0.013)
BMA: RMSE            15.64          15.59          15.59
BMA: NLL             3.16           3.39           3.07
BMA: CRPS            6.78           6.78           6.78

† Homoscedastic (fixed) variance. ‡ Heteroscedastic variance.

The values of the weights depend somewhat on the assumed conditional distribution of the deterministic model forecasts of the ensemble. The GR4J, HBV and SACSMA models consistently receive the highest weights and are thus most important in the BMA model construction for this data set. Note also that TOPMO receives a very low BMA weight, despite having the second-lowest RMSE value for the training data period. Correlation between the individual forecasts of the watershed models strongly affects the posterior distribution of the BMA weights. The differences between the different conditional distributions appear rather marginal.


To better understand the multi-criteria calibration results, consider Figure 19, which presents normalized parameter plots of the Pareto BMA weights of the individual models of the ensemble (left-hand side), and a three-dimensional plot of the objective function values of the final population sampled with AMALGAM (right-hand side).

Figure 19: Normalized parameter plots for the BMA weights of the Gamma conditional distribution using a three-criterion {RMSE, NLL, CRPS} calibration with AMALGAM for the 3000-day training data set of daily discharge values simulated with eight watershed models. Each line across the graph denotes a single Pareto solution. The plot at the right-hand side is a three-dimensional projection of the objective space of the Pareto set of solutions. The solutions of the single-criterion ends are indicated with a red (RMSE), blue (NLL) and green (CRPS) line (A) and cross (B), respectively.

Finally, we now perform a similar analysis, but using 48-h forecasts of surface temperature and sea level pressure in the North American Pacific Northwest in January-June 2000 from the University of Washington (UW) mesoscale short-range ensemble system (Grimit and Mass, 2002). This is a five-member multianalysis ensemble (hereafter referred to as the UW ensemble) consisting of different runs of the fifth-generation Pennsylvania State University - National Center for Atmospheric Research Mesoscale Model (MM5), in which initial conditions are taken from different operational centers. Following Raftery et al. (2005), a 25-day training period between 16 April and 9 June 2000 is used for BMA model calibration. For some days the data were missing, so that the number of calendar days spanned by the training data set is larger than the number of days of training used. The individual members of the ensemble were bias-corrected using simple linear regression


of D_l on D for the training data set. We assume a Gaussian conditional distribution, g(·), with a fixed (homoscedastic) variance for each member of the ensemble.

Figure 20 presents normalized parameter plots of the results of the three-criterion BMA model calibration with the AMALGAM method for the (A) surface temperature and (B) sea level pressure data set. Each line going from left to right across the plot corresponds to a different parameter combination. The gray lines represent members of the Pareto set (which appears as a band). The six BMA model parameters are listed along the x-axis, and the y-axis corresponds to the parameter values, normalized by their prior uncertainty ranges. The two plots at the right-hand side of Figure 20 depict three-dimensional projections of the trade-off surface of the three diagnostic performance measures for both data sets. The Pareto rank-1 solutions in these plots are indicated with the gray dots.


Figure 20: Normalized parameter plots for the BMA weights and variance of the Gaussian conditional distribution using a three-criterion {RMSE, NLL, CRPS} calibration with AMALGAM for the UW 48-hour ensemble data sets of (A) surface temperature and (B) sea level pressure. Each line across the graph denotes a single parameter set; shaded is the Pareto solution set. The plots at the right-hand side are three-dimensional projections of the objective space of the Pareto set of solutions.

The Pareto solution space spans only a very small region interior to the prior-defined plausible parameter space for both data sets. This illustrates that the BMA model is well defined by calibration against the three individual performance criteria. This is further confirmed by inspection of the three-dimensional projections of the RMSE-NLL-CRPS space of the Pareto set of solutions at the right-hand side. Both these plots show very small trade-offs in the fitting of the RMSE and CRPS performance measures, and suggest that it is possible to identify a single BMA model that has good and consistent performance for each of the individual performance metrics. The user is, of course, free to select any other Pareto BMA model, based on subjective preferences and the intended goal of the application.

In general, we can conclude that the multi-criteria optimization helps to guide the search for an appropriate BMA model, and provides useful


information about the trade-offs between the various performance metrics. The current practice of optimizing the BMA model using maximum likelihood theory seems to result in a calibrated forecast ensemble that achieves good performance in terms of quadratic forecast error and sharpness of the prediction intervals. However, such consistent performance of the ML method cannot be guaranteed for all forecasting problems. We refer interested readers to Vrugt et al. (2006), Vrugt and Robinson (2007a) and Rings et al. (2012) for a more comprehensive analysis of the BMA approach, including a comparison with filtering methods.

6. Final remarks

The four case studies presented herein illustrate only some of the main capabilities of the AMALGAM software package. Not all of the options discussed in the initial presentation of the toolbox have been demonstrated explicitly. The script runAMALGAM presents an exhaustive overview of the different options and capabilities of the AMALGAM software suite, and includes eight examples involving inference of much more complex and higher-dimensional target distributions as well. Users can draw from the different test problems and use them as templates for their own modeling problems.

Recent work includes the implementation of a decomposition-based search strategy in AMALGAM. This methodology, called AMALGAM(D), uses the Tchebycheff approach (Zhang and Li, 2007; Li and Zhang, 2009) to calculate the fitness, q(·), of each individual, X_i, of the population,

\begin{equation}
q(\mathbf{X}_i \,|\, \mathbf{w}_i, \mathbf{z}) = \max_{1 \le j \le m}\left\{ w_{i,j}\,\bigl|f_j(\mathbf{X}_i) - z_j\bigr| \right\}
\quad \text{subject to } \mathbf{x}_i \in \mathcal{X} \subseteq \mathbb{R}^d,
\tag{19}
\end{equation}

where d signifies the dimensionality of the variable (parameter) space, w_i = {w_{i,1}, . . . , w_{i,m}} denotes the weight vector, and z = {z_1, . . . , z_m} is an anchor point. The Pareto front can be approximated by minimizing Equation (19) simultaneously for each of the N individuals using evenly distributed weight vectors, {w_1, . . . , w_N}; w_{i,j} ∈ [0, 1]; \sum_{j=1}^{m} w_{i,j} = 1. Yet, this approach is not particularly efficient if each individual scalar subproblem is solved independently, without information exchange between the successive solutions evolved in parallel. We therefore follow Zhang and Li (2007) and Li and Zhang (2009) and exploit similarities between neighboring individuals with the closest weight vectors (in Euclidean space) to guide the selection and evolution process and speed up convergence towards the Pareto front.
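A minimal sketch of this scalarizing function, assuming Fi stores the m objective function values of individual X_i, wi its weight vector, and z the anchor point (for m = 2, evenly distributed weights follow, for instance, from wi = [(i-1) (N-i)]/(N-1)):

q = max( wi .* abs( Fi - z ) );   % Tchebycheff fitness of Equation (19)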


7. Summary

In this paper I have introduced a MATLAB package of the AMALGAM multi-criteria optimization algorithm. This toolbox provides scientists and engineers with an arsenal of options and utilities to solve optimization problems involving (amongst others) two or more objectives. The AMALGAM toolbox supports parallel computing and includes tools for convergence analysis of the sampled solutions and post-processing of the results. Four different case studies were used to illustrate the main capabilities and functionalities of the MATLAB toolbox. These example studies are easy to run and adapt, and serve as templates for other inference problems.

A graphical user interface (GUI) of AMALGAM is currently under development and will become available in due course.

8. Acknowledgements

The MATLAB toolbox of AMALGAM is available upon request from the first author, [email protected].


9. Appendix A

Table 6 summarizes, in alphabetical order, the different function/program files of the AMALGAM package in MATLAB. The main program runAMALGAM contains seven prototype studies which cover a large range of problem features. These example studies have been published in some of our papers and provide a template for users to set up their own case study. The last line of each example study involves a function call to AMALGAM, which uses all the other functions listed below to generate samples of the Pareto distribution. Each example problem of runAMALGAM has its own directory which stores the model script written by the user and all other files (data file(s), MATLAB scripts, external executable(s), etc.) necessary to compute the objective functions of each parameter vector.

The directory '../postprocessing' contains a number of different functions designed to visualize the results (output arguments) of AMALGAM. The program postproc_AMALGAM creates a large number of MATLAB figures, including one or more scatter plots (two- or three-dimensional) of the objective function values of the nondominated solution set, histograms of the marginal distributions of the Pareto parameter samples, scatter plots of the Pareto samples, and trace plots of the IGD convergence diagnostic (if Fpareto is defined by the user) and the selection probability of each recombination method.


Table 6: Description of the MATLAB functions and scripts (.m files) used by AMALGAM, version 1.4.

Name of function       Description
AMALGAM                Main AMALGAM function that calls the different functions listed here and returns the Pareto solution set
AMALGAM_calc_setup     Setup of the computational heart of AMALGAM and (if activated) the distributed computing environment
AMALGAM_check          Checks the AMALGAM setup for potential errors and/or inconsistencies in the settings defined by the user
AMALGAM_end            Terminates the computing environment and creates the return arguments
AMALGAM_initialize     Pre-allocates the main matrices and vectors used by AMALGAM and draws the initial population
AMALGAM_setup          Setup of the main algorithmic variables and options used by AMALGAM
Boundary_handling      Correction of parameter values of offspring that are outside the feasible search space (if defined)
Create_children        Function that uses the different recombination methods to create the child population
Crowding_distance      Calculates the crowding distance of each member of the population
Evaluate_model         Evaluates the offspring population and computes objective function values (executes script Func_name)
FNS                    Fast nondominated sorting of matrix (population) of objective function values
Gen_new_population     Creates new population by comparing rank and crowding distance of parents and their offspring
Latin                  Latin hypercube sampling
Load_calculation       Updates the selection probability of each recombination method
Load_distribution      Determines which parents are to be used with each recombination method
Ranking_C              Returns Pareto rank and crowding distance of population using mex-compiled source code in C language
Ranking_MATLAB         Returns Pareto rank and crowding distance of population using MATLAB code
Rec_method_AMS         Produces offspring from selected parents using the adaptive Metropolis algorithm
Rec_method_DE          Creates offspring from selected parents using differential evolution
Rec_method_NSGA        Produces offspring from selected parents using the NSGA-II algorithm
Rec_method_PSO         Creates offspring from selected parents using the particle swarm optimizer
Strength_pareto        Calculates the Pareto strength of each individual of the population (e.g. Zitzler and Thiele, 1999)
Update_PSO             Updates the best known position of each particle and the overall best position


10. Appendix B

This Appendix presents the different model functions used in the four case studies presented in Section 5 of this paper. These functions (.m files) serve as a template for users to help define their own forward model in AMALGAM. All model functions take as input argument a vector, x, of d parameter values, and return as output the corresponding values of the m objective functions. A low dash is used in the printout of each model script to denote the use of a built-in MATLAB function.

10.1. Case Study I: ZDT1

The MATLAB function ZDT1 listed below uses common built-in operators to calculate the pair of objective functions of test problem ZDT1 of Zitzler et al. (2000) for a given vector, x, of size d × 1 of parameter values.

function F = ZDT1 ( x )
% ZDT1 - test function

% Retain variables in memory after first call
persistent d

% Only execute this part once
if isempty(d)
    d = size( x , 1 );              % Extract dimensionality of parameter space
end

f = x(1);                           % Calculate f
g = 1 + 9/(d-1) * sum(x(2:d));      % Calculate g
h = 1 - sqrt(f./g);                 % Calculate h
F(1) = f; F(2) = g .* h;            % Objective functions

10.2. Case Study II: ZDT4

The MATLAB function ZDT4 listed below uses common built-in operators to calculate the pair of objective functions of test problem ZDT4 of Zitzler et al. (2000) for a given vector, x, of size d × 1 of parameter values.

function F = ZDT4 ( x )
% ZDT4 - test function

% Retain variables in memory after first call
persistent d

% Only execute this part once
if isempty(d)
    d = size( x , 1 );              % Extract dimensionality of parameter space
end

f = x(1);                                                   % Calculate f
g = 1 + 10*(d-1) + sum(x(2:d).^2 - 10*cos(4*pi*x(2:d)));    % Calculate g
h = 1 - sqrt(f./g);                                         % Calculate h
F(1) = f; F(2) = g .* h;                                    % Objective functions


10.3. Case Study III: hmodel conceptual watershed model

The MATLAB function hmodel listed below simulates the rainfall-runoff transformation for a vector, x, of size d × 1 of parameter values and returns two output variables: a pair of RMSE-based objective function values for the driven and nondriven part of the hydrograph, and the corresponding simulation of the hmodel. This second output argument is required because the option modout of options has been activated in the main script (see Figure 15). The two objective function values are computed from the simulated discharge time series using

\begin{equation}
f_1(\mathbf{x}) = \sqrt{\frac{1}{n_D}\sum_{j \in D}\left(y_j(\mathbf{x}) - \tilde{y}_j\right)^2}, \qquad
f_2(\mathbf{x}) = \sqrt{\frac{1}{n_{ND}}\sum_{j \in ND}\left(y_j(\mathbf{x}) - \tilde{y}_j\right)^2},
\tag{20}
\end{equation}

where \tilde{Y} = {\tilde{y}_1, . . . , \tilde{y}_n} and Y(x) = {y_1(x), . . . , y_n(x)} denote the observed and simulated discharge time series, respectively, and n_D (n_ND) signifies the number of driven (nondriven) discharge observations.

function [ F , Y_sim ] = hmodel ( x )
% Executes HMODEL conceptual watershed model and returns objective functions

% Retain variables in memory after first call
persistent d mopexdata tout data options Y_obs y0 idx_d idx_nd

% Only do this part during first call
if isempty(mopexdata)

    %% Calculate size of parameter vector
    d = size ( x , 1 );

    %% Load the data
    mopexdata = load('03451500.dly');                       % load the mopex data
    idx = find(mopexdata(:,1)>1959 & mopexdata(:,1)<1999);  % find right data
    n = size(mopexdata,1); tout = 0:n;                      % create tout
    Y_obs = mopexdata(idx(1:n),6);                          % observed discharge (mm/day)

    %% Create data structure
    data.P  = mopexdata(idx(1:n),4)';   % daily rainfall (mm/day)
    data.Ep = mopexdata(idx(1:n),5)';   % daily evaporation (mm/day)
    data.aS = 1e-6;                     % percolation coefficient

    %% Create structure options with ODE settings and define initial states
    options.InitialStep = 1;            % initial time-step (day)
    options.MaxStep = 1;                % maximum time-step (day)
    options.MinStep = 1e-6;             % minimum time-step (day)
    options.RelTol = 1e-3;              % relative tolerance
    options.AbsTol = 1e-3 * ones(5,1);  % absolute tolerances (mm)
    options.Order = 2;                  % 2nd order accurate method (Heun)
    y0 = 1e-6 * ones(5,1);              % initial states of reservoirs (mm)

    %% Now define calibration data and index of driven and nondriven part of hydrograph
    Y_obs = Y_obs(731:end)';            % first two years are spin up
    idx_d  = find(data.P(731:end) >  0);    % index of driven part
    idx_nd = find(data.P(731:end) == 0);    % index of nondriven part

end

%% Now run hmodel for each parameter combination, and return objective functions
y = crr_model(x(1:d,1),tout,data,options,y0);   % run hmodel using external C code
Y = y(5,2:end) - y(5,1:end-1);                  % calculate discharge (mm/day)
Y_sim = Y(731:end);                             % first two years are spin up
F(1) = sqrt( sum( (Y_sim(idx_d)  - Y_obs(idx_d) ).^2 )/numel(idx_d) );   % RMSE driven part
F(2) = sqrt( sum( (Y_sim(idx_nd) - Y_obs(idx_nd)).^2 )/numel(idx_nd) );  % RMSE nondriven part


The source code of the hmodel is written in C and linked into a shared library using the MEX-compiler of MATLAB. This avoids file writing, and allows for direct passing of the parameter values, forcing conditions, and settings of the numerical solver to crr_model. A second-order time-variable integration method is used to solve the differential equations of the hmodel.

10.4. Case Study IV: Bayesian Model Averaging

The MATLAB function BMA_calc listed below returns the three objective function values of the BMA model for a given d × 1 vector, x, of parameter values consisting of the BMA weights and variances.

function [ F ] = BMA_calc ( x )
% Calculates the objective function values {RMSE, NLL, CRPS} of the BMA
% model corresponding to the weights and sigma's stored in x

global BMA          % Request the BMA structure
persistent d n      % Retain variables in memory after first call

% Only execute this part once
if isempty(d), d = size(x,1); n = numel(BMA.Ycal); end

L = 0;              % Set likelihood equivalent to zero
w = x(1:BMA.k);     % Unpack weights

switch lower(BMA.PDF)           % Now check which BMA model is used
    case {'normal'}             % Normal distribution with homoscedastic error variance
        if strcmp(lower(BMA.VAR),'single')          % One or multiple variances?
            sigma = x(BMA.k+1) * ones(1,BMA.k);
        elseif strcmp(lower(BMA.VAR),'multiple')
            sigma = x(BMA.k+1:end);
        else
            error('do not know this option for variance treatment')
        end
        for i = 1:BMA.k         % Mixture model
            L = L + w(i)*exp(-1/2*((BMA.Ycal-BMA.Xcal(:,i))./sigma(i)).^2)./ ...
                (sqrt(2*pi).*sigma(i));             % Now calculate likelihood
        end
    case {'heteroscedastic'}    % Normal distribution with heteroscedastic error variance
        b = x(BMA.k+1);         % Unpack variance parameter
        for i = 1:BMA.k         % Mixture model
            sigma = abs(b*BMA.Xcal(:,i));           % Calculate measurement error of data
            L = L + w(i)*exp(-1/2*((BMA.Ycal-BMA.Xcal(:,i))./sigma).^2)./ ...
                (sqrt(2*pi).*sigma);                % Calculate likelihood
        end
    case {'gamma'}              % Gamma distribution
        b = x(BMA.k+1:2*BMA.k); c = x(2*BMA.k+1);   % Unpack variables gamma distribution
        for i = 1:BMA.k         % Mixture model
            mu = abs(BMA.Xcal(:,i));                % Derive mean of gamma distribution
            var = abs(c + b(i) * BMA.Xcal(:,i));    % Derive variance of gamma distribution
            A = mu.^2./var; B = var./mu;            % Derive A and B of gamma distribution
            z = BMA.Ycal./B;                        % Compute help variable
            u = (A - 1).*log(z) - z - gammaln(A);   % Compute help variable
            L = L + w(i) * (exp(u)./B);             % Calculate likelihood
        end
end
L(L==0) = 1e-300;                                   % Replace zero likelihood with 1e-300
BMA_mean = BMA.Xcal*w;                              % Now calculate BMA mean forecast
F(1) = sqrt( sum( (BMA_mean - BMA.Ycal).^2 ) / n ); % RMSE
F(2) = -sum(log(L)) / n;                            % NLL
F(3) = CRPS([BMA_mean BMA_mean],BMA.Ycal);          % CRPS

In the example considered herein, the conditional pdf of each ensemble member is assumed to follow a gamma distribution with heteroscedastic variance.
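For reference, the gamma branch of BMA_calc evaluates the gamma density in shape-scale form; with mean μ and variance σ², the shape and scale parameters follow as

\begin{equation*}
A = \frac{\mu^2}{\sigma^2}, \qquad B = \frac{\sigma^2}{\mu}, \qquad
g(d \,|\, \mu, \sigma^2) = \frac{d^{\,A-1} e^{-d/B}}{B^{A}\,\Gamma(A)},
\end{equation*}

which is exactly the log-density u = (A − 1) log(z) − z − ln Γ(A), with z = d/B, computed (up to the 1/B factor) in the code above.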


11. References

D. Achlioptas, A. Naor, and Y. Peres, "Rigorous location of phase transitions in hard optimization problems," Nature, vol. 435, pp. 759-763, 2005.

J. Barhen, V. Protopopescu, and D. Reister, "TRUST: A deterministic algorithm for global optimization," Science, vol. 276, pp. 1094-1097, 1997.

N. Beume, C.M. Fonseca, M. López-Ibáñez, L. Paquete, and J. Vahrenhold, "On the complexity of computing the hypervolume indicator," IEEE Transactions on Evolutionary Computation, vol. 13, no. 5, pp. 1075-1082, doi:10.1109/TEVC.2009.2015575, 2009.

D.G. Bounds, "New optimization methods from physics and biology," Nature, vol. 329, pp. 215-219, 1987.

D.P. Boyle, H.V. Gupta, and S. Sorooshian, "Toward improved calibration of hydrologic models: Combining the strengths of manual and automatic methods," Water Resources Research, vol. 36, no. 12, pp. 3663-3674, 2000.

K. Deb, and R.B. Agrawal, "Simulated binary crossover for continuous search space," Complex Systems, vol. 9, pp. 115-148, 1995.

K. Deb, "Multi-objective optimization using evolutionary algorithms," Wiley, New York, 2001.

K. Deb, L. Thiele, M. Laumanns, and E. Zitzler, "Scalable test problems for evolutionary multiobjective optimization," Evolutionary Multiobjective Optimization, pp. 105-145, 2005.

K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, "A fast and elitist multiobjective genetic algorithm," IEEE Transactions on Evolutionary Computation, vol. 6, pp. 182-197, 2002.

A.P. Dempster, N.M. Laird, and D.B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, vol. 39(B), pp. 1-39, 1977.

Q. Duan, V.K. Gupta, and S. Sorooshian, "Effective and efficient global optimization for conceptual rainfall-runoff models," Water Resources Research, vol. 28, no. 4, pp. 1015-1031, 1992.

M. Glick, A. Rayan, and A. Goldblum, "A stochastic algorithm for global optimization and for best populations: a test case of side chains in proteins," Proceedings of the National Academy of Sciences of the United States of America, vol. 99, pp. 703-708, 2002.

T. Gneiting, and A.E. Raftery, "Weather forecasting with ensemble methods," Science, vol. 310, pp. 248-249, 2005.

D.E. Goldberg, "Genetic algorithms in search, optimization and machine learning," Addison-Wesley, Reading, MA, 1989.

E.P. Grimit, and C.F. Mass, "Initial results of a mesoscale short-range ensemble forecasting system over the Pacific Northwest," Weather and Forecasting, vol. 17, pp. 192-205, 2002.

H. Haario, E. Saksman, and J. Tamminen, "An adaptive Metropolis algorithm," Bernoulli, vol. 7, pp. 223-242, 2001.

W.E. Hart, N. Krasnogor, and J.E. Smith, "Recent advances in memetic algorithms," Springer, Berlin, 2005.

J. Holland, "Adaptation in Natural and Artificial Systems," MIT Press, Cambridge, MA, 1975.

J. Kennedy, R.C. Eberhart, and Y. Shi, "Swarm Intelligence," Morgan Kaufmann, San Francisco, 2001.

J. Knowles, and D. Corne, "The Pareto archived evolution strategy: A new baseline algorithm for Pareto multiobjective optimisation," Proceedings of the 1999 Congress on Evolutionary Computation, IEEE Press, New York, 1999.

E. Laloy, and J.A. Vrugt, "High-dimensional posterior exploration of hydrologic models using multiple-try DREAM(ZS) and high-performance computing," Water Resources Research, vol. 48, W01526, doi:10.1029/2011WR010608, 2012a.

A.R. Lemmon, and M.C. Milinkovitch, "The metapopulation genetic algorithm: an efficient solution for the problem of large phylogeny estimation," Proceedings of the National Academy of Sciences of the United States of America, vol. 99, pp. 10516-10521, 2002.

H. Li, and Q. Zhang, "Multiobjective optimization problems with complicated Pareto sets, MOEA/D and NSGA-II," IEEE Transactions on Evolutionary Computation, vol. 13, no. 2, pp. 284-302, 2009.

M.A. Nowak, and K. Sigmund, "Evolutionary dynamics of biological games," Science, vol. 303, pp. 793-799, 2004.

K.E. Parsopoulos, and M.N. Vrahatis, "On the computation of all global minimizers through particle swarm optimization," IEEE Transactions on Evolutionary Computation, vol. 8, no. 3, pp. 211-224, 2004.

A.E. Raftery, T. Gneiting, F. Balabdaoui, and M. Polakowski, "Using Bayesian model averaging to calibrate forecast ensembles," Monthly Weather Review, vol. 133, pp. 1155-1174, 2005.

J. Rings, J.A. Vrugt, G. Schoups, J.A. Huisman, and H. Vereecken, "Bayesian model averaging using particle filtering and Gaussian mixture modeling: Theory, concepts, and simulation experiments," Water Resources Research, vol. 48, W05520, doi:10.1029/2011WR011607, 2012.

G. Schoups, J.W. Hopmans, C.A. Young, J.A. Vrugt, W.W. Wallender, K.K. Tanji, and S. Panday, "Sustainability of irrigated agriculture in the San Joaquin Valley, California," Proceedings of the National Academy of Sciences of the United States of America, vol. 102, pp. 15352-15356, 2005.

G. Schoups, and J.A. Vrugt, "A formal likelihood function for parameter and predictive inference of hydrologic models with correlated, heteroscedastic and non-Gaussian errors," Water Resources Research, vol. 46, W10531, doi:10.1029/2009WR008933, 2010.

R. Storn, and K. Price, "A simple and efficient heuristic for global optimization over continuous spaces," Journal of Global Optimization, vol. 11, pp. 341-359, 1997.

D.A. van Veldhuizen, and G.B. Lamont, "Multiobjective evolutionary algorithm research: A history and analysis," Dept. Elec. Comput. Eng., Graduate School of Eng., Air Force Inst. Technol., Wright-Patterson AFB, OH, Tech. Rep. TR-98-03, 1998.

J.A. Vrugt, H.V. Gupta, L.A. Bastidas, W. Bouten, and S. Sorooshian, "Effective and efficient algorithm for multi-objective optimization of hydrologic models," Water Resources Research, vol. 39, no. 8, 1214, doi:10.1029/2002WR001746, 2003.

J.A. Vrugt, M.P. Clark, C.G.H. Diks, Q. Duan, and B.A. Robinson, "Multi-objective calibration of forecast ensembles using Bayesian model averaging," Geophysical Research Letters, vol. 33, L19817, doi:10.1029/2006GL027126, 2006.

J.A. Vrugt, and B.A. Robinson, "Improved evolutionary optimization from genetically adaptive multimethod search," Proceedings of the National Academy of Sciences of the United States of America, vol. 104, pp. 708-711, doi:10.1073/pnas.0610471104, 2007a.

J.A. Vrugt, and B.A. Robinson, "Treatment of uncertainty using ensemble methods: Comparison of sequential data assimilation and Bayesian model averaging," Water Resources Research, vol. 43, W01411, doi:10.1029/2005WR004838, 2007b.

J.A. Vrugt, C.J.F. ter Braak, M.P. Clark, J.M. Hyman, and B.A. Robinson, "Treatment of input uncertainty in hydrologic modeling: Doing hydrology backward with Markov chain Monte Carlo simulation," Water Resources Research, vol. 44, W00B09, doi:10.1029/2007WR006720, 2008a.

J.A. Vrugt, C.G.H. Diks, and M.P. Clark, "Ensemble Bayesian model averaging using Markov chain Monte Carlo sampling," Environmental Fluid Mechanics, vol. 8 (5-6), pp. 579-595, doi:10.1007/s10652-008-9106-3, 2008b.

J.A. Vrugt, B.A. Robinson, and J.M. Hyman, "Self-adaptive multimethod search for global optimization in real-parameter spaces," IEEE Transactions on Evolutionary Computation, vol. 13, no. 2, pp. 243-259, doi:10.1109/TEVC.2008.924428, 2009.

J.A. Vrugt, C.J.F. ter Braak, C.G.H. Diks, D. Higdon, B.A. Robinson, and J.M. Hyman, "Accelerating Markov chain Monte Carlo simulation by differential evolution with self-adaptive randomized subspace sampling," International Journal of Nonlinear Sciences and Numerical Simulation, vol. 10, no. 3, pp. 273-290, 2009b.

J.A. Vrugt, and C.J.F. ter Braak, "DREAM(D): an adaptive Markov chain Monte Carlo simulation algorithm to solve discrete, noncontinuous, and combinatorial posterior parameter estimation problems," Hydrology and Earth System Sciences, vol. 15, pp. 3701-3713, doi:10.5194/hess-15-3701-2011, 2011.

J.A. Vrugt, and M. Sadegh, "Toward diagnostic model calibration and evaluation: Approximate Bayesian computation," Water Resources Research, vol. 49, doi:10.1002/wrcr.20354, 2013.

J.A. Vrugt, "Markov chain Monte Carlo simulation using the DREAM software package: Theory, concepts, and MATLAB implementation," Environmental Modelling & Software, vol. XX, no. XX, pp. XX-XX, doi:10.1016/j.envsoft.2014.XX.XXX, 2015.

J.A. Vrugt, "AMALGAM(D): Adaptive multimethod decomposition based evolutionary optimization," IEEE Transactions on Evolutionary Computation, vol. XX, no. XX, pp. XX-XX, doi:XX, 2016.

D.J. Wales, and H.A. Scheraga, "Global optimization of clusters, crystals, and biomolecules," Science, vol. 285 (5432), pp. 1368-1372, doi:10.1126/science.285.5432.1368, 1999.

L. While, P. Hingston, L. Barone, and S. Huband, "A faster algorithm for calculating hypervolume," IEEE Transactions on Evolutionary Computation, vol. 10, no. 1, pp. 29-38, 2006.

D.H. Wolpert, and W.G. Macready, "No free lunch theorems for optimization," IEEE Transactions on Evolutionary Computation, vol. 1, pp. 67-82, 1997.

Q. Zhang, and H. Li, "MOEA/D: A multi-objective evolutionary algorithm based on decomposition," IEEE Transactions on Evolutionary Computation, vol. 11, no. 6, pp. 712-731, 2007.

E. Zitzler, K. Deb, and L. Thiele, "Comparison of multiobjective evolutionary algorithms: Empirical results," Evolutionary Computation, vol. 8, pp. 173-195, 2000.

E. Zitzler, and L. Thiele, "Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach," IEEE Transactions on Evolutionary Computation, vol. 3, no. 4, pp. 257-271, 1999.

E. Zitzler, M. Laumanns, and L. Thiele, "SPEA2: Improving the strength Pareto evolutionary algorithm," Comput. Eng. Net. Lab. (TIK), Dept. Elec. Eng., ETH, Zurich, TIK-Rep. 103, pp. 1-21, 2001.