
Journal of Hydrology 394 (2010) 416–435

Contents lists available at ScienceDirect

Journal of Hydrology

journal homepage: www.elsevier.com/locate/jhydrol

Application of a multimodel approach to account for conceptual model and scenario uncertainties in groundwater modelling

Rodrigo Rojas a,d,⇑, Samalie Kahunde b, Luk Peeters a,1, Okke Batelaan a,c, Luc Feyen d, Alain Dassargues a,e

a Applied Geology and Mineralogy, Department of Earth and Environmental Sciences, Katholieke Universiteit Leuven, Celestijnenlaan 200E, B-3001 Heverlee, Belgium
b Interuniversity Programme in Water Resources Engineering (IUPWARE), Katholieke Universiteit Leuven and Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussels, Belgium
c Department of Hydrology and Hydraulic Engineering, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussels, Belgium
d Land Management and Natural Hazards Unit, Institute for Environment and Sustainability, Joint Research Centre, European Commission, Via E. Fermi 2749, TP261, I-21027, Ispra (Va), Italy
e Hydrogeology and Environmental Geology, Department of Architecture, Geology, Environment and Constructions (ArGEnCo), Université de Liège, B.52/3 Sart-Tilman, B-4000 Liège, Belgium

Article info

Article history:
Received 13 November 2008
Received in revised form 12 May 2010
Accepted 20 September 2010

This manuscript was handled by Philippe Baveye, Editor-in-Chief

Keywords: Groundwater flow modelling; Conceptual model uncertainty; Scenario uncertainty; GLUE; Bayesian model averaging; Markov chain Monte Carlo

0022-1694/$ - see front matter © 2010 Elsevier B.V. All rights reserved.
doi:10.1016/j.jhydrol.2010.09.016

⇑ Corresponding author at: Land Management and Natural Hazards Unit, Institute for Environment and Sustainability, Joint Research Centre, European Commission, Via E. Fermi 2749, TP261, I-21027, Ispra (Va), Italy. Tel.: +39 0332 78 97 13.

E-mail address: [email protected] (R. Rojas).
1 Now at: CSIRO Land & Water, Australia.

Abstract

Groundwater models are often used to predict the future behaviour of groundwater systems. These models may vary in complexity from simplified system conceptualizations to more intricate versions. It has been recently suggested that uncertainties in model predictions are largely dominated by uncertainties arising from the definition of alternative conceptual models. On the other hand, external factors such as climatic conditions or groundwater abstraction policies may also play an important role. Rojas et al. (2008) proposed a multimodel approach to account for predictive uncertainty arising from forcing data (inputs), parameters and alternative conceptualizations. In this work we extend this approach to include uncertainties arising from the definition of alternative future scenarios, and we apply the extended methodology to a real aquifer system underlying the Walenbos Nature Reserve area in Belgium. Three alternative conceptual models comprising different levels of geological knowledge are considered. Additionally, three recharge settings (scenarios) are proposed to evaluate recharge uncertainties. A joint estimate of the predictive uncertainty, including parameter, conceptual model and scenario uncertainties, is obtained for groundwater budget terms. Finally, results obtained using the improved approach are compared with those obtained from methodologies that include a calibration step and use a model selection criterion to discriminate between alternative conceptualizations. Results showed that conceptual model and scenario uncertainties contribute significantly to the predictive variance for some budget terms. Moreover, conceptual model uncertainties played an important role even when one model was preferred over the others. Predictive distributions differed considerably in shape, central moment and spread among the alternative conceptualizations and scenarios analysed. This reaffirms the idea that relying on a single conceptual model driven by a particular scenario will likely produce biased and under-dispersive estimates of the predictive uncertainty. Multimodel methodologies based on the use of model selection criteria produced ambiguous results. In the frame of a multimodel approach, these inconsistencies are critical and cannot be neglected. These results strongly support addressing conceptual model uncertainty in groundwater modelling practice. Additionally, accounting for uncertainty in alternative future recharge makes it possible to obtain more realistic and, possibly, more reliable estimates of the predictive uncertainty.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction and scope

Groundwater models are often used to predict the behaviour of groundwater systems under future stress conditions. These models may vary in the level of complexity from simplified groundwater system representations to more elaborate models accounting for detailed descriptions of the main processes and geological properties of the groundwater system. Whether to postulate simplified or complex models for solving a given problem has been a subject of discussion for several years (Neuman and Wierenga, 2003; Gómez-Hernández, 2006; Hill, 2006; Hunt et al., 2007). Parsimony is the main argument of those in favour of simpler models (see e.g. Hill and Tiedeman, 2007), whereas a more realistic representation of the unknown true system seems the main


argument favouring more elaborate models (see e.g. Rubin, 2003; Renard, 2007). To some extent, this debate has contributed to the growing tendency among hydrologists to postulate alternative conceptual models representing different dynamics that could explain the flow and solute transport in a given groundwater system (Harrar et al., 2003; Meyer et al., 2004; Højberg and Refsgaard, 2005; Troldborg et al., 2007; Seifert et al., 2008; Ijiri et al., 2009; Rojas et al., 2010a).

It has been recently suggested that uncertainties in groundwater model predictions are largely dominated by uncertainty arising from the definition of alternative conceptual models, and that parametric uncertainty alone cannot compensate for conceptual model uncertainty (Bredehoeft, 2003; Neuman, 2003; Neuman and Wierenga, 2003; Ye et al., 2004; Bredehoeft, 2005; Højberg and Refsgaard, 2005; Poeter and Anderson, 2005; Refsgaard et al., 2006; Meyer et al., 2007; Refsgaard et al., 2007; Seifert et al., 2008; Rojas et al., 2008; Rojas et al., 2009). This situation is exacerbated when the predicted variables are not included in the data used for calibration (Højberg and Refsgaard, 2005; Troldborg et al., 2007). This suggests that it is more appropriate to postulate alternative conceptual models and analyse the combined multimodel predictive uncertainty than to rely on a single hydrological conceptual model. Working with a single conceptualization is more likely to produce biased and under-dispersive uncertainty estimates, whereas multimodel uncertainty estimates are less (artificially) conservative and more likely to capture the unknown true predicted value.

Practice suggests, however, that once a conceptual model is successfully calibrated and validated, for example following the methods described by Hassan (2004a,b), its results are rarely questioned and the conceptual model is assumed to be correct. As a consequence, the conceptual model is only revisited when sufficient data have been collected to perform a post-audit analysis (Anderson and Woessner, 1992), which may often take several years, or when newly collected data and/or scientific evidence challenge the original conceptualization (Bredehoeft, 2005). In this regard, Bredehoeft (2005) presents a series of examples where unforeseen elements or the collection of new data challenged well-established conceptual models. This situation clearly illustrates the gap between practitioners and the scientific community in estimating predictive uncertainty in groundwater modelling in the presence of conceptual model uncertainty.

On the other hand, external factors such as climatic conditions or groundwater abstraction policies increase the uncertainty in groundwater model predictions because future conditions are unknown. This has long been recognized as an important source of predictive uncertainty; however, practical applications mainly focus on uncertainty derived from parameters and inputs, neglecting conceptual model and scenario uncertainties (Rubin, 2003; Gaganis and Smith, 2006). Recently, Rojas and Dassargues (2007) analysed the groundwater balance of a regional aquifer in northern Chile considering different projected groundwater abstraction policies in combination with stochastic groundwater recharge scenarios. Meyer et al. (2007) presented a combined estimation of conceptual model and scenario uncertainties in the framework of Maximum Likelihood Bayesian Model Averaging (MLBMA) (Neuman, 2003) for a groundwater flow and transport modelling case study.

In recent years, several methodologies to account for uncertainties arising from inputs (forcing data), parameters and the definition of alternative conceptual models have been proposed in the literature (Beven and Binley, 1992; Neuman, 2003; Poeter and Anderson, 2005; Refsgaard et al., 2006; Ajami et al., 2007; Rojas et al., 2008). Two appealing methodologies in the case of groundwater modelling are the MLBMA method (Neuman, 2003) and the information-theoretic method of Poeter and Anderson (2005). Both methodologies are based on a model selection criterion, which is derived as a by-product of traditional calibration methods such as Maximum Likelihood (ML) or Weighted Least Squares (WLS). A model selection criterion allows ranking alternative conceptual models, eliminating some of them, or weighting and averaging model predictions in a multimodel framework. In our case, we are interested in weighting and averaging predictions from alternative conceptual models to obtain a combined estimate of the predictive uncertainty. The most commonly used model selection criteria are the Akaike Information Criterion (AIC) (Akaike, 1974), the modified Akaike Information Criterion (AICc) (Hurvich and Tsai, 1989), the Bayesian Information Criterion (BIC) (Schwarz, 1978) and the Kashyap Information Criterion (KIC) (Kashyap, 1982). Ye et al. (2008a) give an excellent discussion of the merits and drawbacks of alternative model selection criteria in the context of variogram multimodel analysis. In MLBMA, KIC is the suggested criterion, whereas for the information-theoretic method of Poeter and Anderson (2005), AICc is preferred. Even though Ye et al. (2008a) appear to have settled the controversy on the use of alternative model selection criteria, using different model selection criteria to weight and combine multimodel predictions in groundwater modelling may lead to controversial and misleading results.

Apart from the common problems of parameter non-uniqueness (insensitivity) and the “locality behaviour” of the calibration approaches mentioned above, Refsgaard et al. (2006) pointed out an important disadvantage of including a calibration stage in a multimodel framework. In multimodel approaches that include a calibration step, errors in the conceptual models (which by definition cannot be excluded) will be compensated by biased parameter estimates in order to optimize model fit in the calibration stage. This has been confirmed by Troldborg et al. (2007) for a real aquifer system in Denmark.

Recently, Rojas et al. (2008) proposed an alternative methodology to account for predictive uncertainty arising from inputs (forcing data), parameters and the definition of alternative conceptual models. This method combines Generalized Likelihood Uncertainty Estimation (GLUE) (Beven and Binley, 1992) and Bayesian model averaging (BMA) (Draper, 1995; Kass and Raftery, 1995; Hoeting et al., 1999; Hoeting, 2002). The basic idea behind this methodology is the concept of equifinality, that is, many alternative conceptual models together with many alternative parameter sets will produce similarly good results when compared to limited observed data (Beven et al., 2001; Beven, 2006). Equifinality, as defined by Beven (1993, 2006), arises because of the combined effects of errors in the forcing data, system conceptualization, measurements and parameter estimates. In the method of Rojas et al. (2008), a series of “behavioural” parameter sets is selected for each alternative model, producing a cumulative distribution function (cdf) for the parameters and variables of interest. Using the performance values obtained from GLUE, weights for each conceptual model are estimated, and the results obtained for each model are combined following BMA in a multimodel frame. An important aspect of the method is that it does not rely on a unique parameter optimum or conceptual model to assess the joint predictive uncertainty, thus avoiding the compensation of conceptual model errors by biased parameter estimates. A complete description of the methodology and its potential advantages is given in Rojas et al. (2008).

Rojas et al. (2008) used a traditional Latin Hypercube Sampling (LHS) scheme (McKay et al., 1979) to implement the combined GLUE-BMA methodology. This sampling scheme has been regularly used in GLUE applications. Blasone et al. (2008a,b) demonstrated that the efficiency of the GLUE methodology can


be improved by including a Markov Chain Monte Carlo (MCMC) sampling scheme. MCMC is a sampling technique that, through iterative Monte Carlo simulation, produces a Markov Chain whose stationary probability distribution equals a desired (target) distribution. This technique is particularly suitable for Bayesian inference when the analytical forms of the posterior distributions are not available or when the posterior distributions are high-dimensional.

In this work we extend the methodology of Rojas et al. (2008) to include the uncertainty in groundwater model predictions due to the definition of alternative conceptual models and alternative recharge settings. To do so, we follow an approach similar to that described in Meyer et al. (2007) and patterned after Draper (1995). To illustrate that the method proposed in Rojas et al. (2008) is fully applicable to a real groundwater system, we implemented the extended methodology for the aquifer system underlying and feeding the Walenbos Nature Reserve area in Belgium (Fig. 1). In addition, we improve on the originally used sampling scheme by implementing a Markov Chain Monte Carlo (MCMC) method. We postulate three alternative conceptual models comprising different levels of geological knowledge of the groundwater system. Average recharge conditions are used to calibrate each conceptual model under steady-state conditions. Two additional recharge settings corresponding to ±2 standard deviations from average recharge conditions are proposed to evaluate the uncertainty in the results due to the definition of alternative recharge values. A combined estimate of the predictive uncertainty, including parameter, conceptual model and scenario uncertainties, is obtained for a set of groundwater budget terms such as river gains and losses, drain outflows, and groundwater inflows to and outflows from the Walenbos area. Finally, results obtained using the combined GLUE-BMA methodology are compared with results obtained using multimodel methodologies that include a calibration step and a model selection criterion to discriminate between models.

Fig. 1. Location of the study area, river network and location of the 51 observation wells used as dataset D for the application of the multimodel methodology.

The remainder of this paper is organized as follows. In Section 2, we provide a condensed overview of GLUE, BMA and MCMC theory, followed by a description of the procedure to integrate these methods. Section 3 details the study area where the integrated uncertainty assessment methodology is applied. Implementation details such as the different conceptualizations, recharge uncertainties and a summary of the modelling procedure are described in Section 4. Results are discussed in Section 5 and a summary of conclusions is presented in Section 6.

2. Materials and methods

Sections 2.1, 2.2 and 2.3 outline the basics of the GLUE, BMA and MCMC methodologies, respectively; for more details the reader is referred to Hoeting et al. (1999), Gelman et al. (2004), Beven (2006) and Rojas et al. (2008, 2009).


2.1. Generalized likelihood uncertainty estimation (GLUE)

GLUE is a Monte Carlo-based simulation technique based on the concept of equifinality (Beven et al., 2001). It rejects the idea of a single correct representation of a system in favour of many acceptable system representations (Beven, 2006). For each potential system simulator, sampled from a prior set of possible system representations, a likelihood measure (e.g. Gaussian, trapezoidal, model efficiency, inverse error variance, etc.) is calculated, which reflects its ability to simulate the system responses given the available observed dataset D. Simulators that perform below a subjectively defined rejection criterion are discarded from further analysis, and the likelihood measures of the retained simulators are rescaled so that the cumulative likelihood equals 1. Ensemble predictions are based on the predictions of the retained set of simulators, weighted by their respective rescaled likelihoods.
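The GLUE post-processing just described (rejection, rescaling and likelihood-weighted ensemble prediction) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the use of a simple likelihood threshold as the rejection criterion and the weighted-quantile helper for prediction bounds are assumptions made for the example.

```python
import numpy as np


def glue_ensemble(likelihoods, predictions, threshold):
    """Hypothetical GLUE post-processing step.

    likelihoods : likelihood measure of each sampled simulator
    predictions : prediction of one quantity of interest by each simulator
    threshold   : rejection criterion; simulators below it are discarded
    Returns the rescaled weights and predictions of the retained
    (behavioural) simulators and their likelihood-weighted mean and variance.
    """
    likelihoods = np.asarray(likelihoods, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    keep = likelihoods >= threshold
    if not keep.any():
        raise ValueError("no behavioural simulators; relax the rejection criterion")
    # Rescale retained likelihoods so that the cumulative likelihood equals 1.
    weights = likelihoods[keep] / likelihoods[keep].sum()
    kept = predictions[keep]
    mean = np.sum(weights * kept)
    var = np.sum(weights * (kept - mean) ** 2)
    return weights, kept, mean, var


def weighted_quantile(values, weights, q):
    """Quantile q of the likelihood-weighted empirical cdf of the predictions."""
    order = np.argsort(values)
    cdf = np.cumsum(weights[order])
    idx = min(int(np.searchsorted(cdf, q)), len(values) - 1)
    return values[order][idx]


# Toy example: four simulators, rejection threshold 0.2.
w, kept, mean, var = glue_ensemble([0.1, 0.5, 0.3, 0.05], [1.0, 2.0, 3.0, 4.0], 0.2)
print(w, mean, var)  # weights approx. [0.625, 0.375], mean approx. 2.375, variance approx. 0.234375
```

Prediction bounds (for example the 5% and 95% quantiles) would follow from `weighted_quantile` applied to the retained predictions and their rescaled weights.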

Likelihood measures used in GLUE must be seen in a much wider sense than the formal likelihood functions used in traditional statistical estimation theory (Binley and Beven, 2003). These likelihoods measure the ability of a simulator to reproduce a given set of observed data; therefore, they express a degree of belief in the predictions of that particular simulator rather than a formal probability. However, GLUE is fully coherent with a formal Bayesian approach when the use of a classical likelihood function is justifiable (Romanowicz et al., 1994).

In the analysis of their hypothetical groundwater system, Rojas et al. (2008) observed no significant differences in the estimation of posterior model probabilities, predictive capacity and conceptual model uncertainty when a Gaussian, a model efficiency or a fuzzy-type likelihood function was used. The analysis in this work is therefore confined to a Gaussian likelihood function $L(M_k, \theta_l, Y_m \mid D)$, where $M_k$ is the kth conceptual model (or model structure) included in the finite and discrete ensemble of alternative conceptualizations M, $\theta_l$ is the lth parameter vector, $Y_m$ is the mth input data vector and D is the observed system variable vector.
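The exact form of the Gaussian likelihood is not given in this section. Purely as an illustrative assumption, the sketch below uses independent, homoscedastic Gaussian residuals between simulated and observed heads with a known error standard deviation `sigma`; the function names and the head values are hypothetical. Working in log space avoids numerical underflow when many observations are combined.

```python
import numpy as np


def gaussian_log_likelihood(simulated, observed, sigma):
    """Log of an assumed Gaussian likelihood L(M_k, theta_l, Y_m | D).

    Assumes independent residuals with zero mean and known standard
    deviation sigma (an illustrative choice, not taken from the paper).
    """
    residuals = np.asarray(observed, dtype=float) - np.asarray(simulated, dtype=float)
    n = residuals.size
    return (-0.5 * n * np.log(2.0 * np.pi * sigma ** 2)
            - np.sum(residuals ** 2) / (2.0 * sigma ** 2))


def relative_likelihoods(log_liks):
    """Exponentiate log-likelihoods after subtracting their common maximum.

    The same shift must be applied to every simulator of every model so
    that likelihoods remain comparable across conceptual models.
    """
    log_liks = np.asarray(log_liks, dtype=float)
    return np.exp(log_liks - log_liks.max())


obs = np.array([12.0, 15.5, 9.8])     # hypothetical observed heads (m)
sim_a = np.array([12.3, 15.1, 10.0])  # hypothetical simulator A
sim_b = np.array([13.0, 14.0, 11.0])  # hypothetical simulator B
ll = [gaussian_log_likelihood(s, obs, sigma=0.5) for s in (sim_a, sim_b)]
print(relative_likelihoods(ll))  # simulator A, with smaller residuals, scores 1.0
```

Subtracting the maximum only rescales all likelihoods by a common factor, which cancels in the GLUE rescaling and in the normalization of posterior model probabilities.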

2.2. Bayesian model averaging (BMA)

BMA provides a coherent framework for combining predictions from multiple competing conceptual models to attain a more realistic and reliable description of the predictive uncertainty. It is a statistical procedure that infers average predictions by weighting individual predictions from competing models based on their relative skill, with predictions from better performing models receiving higher weights than those of worse performing models. BMA avoids having to choose one model over the others; instead, the observed dataset D assigns different weights to the competing models (Wasserman, 2000).

Following the notation of Hoeting et al. (1999), if $\Delta$ is a quantity to be predicted, the full BMA predictive distribution of $\Delta$ for a set of alternative conceptual models $M = (M_1, M_2, \ldots, M_k, \ldots, M_K)$ under different scenarios $S = (S_1, S_2, \ldots, S_i, \ldots, S_I)$ is given by (Draper, 1995)

$$p(\Delta \mid D) = \sum_{i=1}^{I} \sum_{k=1}^{K} p(\Delta \mid D, M_k, S_i)\, p(M_k \mid D, S_i)\, p(S_i) \tag{1}$$

Eq. (1) is an average of the posterior distributions of $\Delta$ under each alternative conceptual model and scenario considered, $p(\Delta \mid D, M_k, S_i)$, weighted by their posterior model probabilities, $p(M_k \mid D, S_i)$, and by the scenario probabilities, $p(S_i)$. The posterior model probabilities conditional on a given scenario reflect how well model $M_k$ fits the observed data D and can be computed using Bayes' rule

$$p(M_k \mid D, S_i) = \frac{p(D \mid M_k)\, p(M_k \mid S_i)}{\sum_{l=1}^{K} p(D \mid M_l)\, p(M_l \mid S_i)} \tag{2}$$

where $p(M_k \mid S_i)$ is the prior probability of model $M_k$ under scenario $S_i$ and $p(D \mid M_k)$ is the integrated likelihood of model $M_k$. An important assumption in the estimation of posterior model probabilities (Eq. (2)) is that the dataset D is independent of future scenarios. That is, the probability of observing the dataset D is not affected by the occurrence of any future scenario $S_i$ (Meyer et al., 2007). In a strict sense, however, model likelihoods may depend on future scenarios given the correlation between recharge and hydraulic conductivity. Accounting for this dependency would make it difficult to clearly assess the intrinsic value of the conceptual models or the “extra worth” of the data itself in explaining the observed system responses. This assessment is beyond the scope of this article and, for the sake of clarity, we retain the assumption that D, and consequently the model likelihoods and posterior model probabilities, are independent of the future scenarios.

As a result, model likelihoods do not depend on the scenarios, whereas prior model probabilities may still be a function of future scenarios.
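Eq. (2) translates directly into array operations. The sketch below illustrates it under the independence assumption just stated: the integrated likelihoods $p(D \mid M_k)$ form a single vector shared by all scenarios, so the posterior model probabilities can differ between scenarios only through the priors $p(M_k \mid S_i)$. The function name and the numbers in the example are hypothetical.

```python
import numpy as np


def posterior_model_probs(integrated_lik, prior):
    """Eq. (2): posterior model probabilities p(M_k | D, S_i).

    integrated_lik : shape (K,), p(D | M_k), independent of the scenario
    prior          : shape (K, I), p(M_k | S_i); each column sums to one
    Returns an array of shape (K, I) whose columns sum to one.
    """
    lik = np.asarray(integrated_lik, dtype=float)[:, None]  # (K, 1), broadcast over scenarios
    unnorm = lik * np.asarray(prior, dtype=float)
    return unnorm / unnorm.sum(axis=0, keepdims=True)


# Three models, three scenarios; hypothetical values.
lik = [0.2, 0.6, 0.2]
prior = np.array([[1 / 3, 0.50, 1 / 3],
                  [1 / 3, 0.25, 1 / 3],
                  [1 / 3, 0.25, 1 / 3]])
post = posterior_model_probs(lik, prior)
print(post.round(3))
```

With equal priors (first and third columns) the posteriors simply reproduce the normalized likelihoods; only the second scenario, whose priors favour the first model, shifts the weights.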

The leading moments of the full BMA prediction of $\Delta$ are given by (Draper, 1995)

$$E[\Delta \mid D] = \sum_{i=1}^{I} \sum_{k=1}^{K} E[\Delta \mid D, M_k, S_i]\, p(M_k \mid D, S_i)\, p(S_i) \tag{3}$$

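$$\begin{aligned}
\mathrm{Var}[\Delta \mid D] ={}& \sum_{i=1}^{I} \sum_{k=1}^{K} \mathrm{Var}[\Delta \mid D, M_k, S_i]\, p(M_k \mid D, S_i)\, p(S_i) \\
&+ \sum_{i=1}^{I} \sum_{k=1}^{K} \left( E[\Delta \mid D, M_k, S_i] - E[\Delta \mid D, S_i] \right)^2 p(M_k \mid D, S_i)\, p(S_i) \\
&+ \sum_{i=1}^{I} \left( E[\Delta \mid D, S_i] - E[\Delta \mid D] \right)^2 p(S_i)
\end{aligned} \tag{4}$$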

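Here $E[\Delta \mid D, S_i] = \sum_{k=1}^{K} E[\Delta \mid D, M_k, S_i]\, p(M_k \mid D, S_i)$ is the BMA mean under scenario $S_i$. From Eq. (4) it can be seen that the variance of the full BMA prediction consists of three terms: (I) within-model and within-scenario variance, (II) between-model and within-scenario variance, and (III) between-scenario variance (Meyer et al., 2007).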
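Eqs. (3) and (4) can be evaluated once the conditional means and variances of each model–scenario combination are available, for instance from the likelihood-weighted GLUE ensembles. The following sketch (hypothetical function name and toy numbers) returns the overall mean together with the three variance terms separately, which makes it easy to report how much of the predictive variance comes from parameters, from conceptual models and from scenarios.

```python
import numpy as np


def bma_moments(means, variances, post_model, p_scen):
    """Eqs. (3)-(4): mean and variance terms of the full BMA prediction.

    means, variances : shape (K, I), E[Delta|D,M_k,S_i] and Var[Delta|D,M_k,S_i]
    post_model       : shape (K, I), p(M_k|D,S_i); each column sums to one
    p_scen           : shape (I,), p(S_i); sums to one
    Returns (mean, term I, term II, term III).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    w = np.asarray(post_model, dtype=float)
    ps = np.asarray(p_scen, dtype=float)
    mean_s = np.sum(w * means, axis=0)  # E[Delta|D,S_i], one value per scenario
    mean = np.sum(mean_s * ps)          # Eq. (3)
    within = np.sum(np.sum(w * variances, axis=0) * ps)                      # term I
    between_models = np.sum(np.sum(w * (means - mean_s) ** 2, axis=0) * ps)  # term II
    between_scen = np.sum((mean_s - mean) ** 2 * ps)                         # term III
    return mean, within, between_models, between_scen


# Toy case: two models (rows) and two scenarios (columns), all weights equal.
mean, t1, t2, t3 = bma_moments(
    means=[[1.0, 3.0], [2.0, 5.0]],
    variances=[[0.5, 0.5], [1.0, 1.0]],
    post_model=[[0.5, 0.5], [0.5, 0.5]],
    p_scen=[0.5, 0.5],
)
print(mean, t1, t2, t3, t1 + t2 + t3)  # approx. 2.75, 0.75, 0.625, 1.5625, 2.9375
```

In this toy case the three terms add up to 2.9375, which is exactly the variance of the equally weighted four-component mixture, as the law of total variance requires.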

2.3. Markov chain Monte Carlo simulation

As discussed in Rojas et al. (2008), due to the presence of multiple local optima in the global likelihood response surfaces, well-performing simulators might be widely distributed across the hyperspace spanned by the set of conceptual models, forcing data (inputs) and parameter vectors. This requires the global likelihood response surface to be extensively sampled to ensure convergence of the posterior moments of the predictive distributions. In the context of the proposed GLUE-BMA methodology, we resort to Markov Chain Monte Carlo (MCMC) to partly alleviate the computational burden of a traditional sampling scheme (e.g. Latin Hypercube Sampling).

The origins of MCMC methods can be traced back to the work of Metropolis et al. (1953) and its generalization by Hastings (1970). These works gave rise to a general MCMC method, namely the Metropolis–Hastings (M–H) algorithm. The idea of this technique is to generate, by iterative Monte Carlo simulation, a Markov Chain for the model parameters that has, in an asymptotic sense, the desired posterior distribution as its stationary distribution (Sorensen and Gianola, 2002). Reviews and a more elaborate overview of alternative algorithms to implement MCMC are given in Gilks et al. (1995), Sorensen and Gianola (2002), Gelman et al. (2004) and Robert (2007).

The M–H algorithm stochastically generates a series of parameter samples $\theta_i$, $i = 1, \ldots, N$, through an iterative Monte Carlo run long enough that, asymptotically, the stationary distribution of this series is the target posterior distribution $p(\theta \mid D)$. The algorithm can be summarized as follows:


1. set a starting location for the chain, $\theta_0$;
2. set $i = 1, \ldots, N$;
3. generate a candidate parameter $\theta^*$ from a proposal distribution $q(\theta^* \mid \cdot)$;
4. calculate $\alpha = \dfrac{p(\theta^* \mid D)\, q(\theta_{i-1} \mid \theta^*)}{p(\theta_{i-1} \mid D)\, q(\theta^* \mid \theta_{i-1})}$;
5. draw a random number $u \in [0, 1]$ from a uniform probability distribution;
6. if $\min\{1, \alpha\} > u$, set $\theta_i = \theta^*$; otherwise set $\theta_i = \theta_{i-1}$;
7. repeat steps (3) through (6) N times.

The generation of the Markov Chain is thus achieved in a two-step process: a proposal step (step #3) and an acceptance step (step #6) (Sorensen and Gianola, 2002). Note that the proposal distribution $q(\theta^* \mid \cdot)$ may (or may not) depend on the current position of the chain, $\theta_{i-1}$, and may (or may not) be symmetric (Chib and Greenberg, 1995). These two properties are often modified to obtain alternative variants of the M–H algorithm (see e.g. Tierney, 1994). In the M–H algorithm, there is a natural tendency for candidate parameters with higher posterior probability than the current parameter vector to be accepted and for those with lower posterior probability to be rejected (Gallagher and Doherty, 2007).
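A random-walk variant of the M–H algorithm is sketched below. It assumes a symmetric Gaussian proposal centred on the current state, so that $q(\theta_{i-1} \mid \theta^*) = q(\theta^* \mid \theta_{i-1})$ and the proposal terms cancel in $\alpha$ (the Metropolis special case); the acceptance test is carried out in log space to avoid underflow. The function name, the step size and the standard normal target in the usage example are illustrative assumptions; in a GLUE-BMA setting `log_post` would instead wrap a groundwater model run and its likelihood.

```python
import numpy as np


def metropolis_hastings(log_post, theta0, n_iter, step, rng):
    """Random-walk Metropolis-Hastings sampler (illustrative sketch).

    log_post : callable returning log p(theta | D) up to an additive constant
    theta0   : starting location of the chain (step 1)
    n_iter   : number of iterations N
    step     : standard deviation of the Gaussian random-walk proposal
    rng      : numpy.random.Generator
    Returns the chain, shape (n_iter, dim), and the acceptance rate.
    """
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    lp = log_post(theta)
    chain = np.empty((n_iter, theta.size))
    n_accepted = 0
    for i in range(n_iter):
        # Step 3: symmetric proposal centred on the current state.
        candidate = theta + step * rng.standard_normal(theta.size)
        lp_candidate = log_post(candidate)
        # Step 4 in log space; the symmetric proposal terms cancel.
        log_alpha = lp_candidate - lp
        # Steps 5-6: accept the candidate if u < min(1, alpha).
        u = rng.uniform()
        if np.log(u) < min(0.0, log_alpha):
            theta, lp = candidate, lp_candidate
            n_accepted += 1
        chain[i] = theta
    return chain, n_accepted / n_iter


# Usage: sample a standard normal "posterior", starting away from the mode,
# and discard the first 2000 iterations as burn-in.
rng = np.random.default_rng(42)
chain, acc_rate = metropolis_hastings(lambda t: -0.5 * np.sum(t ** 2), [3.0], 20000, 1.0, rng)
samples = chain[2000:, 0]
print(acc_rate, samples.mean(), samples.var())
```

The step size controls the trade-off between acceptance rate and how far the chain moves per iteration; asymmetric proposals would require keeping the $q$ ratio in `log_alpha`.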

Implementation details of the M–H algorithm are widely discussed in the literature (see e.g. Geyer, 1992; Gilks et al., 1995; Cowles and Carlin, 1996; Brooks and Gelman, 1998; Makowski et al., 2002; Sorensen and Gianola, 2002; Gelman et al., 2004; Ghosh et al., 2006; Robert, 2007) and are not repeated here.

2.4. Multimodel approach to account for conceptual model and scenario uncertainties

Combining GLUE and BMA in the frame of the method proposed by Rojas et al. (2008) to account for conceptual model and scenario uncertainties involves the following sequence of steps:

1. On the basis of prior and expert knowledge about the site, a suite of alternative conceptualizations is proposed, following, for instance, the methodology proposed by Neuman and Wierenga (2003). In this step, a decision on the values of prior model probabilities should be taken (Ye et al., 2005; Meyer et al., 2007; Ye et al., 2008b). Additionally, a suite of scenarios to be evaluated and their corresponding prior probabilities should be defined at this stage.

2. Realistic prior ranges are defined for the inputs and parameter vectors under each plausible model structure.

3. A likelihood measure and a rejection criterion to assess model performance are defined (Jensen, 2003; Rojas et al., 2008). A rejection criterion can be defined from exploratory runs of the system, based on subjectively chosen threshold limits (Feyen et al., 2001), or as an accepted minimum level of performance (Binley and Beven, 2003).

4. For the suite of alternative conceptual models, parameter values are sampled from the prior ranges defined in step 2 using a Markov Chain Monte Carlo (MCMC) algorithm (Gilks et al., 1995) to generate possible representations, or simulators, of the system. A likelihood measure is calculated for each simulator, based on the agreement between the simulated and observed system responses.

5. For each conceptual model $M_k$, the model likelihood is approximated using the likelihood measure. A subset $A_k$ of simulators with likelihoods $p(D \mid M_k, \theta_l) \approx L(M_k, \theta_l, Y_m \mid D)$ is retained based on the rejection criterion.

6. Steps 4–5 are repeated until the hyperspace of possible simulators is adequately sampled, i.e. when the first two moments of the conditional distributions of parameters based on the likelihood-weighted simulators converge to stable values for each of the conceptual models $M_k$, and when the R-score (Gelman et al., 2004) for multiple Markov Chains converges to values close to one.

7. The integrated likelihood of each conceptual model $M_k$ (Eq. (2)) is approximated by summing the likelihood weights of the retained simulators in the subset $A_k$, that is, $p(D \mid M_k) \approx \sum_{l,m \in A_k} L(M_k, \theta_l, Y_m \mid D)$.

8. The posterior model probabilities are then obtained by normalizing the integrated model likelihoods over the whole ensemble M, using Eq. (2), such that they sum up to one.

9. After normalizing the likelihood-weighted predictions under each individual model for each alternative scenario (such that the cumulative likelihood under each model and scenario equals one), an approximation to $p(\Delta \mid D, M_k, S_i)$ is available, and the multimodel prediction follows from Eq. (1). The leading moments of this distribution are obtained with Eqs. (3) and (4), considering all scenarios.
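Step 6 relies on the R-score of Gelman et al. (2004), also known as the potential scale reduction factor. A minimal sketch for one scalar quantity is given below, assuming m chains of equal length n from which the burn-in has already been discarded; it follows the basic formulation comparing between-chain and within-chain variances, and later refinements such as split chains are not included. The function name and the synthetic chains in the example are hypothetical.

```python
import numpy as np


def r_score(chains):
    """Potential scale reduction factor (R-score) for one scalar quantity.

    chains : array of shape (m, n), m parallel chains of length n, burn-in removed.
    Values close to one suggest that the chains sample the same distribution.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    if m < 2 or n < 2:
        raise ValueError("need at least two chains of length two")
    w = chains.var(axis=1, ddof=1).mean()    # mean within-chain variance W
    b = n * chains.mean(axis=1).var(ddof=1)  # between-chain variance B
    var_plus = (n - 1) / n * w + b / n       # pooled estimate of the posterior variance
    return np.sqrt(var_plus / w)


rng = np.random.default_rng(0)
mixed = rng.standard_normal((4, 5000))       # four chains sampling the same distribution
stuck = mixed + np.arange(4)[:, None] * 3.0  # chains stuck around different values
print(r_score(mixed), r_score(stuck))        # close to 1 versus clearly above 1
```

In a GLUE-BMA application the R-score would be computed separately for each parameter of each conceptual model, and sampling continued until all values are close to one.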

Posterior model probabilities obtained in step #8 are used in the prediction stage for the alternative conceptual models under alternative scenarios. Thus, the most demanding steps of the methodology (step #4 and step #5) are performed only once to obtain the posterior model probabilities. This is based on the assumption that the observed data D are independent of future scenarios, that is, the probability of observing the data D is not affected by the occurrence of any future scenario $S_i$ (Meyer et al., 2007).

2.5. Multimodel methods and model selection criteria

As previously stated, multimodel methodologies using model selection or information criteria have been proposed by Neuman (2003) and Poeter and Anderson (2005). These criteria are obtained as by-products of the calibration of groundwater models. As suggested by Ye et al. (2008a), Eq. (2) can be approximated by

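$$p(M_k \mid D, S_i) \approx \frac{\exp\left(-\tfrac{1}{2}\Delta\mathrm{IC}_k\right) p(M_k \mid S_i)}{\sum_{l=1}^{K} \exp\left(-\tfrac{1}{2}\Delta\mathrm{IC}_l\right) p(M_l \mid S_i)} \tag{5}$$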

where $\Delta\mathrm{IC}_k = \mathrm{IC}_k - \mathrm{IC}_{\min}$, $\mathrm{IC}_k$ being any of the model selection or information criteria described in Section 1 for a given model $M_k$, and $\mathrm{IC}_{\min}$ the minimum value obtained across models $M_k$, $k = 1, \ldots, K$. These posterior model probabilities are then used to estimate the leading moments of the BMA prediction (Eqs. (3) and (4)) considering alternative conceptual models and alternative scenarios.

Alternative model selection or information criteria differ in their mathematical expressions, in the way they penalize the inclusion of extra model parameters, and in how they value prior information about model parameters. These differences produce dissimilar results for Eq. (5) even when a common dataset D is used for all models. This may lead to controversial and misleading results when the posterior model probabilities obtained using Eq. (5) are used to compute the leading moments of the BMA predictions (Eqs. (3) and (4)).
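The sensitivity of Eq. (5) to the choice of criterion is easy to demonstrate. The sketch below computes AIC, AICc and BIC from their standard definitions for three hypothetical calibrated models (maximized log-likelihood, number of estimated parameters, 51 observations) and converts each criterion into model weights with Eq. (5), assuming equal prior model probabilities. KIC is omitted because it additionally requires the Fisher information matrix. All numbers are invented for illustration and are not results from this study.

```python
import numpy as np


def information_criteria(max_log_lik, k, n):
    """Standard AIC, AICc and BIC for a model with k estimated parameters and n observations."""
    aic = -2.0 * max_log_lik + 2.0 * k
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)
    bic = -2.0 * max_log_lik + k * np.log(n)
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


def ic_weights(ic_values, prior):
    """Eq. (5): approximate p(M_k | D, S_i) from criterion values and priors p(M_k | S_i)."""
    ic = np.asarray(ic_values, dtype=float)
    delta = ic - ic.min()  # Delta IC_k = IC_k - IC_min
    unnorm = np.exp(-0.5 * delta) * np.asarray(prior, dtype=float)
    return unnorm / unnorm.sum()


# Hypothetical calibrated models: (maximized log-likelihood, number of parameters).
models = [(-60.0, 3), (-55.0, 6), (-54.0, 10)]
n_obs = 51
prior = np.full(len(models), 1.0 / len(models))
crit = [information_criteria(ll, k, n_obs) for ll, k in models]
weights = {name: ic_weights([c[name] for c in crit], prior) for name in ("AIC", "AICc", "BIC")}
for name, w in weights.items():
    print(name, w.round(3))
```

With these numbers, AIC and AICc favour the second model, whereas BIC, which penalizes additional parameters more heavily for n = 51, favours the first; the same data thus yield different BMA weights depending on the criterion.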

3. Study area

3.1. General description

The Walenbos Nature Reserve is located in the northern part of Belgium, 30 km north-east of Brussels, in the valley of the brook “Brede Motte” (Fig. 1). It is a forested wetland of regional importance, highly dependent on groundwater discharges, especially in shallow depressions (De Becker and Huybrechts, 1997). Previous studies have shown that groundwater discharging in the wetland


infiltrates over a large area, mainly south of the wetland, and consists of groundwater from different aquifers (Batelaan et al., 1993; Batelaan et al., 1998).

The study area is bounded by two main rivers, the Demer River in the north and the Velp River in the south. Other minor rivers are found within the study area: the Motte River, which drains the wetland towards the north, the Molenbeek River and the Wingebeek River (Fig. 1). The Demer and the Velp rivers have elevations of 10 m above sea level (asl) and 35 m asl, respectively. Between these two rivers the area consists of undulating hills and plateaus reaching a maximum elevation of 80 m asl. Within the Walenbos Nature Reserve area, the slightly raised central part divides the wetland into an eastern and a western sub-basin.

Larger and smaller rivers are administratively classified into categories for water management purposes (HAECON and Witteveen+Bos, 2004). The Demer is navigable and of category 0, while the Velp is smaller and of category 1. The Wingebeek, Motte and Molenbeek are category 2 rivers. From these categories, initial properties (e.g. bed sediment thickness, river width and depth) of the main rivers are obtained and subsequently used to estimate values of river conductance.
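The text does not spell out how conductance follows from the river properties. A common choice, and the one assumed in this sketch, is the MODFLOW River package formulation C = K·L·W/M (bed hydraulic conductivity K, reach length L, river width W, bed sediment thickness M), with a leakage rate C(Hriv − h) that becomes constant once the aquifer head drops below the riverbed bottom. All numerical values below are hypothetical.

```python
def river_conductance(k_bed, length, width, thickness):
    """Riverbed conductance C = K * L * W / M (m2/d), River-package style."""
    return k_bed * length * width / thickness

def river_leakage(head, stage, bed_bottom, conductance):
    """Flow from the river into the aquifer (m3/d).

    Positive: losing river (river recharges the aquifer).
    Negative: gaining river (aquifer discharges to the river).
    """
    if head > bed_bottom:
        return conductance * (stage - head)
    # Head below the riverbed bottom: leakage no longer increases.
    return conductance * (stage - bed_bottom)

# Hypothetical cell: 100 m reach, 10 m wide, 1 m of bed sediments, K = 0.5 m/d.
c = river_conductance(k_bed=0.5, length=100.0, width=10.0, thickness=1.0)
print(c)                                    # 500.0
print(river_leakage(9.0, 10.0, 8.0, c))     # 500.0  (losing river)
print(river_leakage(11.0, 10.0, 8.0, c))    # -500.0 (gaining river)
print(river_leakage(5.0, 10.0, 8.0, c))     # 1000.0 (capped at stage - bed bottom)
```

The cap explains why a single conductance value can still produce both river losses and river gains, the two budget terms reported separately in the results.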

There are several observation wells within the study area from different monitoring networks of the Flemish Environment Agency (VMM) and the Research Institute for Nature and Forest (INBO). These data are made available through the Database of the Subsurface for Flanders (DOV, 2008). In this study 51 observation wells are used (Fig. 1), most of them concentrated in the Walenbos area.

Fig. 2. Geological map of the study area, showing the Walenbos Nature Reserve, the Demer, Velp, Motte and Molenbeek rivers and the outcropping formations (Diest, Bolderberg, Boom, Borgloon, Bilzen, Sint Huibrechts Hern, Brussels, Kortrijk and Hannut); scale bar 0–6000 m.

3.2. Geology and hydrogeology

Fig. 2 shows the geological map of the study area, and Table 1 gives the lithostratigraphic description of the formations present in the study area. The geology of the study area consists of an alternation of sandy and more clayey formations, generally dipping to the north and ranging in age from the Early Eocene to the Miocene. The Hannut Formation consists of clayey or sandy silts, locally with a siliceous limestone; it crops out only south of the Velp River. The Kortrijk Formation is a marine deposit consisting mainly of clayey sediments. It is covered by the Brussel Formation, a heterogeneous alternation of coarse and fine sands, locally calcareous and/or glauconiferous. The Early Oligocene Sint Huibrechts Hern Formation is a glauconiferous or micaceous, clayey fine sand, which is locally very fossiliferous. The Borgloon Formation represents a transition to a more continental setting and consists of a layer of clay lenses followed by an alternation of sand and clay layers. The Bilzen Formation is a marine deposit consisting of fine sands, glauconiferous at the base. The Bilzen sands are overlain by a clay layer, the Boom Formation. On top of the Boom Formation lies the Bolderberg Formation, which consists of medium-fine sands, locally clayey. The youngest deposits consist of the coarse, glauconiferous sands of the Diest Formation. These sands were deposited in a high-energy, shallow marine setting and have locally eroded the underlying formations; in the Walenbos area, for example, the Diest Formation is directly in contact with the Brussel Formation. The Kortrijk, Brussel and Sint Huibrechts Hern formations are present in the entire study area, while the younger layers disappear towards the south or are eroded in the valleys. The study area is covered with Quaternary sediments, consisting of loamy eolian deposits on the interfluves and alluvial deposits in the river valleys. The geological characteristics of the study area are described in detail in Laga et al. (2001) and Gullentops et al. (2001).

Table 1
Lithostratigraphic description of formations present in the study area.

Time         Group      Formation              Lithology
Quaternary   –          Eolian deposits        Loam and sandy loam
Quaternary   –          Alluvial deposits      Sand, silt, clay, possibly gravel at the base
Miocene      –          Diest                  Coarse sand with glauconite and iron sandstone banks
Miocene      –          Bolderberg             Fine sand with mica
Oligocene    Rupel      Boom                   Clay with septaria
Oligocene    Rupel      Bilzen                 Fine sand with shell remains
Oligocene    Tongeren   Borgloon               Clay and coarse sand
Oligocene    Tongeren   Sint Huibrechts Hern   Fine sand with glauconite and mica
Eocene       Zenne      Brussels               Fine sand
Eocene       Ieper      Kortrijk               Clay and traces of fine sand
Paleocene    Landen     Hannut                 Fine to silty sand

The Hydrogeological Code for Flanders (HCOV) is used to identify the different hydrogeological units (Meyus et al., 2000; Cools et al., 2006). The aquifer system surrounding and underlying the Walenbos Nature Reserve was schematized as one, three and five layers, with the top of the Kortrijk Formation as the bottom boundary for all conceptualizations (Fig. 3 and Table 2). These geological models were developed to assess the worth of additional "soft" geological knowledge about the geometry of the groundwater system underlying the Walenbos Nature Reserve. In this way, alternative layering structures for the aquifer are assessed in terms of their improvement of model performance.

Fig. 3. Layer setup for three alternative conceptual models M1, M2 and M3 (plan view between the Demer and Velp rivers with section lines A–A′ and B–B′; section B–B′, z [m] versus East [m], showing unit HK-1 in M1, units HK-1 to HK-3 in M2 and units HK-1 to HK-5 in M3).

4. Implementing the integrated uncertainty assessment

Three alternative conceptual models comprising different levels of geological knowledge are proposed (Fig. 3). Each model is assigned a prior model probability of 1/3. All conceptual models are bounded by the Kortrijk Formation as the low-permeability bottom and by the topographical surface at the top of the system. Model 1 (M1) is the simplest representation, with one hydrostratigraphic unit; Model 2 (M2) comprises three hydrostratigraphic units; and Model 3 (M3) is the most complex system, with five hydrostratigraphic units. Details are presented in Table 2 and Fig. 3.

Groundwater models for the three conceptualizations are constructed using MODFLOW-2005 (Harbaugh, 2005). Groundwater flow is assumed to be at steady state. The model area is ca. 11 × 22 km². Using a uniform cell size of 100 m, the modelled domain is discretised into 110 × 220 cells. The total number of cells varies from model to model, since the number of layers changes with the number of hydrostratigraphic units. The Demer and Velp rivers, at the northern and southern limits of the domain respectively, are defined as boundary conditions using the river package of MODFLOW-2005. Physical properties of both rivers (e.g. width, thickness of bed sediments and river stage) are obtained from models built within the framework of the Flemish Groundwater Model (HAECON and Witteveen+Bos, 2004). All grid cells located north of the Demer and south of the Velp are set as inactive (i.e. no-flow). The east and west limits of the modelled domain are defined as no-flow boundaries. To account for possible groundwater discharge zones in the study area, the drain package is used for all active cells in the uppermost layer of each model. The elevation of the drain element in each cell is set to the topographical elevation minus 0.5 m, to account for the average drainage depth of ditches and small rivulets (Batelaan and De Smedt, 2004).
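The grid dimensions and the drain setup can be illustrated with a few lines of code. The sketch assumes the standard MODFLOW Drain package behaviour, in which a drain removes water at a rate C(h − d) only when the head h exceeds the drain elevation d; the topography, heads and drain conductance used here are hypothetical.

```python
import numpy as np

CELL = 100.0                                   # uniform cell size (m)
EXTENT_EW, EXTENT_NS = 11_000.0, 22_000.0      # approx. model extent (m)
ncol, nrow = int(EXTENT_EW / CELL), int(EXTENT_NS / CELL)

def drain_elevation(topography, depth=0.5):
    """Drain level = land surface minus an average ditch depth (0.5 m here)."""
    return np.asarray(topography, dtype=float) - depth

def drain_outflow(head, drain_elev, conductance):
    """Drain outflow (m3/d, positive out of the aquifer).

    Drains only remove water, and only where the head exceeds the drain level.
    """
    head = np.asarray(head, dtype=float)
    return np.where(head > drain_elev, conductance * (head - drain_elev), 0.0)

# Hypothetical top-layer values for three cells.
topo = np.array([40.0, 40.0, 12.0])
heads = np.array([39.8, 39.0, 12.3])
q = drain_outflow(heads, drain_elevation(topo), conductance=100.0)
print(ncol, nrow)   # 110 220
print(q)            # about [30, 0, 80]
```

The second cell shows why drains are a one-way boundary: a head below the drain level produces no flow rather than a negative one.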

The focus of this work is on the assessment of conceptual model and recharge (scenario) uncertainties. We therefore limit the dimensionality of the analysis by considering uncertainty only in the conductance parameters of the Demer and Velp rivers, the conductance of the drains and the hydraulic conductivities of the alternative hydrostratigraphic units (see Tables 2 and 3). Additionally, the spatial zonation of the hydraulic conductivity field is kept constant and only the mean value of each hydrostratigraphic unit is sampled using the M–H algorithm. Parameter ranges are defined based on data from previous studies (HAECON and Witteveen+Bos, 2004) and are presented in Table 3. It is worth noting that, within the proposed methodology, heterogeneous fields following the theory of Random Space Functions (RSF) can easily be implemented (Rojas et al., 2008).

Fig. 3 (section A–A′ panels). z [m] (−85 to 85 m) versus North [m] through the Walenbos Nature Reserve for models M1, M2 and M3. Details for each hydrostratigraphic unit are described in Tables 1 and 2.

Table 2
Hydrostratigraphic unit setup for conceptual models M1, M2 and M3 (hydraulic conductivity parameter assigned to each formation).

Formation                      Model M1   Model M2   Model M3
Eolian and alluvial deposits   HK-1       HK-1       HK-1
Diest                          HK-1       HK-1       HK-1
Bolderberg                     HK-1       HK-1       HK-1
Boom                           HK-1       HK-2       HK-2
Bilzen                         HK-1       HK-2       HK-3
Borgloon                       HK-1       HK-2       HK-3
Sint Huibrechts Hern           HK-1       HK-2       HK-4
Brussels                       HK-1       HK-3       HK-5

Table 3
Range of prior uniform distributions for unknown parameters common to the three conceptual models M1, M2 and M3.

Parameter                               Minimum   Maximum
Conductance River Demer (m² d⁻¹)        0         1.0 × 10⁴
Conductance River Velp (m² d⁻¹)         0         1.0 × 10⁴
Conductance drain elements (m² d⁻¹)     0         1.0 × 10⁴
Hydraulic conductivities (m d⁻¹)        0         50

Average recharge conditions (R̄) on a 100 m × 100 m grid, representing average hydrological conditions, are obtained from Batelaan et al. (2007). Spatially distributed recharge values are calculated with WetSpass (Batelaan and De Smedt, 2007), a physically based water balance model for calculating quasi-steady-state, spatially variable evapotranspiration, surface runoff and groundwater recharge on a grid-cell basis. The average recharge condition constitutes the base situation for the estimation of the posterior model probabilities. Additionally, to account for recharge (scenario) uncertainties, two alternative recharge situations are defined by a deviation of ±2σR from the average recharge scenario (R̄), where σR is the standard deviation of the spatially distributed recharge values obtained from the WetSpass calculations. Although not shown here, these recharge values follow a normal distribution, which supports the definition of this deviation (±2σR) from average recharge conditions. We used ±2σR to make an intuitive link with the expression of 95% confidence intervals for potential recharge values (for a normal distribution, ±2σ covers about 95.4% of the probability mass). The three recharge settings are based on long-term simulations of average hydrological conditions covering more than 100 years of meteorological data (see Batelaan and De Smedt, 2007). In a strict sense, the plausibility of these average recharge values could have been evaluated against past conditions, in the same way as the dataset D; this is not possible, however, because D comprises a limited and variable time series of head measurements. The key assumptions of the analysis are therefore twofold. First, D is assumed to represent a steady-state condition. This steady-state condition is valid only for the present-time situation, since the available time series of observed heads are considerably shorter than the series of meteorological data used to estimate average recharge conditions (S2). Second, there is no guarantee that similar (climate) recharge conditions will be observed over the next 100 years. The latter clearly influences the definition of coherent prior probabilities for each scenario.

Based on the assumptions discussed above, recharge uncertainties are treated as scenario uncertainties in the context of the proposed GLUE-BMA method (Eqs. (1)–(4)). To avoid conflicting terminology, however, the terms scenario uncertainties and recharge uncertainties are used interchangeably hereafter.

Based on these long-term simulations, three recharge conditions (scenarios) are defined: S1 (R̄ − 2σR), S2 (R̄) and S3 (R̄ + 2σR). Average values for S1, S2 and S3 are 93.1 mm yr⁻¹, 205.4 mm yr⁻¹ and 319.5 mm yr⁻¹, respectively. Each recharge scenario is assigned a prior scenario probability of 1/3, based on the assumption that, for future recharge conditions, average and tail values are equally likely to be observed.
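The scenario construction can be sketched on a synthetic recharge field. This assumes that S1 and S3 are obtained by shifting every cell of the average field by ∓2σR and that negative recharge in S1 is clipped to zero; both are assumptions made for the sketch, not statements about the WetSpass workflow. The field itself is random and only loosely mimics the averages reported above. The first lines check the 95% interpretation of ±2σ for a normal distribution.

```python
import math
import numpy as np

# Two-sided probability within +/- 2 standard deviations of a normal distribution.
coverage = math.erf(2 / math.sqrt(2))
print(round(coverage, 4))   # 0.9545

# Hypothetical recharge field (mm/yr) on the 220 x 110 model grid.
rng = np.random.default_rng(42)
recharge = rng.normal(loc=205.0, scale=57.0, size=(220, 110))

r_mean = recharge.mean()
sigma_r = recharge.std()    # spatial standard deviation, playing the role of sigma_R

scenarios = {
    "S1": np.clip(recharge - 2 * sigma_r, 0.0, None),   # clipping at zero is an assumption
    "S2": recharge,
    "S3": recharge + 2 * sigma_r,
}
for name, field in scenarios.items():
    print(name, round(field.mean(), 1))
```

Because of the clipping, the mean of S1 can end up slightly above R̄ − 2σR, so the spacing between S1 and S2 need not equal the spacing between S2 and S3 exactly.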

A Gaussian likelihood function is implemented to assess model performance, i.e. the ability of each simulator to reproduce the observed data D. Observed heads (hobs) at the 51 observation wells depicted in Fig. 1 are compared to simulated heads (hsim) to obtain a likelihood measure. Observed heads correspond to a representative (average) value for steady-state conditions, derived from different time series in the period 1989–2008. Observation wells vary in depth, and the length and depth of the well screens also vary. Although some local confined conditions controlled by the Boom Formation are observed in the study area, the observed dataset D covers phreatic conditions only. This might lower the information content of D for effectively discriminating between models. A limited set of head observations, however, may often be the only information available about the system dynamics for a modelling exercise and/or model discrimination. Based on preliminary runs, a departure of ±5 m from the observed head at each observation well is defined as the rejection criterion: if hobs − 5 m < hsim < hobs + 5 m, a Gaussian likelihood measure is calculated; otherwise the likelihood is zero. This rejection criterion is defined to retain enough parameter samples for exploring the posterior probability space and to ensure convergence of the different Markov chains used in the M–H algorithm. For details about the implementation of the rejection criterion in the proposed approach, the reader is referred to Rojas et al. (2008).
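A minimal sketch of this likelihood with the ±5 m rejection rule follows. Two points are assumptions rather than details given in the text: the rule is applied to every well (one violation rejects the parameter set), and the head error σ is a user-chosen constant. The normalizing constant of the Gaussian is omitted because, with the same σ and the same 51 wells, it is identical for every parameter set and model compared.

```python
import numpy as np

def gaussian_likelihood(h_obs, h_sim, sigma=1.0, window=5.0):
    """Gaussian likelihood of simulated heads with a +/- window rejection rule.

    Returns 0.0 unless h_obs - window < h_sim < h_obs + window at every well.
    sigma (m) is an assumed head error; the normalizing constant is dropped.
    """
    h_obs = np.asarray(h_obs, dtype=float)
    h_sim = np.asarray(h_sim, dtype=float)
    resid = h_sim - h_obs
    if np.any(np.abs(resid) >= window):
        return 0.0
    return float(np.exp(-0.5 * np.sum((resid / sigma) ** 2)))

h_obs = np.array([20.0, 25.0, 31.0])   # hypothetical observed heads (m)
print(gaussian_likelihood(h_obs, h_obs))                       # 1.0
print(gaussian_likelihood(h_obs, h_obs + [1.0, 0.0, 0.0]))     # exp(-0.5), about 0.607
print(gaussian_likelihood(h_obs, h_obs + [6.0, 0.0, 0.0]))     # 0.0 (rejected)
```

With many wells the product of Gaussian terms becomes very small; working with log-likelihoods is a common alternative when underflow is a concern.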

Five parallel chains, starting from randomly selected points within the prior parameter ranges (Table 3), are run with the M–H algorithm for each conceptual model. Four-, six- and eight-dimensional uniform distributions over the prior ranges defined in Table 3 are used as proposal distributions for M1, M2 and M3, respectively. For each proposed parameter set, a new Gaussian likelihood value is calculated as a function of the agreement between observed and simulated groundwater heads at the 51 observation wells depicted in Fig. 1.
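The sketch below reads the proposal description as an independence sampler: each proposal is drawn uniformly over the prior box, regardless of the current state. With a uniform prior and such a proposal, the M–H acceptance probability reduces to min(1, L(θ′)/L(θ)). Redrawing starting points until the likelihood is non-zero is an assumption added so that the ratio is always defined under the rejection rule. The one-dimensional toy likelihood stands in for a MODFLOW run and is hypothetical; burn-in removal is left to the caller.

```python
import numpy as np

def mh_uniform_proposal(likelihood, lower, upper, n_iter, n_chains=5,
                        seed=0, max_init_tries=10_000):
    """M-H sampling with proposals drawn uniformly over the prior ranges."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    samples = np.empty((n_chains, n_iter, lower.size))
    liks = np.empty((n_chains, n_iter))
    accepted = np.zeros(n_chains, dtype=int)

    for c in range(n_chains):
        # Random starting point inside the prior box with non-zero likelihood.
        for _ in range(max_init_tries):
            theta = rng.uniform(lower, upper)
            lik = likelihood(theta)
            if lik > 0.0:
                break
        else:
            raise RuntimeError("no starting point with non-zero likelihood found")

        for t in range(n_iter):
            proposal = rng.uniform(lower, upper)
            lik_new = likelihood(proposal)
            # Uniform prior + state-independent uniform proposal: ratio = L'/L.
            if rng.uniform() < lik_new / lik:
                theta, lik = proposal, lik_new
                accepted[c] += 1
            samples[c, t] = theta
            liks[c, t] = lik

    return samples, liks, accepted / n_iter


def toy_likelihood(theta):
    """Hypothetical likelihood peaked at 2.0 (stand-in for a model run)."""
    return float(np.exp(-0.5 * ((theta[0] - 2.0) / 0.5) ** 2))

samples, liks, acc = mh_uniform_proposal(toy_likelihood, [0.0], [10.0], n_iter=4000)
print(samples[..., 0].mean(), acc)
```

A uniform independence proposal is simple and cannot get trapped locally, but its acceptance rate drops as the posterior becomes narrow relative to the prior box, which is one reason to monitor acceptance rates.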

Using the discrete samples from the M–H algorithm, the integrated likelihood of each conceptual model, p(D|Mk) in Eq. (2), is approximated by summing all retained likelihood values for Mk. The posterior model probabilities are then obtained by normalizing over the whole ensemble M under average recharge conditions (R̄).
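The summation and normalization steps can be written compactly. In this sketch the retained likelihood values are hypothetical, and the comment notes that a plain sum compares models on equal footing only when every model retains the same number of samples (20000 per model in Section 5.1).

```python
import numpy as np

def posterior_model_probabilities(retained_likelihoods, priors=None):
    """Approximate p(D|Mk) and p(Mk|D) from retained likelihood values.

    p(D|Mk) is approximated by the sum of the retained likelihoods of Mk;
    this presumes the same number of retained samples for every model.
    """
    integrated = np.array([np.sum(l) for l in retained_likelihoods], dtype=float)
    k = integrated.size
    priors = np.full(k, 1.0 / k) if priors is None else np.asarray(priors, dtype=float)
    weights = integrated * priors
    return integrated, weights / weights.sum()

# Hypothetical retained likelihood values for three conceptual models.
retained = [np.array([0.10, 0.12, 0.08]),
            np.array([0.05, 0.06, 0.04]),
            np.array([0.10, 0.10, 0.10])]
integrated, post = posterior_model_probabilities(retained)
print(integrated, post)
```

With equal priors the priors cancel, so the posterior model probabilities are simply the integrated likelihoods divided by their total.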

For each predicted variable of interest, e.g. river losses and river gains for the Velp and Demer, drain outflows, and groundwater inflows to and outflows from the Walenbos area, a cumulative predictive distribution, p(Δ|D,Mk,Si), is approximated by normalizing the retained likelihood values for each conceptual model under each scenario so that they sum to one.
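A likelihood-weighted cumulative distribution and its percentiles (P2.5, P50 and P97.5, as used later in the results) can be sketched as follows. The percentile convention used here, the first sorted value whose cumulative weight reaches the target probability, is one of several possible choices and is an assumption; the predicted Demer losses and likelihood values are hypothetical.

```python
import numpy as np

def weighted_percentiles(values, likelihoods, probs=(0.025, 0.5, 0.975)):
    """Percentiles of a discrete, likelihood-weighted predictive distribution."""
    values = np.asarray(values, dtype=float)
    w = np.asarray(likelihoods, dtype=float)
    w = w / w.sum()                        # normalize weights so they sum to one
    order = np.argsort(values)
    v_sorted = values[order]
    cdf = np.cumsum(w[order])              # discrete cumulative predictive distribution
    idx = np.searchsorted(cdf, probs, side="left")
    idx = np.minimum(idx, v_sorted.size - 1)   # guard against round-off at prob = 1
    return v_sorted[idx]

# Hypothetical predicted Demer losses (m3/d) for five retained parameter sets.
losses = np.array([2100.0, 2500.0, 1800.0, 2300.0, 2000.0])
liks = np.array([0.05, 0.12, 0.02, 0.08, 0.10])
p025, p50, p975 = weighted_percentiles(losses, liks)
print(p025, p50, p975)
```

Because the weights are normalized inside the function, raw likelihood values can be passed directly.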

The leading moments of the full BMA predictive distribution, accounting for parameter, conceptual model and scenario uncertainties, are then obtained using Eqs. (3) and (4).
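Eqs. (3) and (4) are not reproduced in this section. This sketch assumes they take the standard BMA form: the mean is the probability-weighted average of the model- and scenario-specific means, and the variance splits, by the law of total variance, into a within-model term, a between-model term and a between-scenario term. As in Section 5.4, the posterior model probabilities obtained under average recharge are applied to every scenario. The per-model means and variances are hypothetical; the model probabilities are those of Table 4.

```python
import numpy as np

def bma_moments(means, variances, p_model, p_scenario):
    """BMA mean and variance over models Mk (rows) and scenarios Si (columns)."""
    m = np.asarray(means, dtype=float)
    v = np.asarray(variances, dtype=float)
    pm = np.asarray(p_model, dtype=float)[:, None]   # p(Mk|D), same for all scenarios
    ps = np.asarray(p_scenario, dtype=float)         # p(Si)

    mean_s = (pm * m).sum(axis=0)                    # E[delta | D, Si]
    mean = float((ps * mean_s).sum())                # E[delta | D]

    within = float((ps * (pm * v).sum(axis=0)).sum())
    between_models = float((ps * (pm * (m - mean_s) ** 2).sum(axis=0)).sum())
    between_scenarios = float((ps * (mean_s - mean) ** 2).sum())
    total = within + between_models + between_scenarios
    return mean, total, (within, between_models, between_scenarios)

p_model = [0.355, 0.315, 0.330]                      # Table 4
p_scenario = [1 / 3, 1 / 3, 1 / 3]                   # equal scenario priors
means = [[1500.0, 1200.0, 900.0],                    # hypothetical, m3/d
         [1800.0, 1400.0, 1000.0],
         [1300.0, 1100.0, 950.0]]
variances = [[4.0e4, 3.0e4, 2.5e4],                  # hypothetical, (m3/d)^2
             [5.0e4, 3.5e4, 3.0e4],
             [3.0e4, 2.5e4, 2.0e4]]
mean, var, parts = bma_moments(means, variances, p_model, p_scenario)
print(mean, var, parts)
```

Reporting the three variance terms separately shows whether parameter, conceptual model or scenario uncertainty dominates a given prediction.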

5. Results and discussion

5.1. Validation of the M–H algorithm results

The proposed methodology works by sampling new parameter sets for each conceptual model with an M–H algorithm in order to obtain posterior parameter probability distributions. Several aspects of the M–H implementation, such as the acceptance rate, the definition of the burn-in samples, the proper mixing of the alternative chains and the convergence of the first two moments, were checked to validate the results obtained with the improved methodology.

Fig. 4. Marginal scatter plots of likelihood (×10⁻²) calculated using the M–H algorithm for parameters: (a) drain conductance, (b) conductance of the Velp River bed, (c) conductance of the Demer River bed (conductances on log axes from 10⁰ to 10⁴ m² d⁻¹), and (d) hydraulic conductivity of layer 1 (HK-1, m d⁻¹) for model M1. Vertical lines represent the solution obtained from least squares calibration (UCODE-2005). The red diamond represents the point of highest likelihood in the context of the GLUE methodology.

The average acceptance rates of the Markov chains for models M1, M2 and M3 (for the 20000 parameter samples) were 25%, 23% and 27%, respectively. All values lie within the ranges suggested in the literature (Makowski et al., 2002). Although not shown here, convergence of the first two moments of the posterior parameter distributions obtained from the total discrete parameter sample was also confirmed for the three alternative conceptual models. Therefore, the resulting discrete parameter samples from models M1, M2 and M3 can be considered samples from the target posterior distributions under the respective conceptual models.
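The checks listed above can be sketched with three small functions. The acceptance rate is computed as the fraction of transitions in which the stored chain moves, which matches the accepted fraction as long as a proposal never coincides exactly with the current state. The text does not name a specific mixing diagnostic; the Gelman–Rubin statistic shown here is only one common choice. The running mean and variance track the convergence of the first two moments. The chains are synthetic stand-ins.

```python
import numpy as np

def acceptance_rate(chain):
    """Fraction of transitions in which the chain moved (chain: n_iter x dim)."""
    chain = np.asarray(chain, dtype=float)
    moved = np.any(np.diff(chain, axis=0) != 0.0, axis=1)
    return moved.mean()

def gelman_rubin(chains):
    """Potential scale reduction factor for one parameter (chains: m x n)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    b = n * chains.mean(axis=1).var(ddof=1)    # between-chain variance
    w = chains.var(axis=1, ddof=1).mean()      # mean within-chain variance
    var_hat = (n - 1) / n * w + b / n
    return np.sqrt(var_hat / w)

def running_moments(x):
    """Running mean and variance, to check that the first two moments stabilize."""
    x = np.asarray(x, dtype=float)
    k = np.arange(1, x.size + 1)
    mean = np.cumsum(x) / k
    var = np.cumsum(x ** 2) / k - mean ** 2
    return mean, var

rng = np.random.default_rng(1)
chains = rng.normal(size=(5, 2000))   # synthetic stand-in: one parameter, five chains
print(gelman_rubin(chains))           # close to 1 for well-mixed chains
```

Values of the statistic well above 1 indicate chains that have not yet converged to a common distribution.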

5.2. Likelihood response surfaces

With the proposed methodology, each parameter set is linked to a likelihood value. The resulting marginal scatter plots of parameter likelihoods for models M1 and M2 are shown in Figs. 4 and 5, respectively. These figures also include the results of a weighted least squares calibration using UCODE-2005 (Poeter et al., 2005). Several calibration trials (six for model M1, ten for model M2 and more than twenty for model M3) were launched, starting from different initial parameter values within the ranges defined in Table 3. For the sensitive parameters, all calibration trials converged to rather similar optimum values, although some minor differences were observed due to irregularities in the likelihood response surface. For insensitive parameters, on the other hand, different trials converged to different values. For clarity, only the final calibrated parameter set is included in the comparison with the GLUE-BMA results.

Figs. 4 and 5 show that the likelihood values were rather insensitive to the conductance of drains and rivers (plates a, b and c in Figs. 4 and 5). High likelihood values were observed over almost the whole prior sampling range, making it very difficult to identify a well-defined attraction zone for these three parameters. This insensitivity was also reflected in the significant difference between the values obtained with least squares calibration and the highest likelihood points obtained with the proposed method. Clearly, least squares calibration did not succeed in identifying the point, or even the range, where the highest likelihood values for these parameters were observed. This is a well-known drawback of least squares calibration methods in the presence of highly insensitive parameters.

For the parameters defining the mean hydraulic conductivity of each model layer, on the contrary, well-defined attraction zones were identified by the proposed methodology (plate d in Fig. 4 and plates d, e and f in Fig. 5). For these parameters, the results obtained from least squares calibration were almost identical to the highest likelihood points identified with the proposed methodology.

Although not shown here, the same patterns were observed for model M3 for the three insensitive parameters and for the parameters defining the mean hydraulic conductivity of layer 1 (HK-1), layer 4 (HK-4) and layer 5 (HK-5) (Table 2). For these last three parameters, well-defined attraction zones were identified, and the results of least squares calibration were fairly similar to the highest likelihood points identified with the proposed method. However, two exceptions are worth mentioning. Fig. 6 shows the marginal scatter plots of calculated likelihood for the hydraulic conductivity of layer 2 (HK-2) (Fig. 6a) and layer 3 (HK-3) (Fig. 6b) for model M3. These layers correspond to the Boom Formation and Ruisbroek Formation, respectively (Table 2). These scatter plots show that likelihood values are fairly insensitive to parameter HK-3. However, a clear attraction zone for values greater than 0.001 m d⁻¹ is observed for both parameters. This contrasts with the results obtained using the least squares calibration method. The largest difference occurs for parameter HK-2 (Fig. 6a), where the highest likelihood point identified with the GLUE-BMA methodology and the result of the least squares calibration differed by more than six orders of magnitude. It is worth mentioning that convergence of UCODE-2005 was highly sensitive to the initial values of HK-2 and HK-3. After a significant number of trials, meaningful initial values for HK-2 and HK-3 were set to 0.01 m d⁻¹ and 4.6 m d⁻¹, respectively. These initial values allowed UCODE-2005 to converge; however, they produced calibrated values rather dissimilar to the highest likelihood points obtained with the GLUE-BMA methodology. In the proposed methodology, on the contrary, the parameters were sampled from the prior range defined in Table 3 following the acceptance/rejection rule described in step #6 of Section 2.3. This procedure therefore allowed clear zones of attraction to be identified for these two parameters, even though they remained insensitive.

Fig. 5. Marginal scatter plots of likelihood (×10⁻²) calculated using the M–H algorithm for parameters: (a) drain conductance, (b) conductance of the Velp River bed, (c) conductance of the Demer River bed (conductances on log axes from 10⁰ to 10⁴ m² d⁻¹), (d) hydraulic conductivity of layer 1 (HK-1), (e) hydraulic conductivity of layer 2 (HK-2), and (f) hydraulic conductivity of layer 3 (HK-3) (hydraulic conductivities on log axes from 10⁻¹ to 10¹ m d⁻¹) for model M2. Vertical lines represent the solution obtained from least squares calibration (UCODE-2005). The red diamond represents the point of highest likelihood in the context of the GLUE methodology.

Fig. 6. Marginal scatter plots of likelihood (×10⁻²) calculated using the M–H algorithm for parameters: (a) hydraulic conductivity of layer 2 (HK-2; log axis from 10⁻⁷ to 10¹ m d⁻¹) and (b) hydraulic conductivity of layer 3 (HK-3; log axis from 10⁻² to 10¹ m d⁻¹) for model M3. Vertical lines represent the solution obtained from least squares calibration (UCODE-2005). The red diamond represents the point of highest likelihood in the context of the GLUE methodology.

This critical difference between the two approaches (GLUE-BMA and WLS calibration) may be explained by the meaning and the type of information conveyed by the dataset D in this application. For pragmatic reasons, the dataset D did not include observation wells located in the local confined aquifers distributed over the study area, since the interest was in the general functioning of the aquifer system and not in local conditions. In general, these local confined aquifers are controlled by the presence of the hydrostratigraphic unit defined by the Boom Formation and, thus, by parameter HK-2. Therefore, if the dataset D, which consists only of head measurements in the phreatic aquifer, does not contain any relevant information on confined areas, it is difficult to assess the relevance and the actual value of parameter HK-2. As a consequence, parameter HK-2 becomes redundant, and the zone of attraction in Fig. 6a is defined for an "equivalent" parameter HK-2 that accounts only for a phreatic system. The GLUE-BMA methodology easily accommodated this situation, whereas the WLS method faced convergence problems because the initial values for parameter HK-2 were defined within the observed range of hydraulic conductivity values for the Boom Formation.

Despite these differences between WLS and GLUE-BMA, both methods performed equally well in terms of model performance. For example, the root mean squared error (RMSE) for model M1 using WLS and GLUE-BMA was 1.884 and 1.876, respectively. For model M2, both WLS and GLUE-BMA gave an RMSE of 1.890, whereas for model M3 the RMSE of WLS and GLUE-BMA was 1.761 and 1.741, respectively.

5.3. Posterior model probabilities

Table 4 presents the posterior model probabilities obtained with the proposed methodology using Eq. (2) for average recharge conditions. The integrated model likelihoods for models M1, M2 and M3 differ only slightly. Since posterior model probabilities are proportional to the integrated model likelihoods when prior model probabilities are equal (i.e., when there is no a priori preference for a given conceptual model), the posterior model probabilities also differ only marginally.
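The arithmetic behind Table 4 is short enough to check directly. The sketch recomputes the posterior model probabilities from the integrated likelihoods in Table 4 and then repeats the calculation with an unequal set of prior model probabilities, which are purely hypothetical. Because the largest ratio of integrated likelihoods is only about 1.12, the posterior stays close to whichever prior is chosen, consistent with the argument that prior model probabilities largely determine the result here.

```python
import numpy as np

integrated = np.array([2210.5, 1966.5, 2058.1])   # p(D|Mk) for M1, M2, M3 (Table 4)

def posterior(integrated, priors):
    """p(Mk|D) = p(D|Mk) p(Mk) / sum over models."""
    w = integrated * np.asarray(priors, dtype=float)
    return w / w.sum()

equal = posterior(integrated, [1 / 3, 1 / 3, 1 / 3])
print(np.round(equal, 3))                         # about 0.355, 0.315, 0.330 (Table 4)

skewed = posterior(integrated, [0.2, 0.3, 0.5])   # hypothetical expert priors
print(np.round(skewed, 3))                        # stays close to 0.2, 0.3, 0.5
print(integrated.max() / integrated.min())        # largest likelihood ratio, about 1.12
```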

In this case, the information provided by the observed dataset D does not allow significant discrimination between models when the prior model probabilities are updated. This suggests that, for the problem at hand and the information content of D, prior model probabilities will likely play a significant role in determining the posterior model probabilities. In this regard, prior model probabilities can be thought of as "prior knowledge" about the alternative conceptual models. This prior knowledge is ideally based on expert judgement, which Bredehoeft (2005) considers the basis for conceptual model development. In this way, "subjective" expert knowledge about alternative conceptualizations, combined with the information provided by the dataset D, may allow some degree of discrimination between models through updated posterior model probabilities. As shown by Ye et al. (2008b), however, even when individual experts assign substantially different prior model probabilities, aggregating the prior model probabilities from several experts gives a relatively uniform prior model probability distribution. It would be interesting to investigate the joint effect of data and expert judgement on the prior model probabilities. For a complete analysis of the sensitivity of the results of the proposed methodology to different prior model probabilities, which is beyond the scope of this article, the reader is referred to Rojas et al. (2009).

Table 4
Integrated model likelihoods, prior model probabilities and posterior model probabilities obtained for average recharge conditions (scenario S2) for alternative conceptual models M1, M2 and M3.

                                       M1       M2       M3
Integrated model likelihood p(D|Mk)    2210.5   1966.5   2058.1
Prior model probability p(Mk)          1/3      1/3      1/3
Posterior model probability p(Mk|D)    0.355    0.315    0.330

Another possible strategy is to increase the information content of D by collecting new data that may be particularly useful for discriminating between models (e.g. river discharges, tracer travel times and observed groundwater flows). With extra data, the level of "conditioning" of the results increases, and the integrated model likelihoods are more likely to differ between alternative conceptual models. In practice, however, a set of observed groundwater heads may often be the only information available about the system dynamics for estimating posterior model probabilities for a set of alternative model conceptualizations. This highlights the challenge of assigning model weights (i.e. posterior model probabilities) on the basis of what is often a minimal level of information.

5.4. Groundwater model predictions accounting for conceptual model and scenario uncertainties

Using the posterior model probabilities in Table 4 for average recharge conditions and the cumulative predictive distributions obtained for each model, a multimodel cumulative predictive distribution is obtained for scenarios S1, S2 and S3.
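One way to form the multimodel cumulative predictive distribution is as a mixture: each model's normalized likelihood weights are scaled by its posterior model probability and all retained predictions are pooled, so that F_BMA(x) = Σk p(Mk|D) Fk(x). The sketch below assumes this mixture form and reuses a first-crossing percentile convention; the predictions and likelihood values are synthetic, while the model probabilities are taken from Table 4.

```python
import numpy as np

def bma_percentiles(samples_by_model, liks_by_model, p_model,
                    probs=(0.025, 0.5, 0.975)):
    """Percentiles of the BMA mixture of likelihood-weighted model predictions."""
    values, weights = [], []
    for x, l, p in zip(samples_by_model, liks_by_model, p_model):
        l = np.asarray(l, dtype=float)
        values.append(np.asarray(x, dtype=float))
        weights.append(p * l / l.sum())   # within-model weights scaled by p(Mk|D)
    values = np.concatenate(values)
    weights = np.concatenate(weights)
    order = np.argsort(values)
    cdf = np.cumsum(weights[order]) / weights.sum()
    idx = np.minimum(np.searchsorted(cdf, probs, side="left"), values.size - 1)
    return values[order][idx]

# Synthetic predictions (m3/d) and likelihoods for M1, M2, M3 under one scenario.
rng = np.random.default_rng(7)
samples = [rng.normal(1500, 200, 500), rng.normal(1800, 250, 500), rng.normal(1300, 150, 500)]
liks = [rng.uniform(0.01, 0.15, 500) for _ in range(3)]
p025, p50, p975 = bma_percentiles(samples, liks, [0.355, 0.315, 0.330])
print(p025, p50, p975)
```

Repeating the call with the predictions of each scenario gives the scenario-specific BMA intervals; pooling scenarios with their prior probabilities would add scenario uncertainty as well.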

Fig. 7 shows the cumulative predictive distributions for a series of groundwater budget components and the combined BMA prediction accounting only for conceptual model uncertainty under scenario S2. Although the posterior model probabilities differ only slightly (Table 4), indicating a low information content of the dataset D, there are significant differences between the predictions of models M1, M2 and M3. For river losses and river gains for the Demer (plates a and b in Fig. 7) and for Walenbos outflows and inflows (plates f and g in Fig. 7), both the most likely predicted values (P50) and the 95% (P2.5–P97.5) prediction intervals differ drastically between alternative conceptual models. This indicates that conceptual model uncertainty dominates both the most likely predictions and the predictive uncertainty under S2. On the other hand, the most likely predicted values for river losses and river gains for the Velp (plates c and d in Fig. 7) and for drain outflows (plate e in Fig. 7) are rather similar, yet the 95% prediction intervals span clearly different ranges. Thus, although the most likely predicted values for models M1, M2 and M3 are quite similar, their predictive uncertainty is largely dominated by conceptual model uncertainty.

Additionally, Fig. 8 summarizes the most likely predicted values and the 95% prediction intervals for models M1, M2 and M3 under scenarios S1, S2 and S3, for the same groundwater budget components as in Fig. 7. This figure shows that, for scenarios S1 and S3, uncertainties due to the specification of alternative conceptual models also play an important role. Under S1, conceptual model uncertainty is most relevant for river gains and river losses for the Demer and the Velp (plates a–d) and, to a lesser extent, for drain outflows (plate e) and inflows to the Walenbos area (plate g). This can be explained by the fact that, under low recharge conditions (S1), rivers contribute more water to the groundwater system because of lower simulated groundwater heads in the neighbouring areas. This lowering of heads also explains why drain outflows are only slightly affected by conceptual model uncertainty. For scenario S3, all prediction intervals for the groundwater budget components are affected by the choice of conceptual model. This is expected, since under high recharge conditions (S3) all groundwater flow components are likely to be affected by an alternative conceptualization.

If groundwater budget components are transversely analysed itis seen that predictive intervals for river losses (Fig. 8a and c) are


[Figure 7 panels: cumulative probability (prob [-]) versus (a) Demer losses, (b) Demer gains, (c) Velp losses, (d) Velp gains, (e) Drain outflows, (f) Walenbos outflows and (g) Walenbos inflows, in m³ d⁻¹ (×10³); curves for M1, M2, M3 and BMA scenario S2.]

Fig. 7. Cumulative predictive distributions for groundwater budget components for alternative conceptual models M1, M2, M3 and the BMA cumulative prediction accounting exclusively for conceptual model uncertainty under scenario S2.

[Figure 8 panels: most likely value and 95% interval (m³ d⁻¹, ×10³) for Demer gains, Demer losses, Velp losses, Velp gains, Drain outflows, Walenbos outflows and Walenbos inflows, per scenario S1, S2 and S3, for Model M1, Model M2 and Model M3.]

Fig. 8. Prediction intervals (95%) and most likely predicted values for different groundwater flow components based on the cumulative predictive distributions obtained from the GLUE-BMA method. Values are shown for scenarios S1, S2 and S3 and for conceptual models M1, M2 and M3. All values expressed in m³ d⁻¹.


dominated by scenario S1, whereas the predictive intervals for river gains (Fig. 8b and d) are dominated by scenario S3. For the drain outflows and the groundwater inflows to and outflows from the Walenbos area, scenario S3 shows the widest predictive intervals.

Under each scenario, the cumulative distributions of the alternative conceptual models are combined into a BMA cumulative distribution (Fig. 7 shows the case for S2 only). Subsequently, the scenario predictions are combined following Eq. (1) to obtain a full BMA


Table 5
Prediction intervals (95%) and most likely predicted values based on the full BMA cumulative predictive distribution obtained from the GLUE-BMA methodology for different groundwater budget components. All values expressed in m³ d⁻¹.

                       P2.5      P50       P97.5
Demer losses           143       1865      4309
Demer gains            1690      6957      9420
Velp losses            129       843       2280
Velp gains             1180      3377      6059
Drain outflows         45,334    97,775    156,344
Walenbos outflows      319       739       1186
Walenbos inflows       2517      4675      6161


prediction accounting for conceptual model and scenario uncertainties. Fig. 9 shows the results for the full BMA prediction. The most likely predicted values obtained with the full BMA predictive distribution are rather similar to those obtained under scenario S2. This suggests that including S1 and S3 mainly affects the estimated predictive uncertainty rather than the most likely predicted value. This is evident for the drain outflows (Fig. 9e), where the P50 values for the full BMA and S2 are practically identical while the predictive intervals span completely different ranges. This suggests that, for the drain outflows, scenario uncertainty is the main contributor to the predictive variance. The most likely predicted values and the 95% prediction intervals for the full BMA predictive distribution are summarized in Table 5.
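The two-stage combination described above amounts to evaluating a weighted mixture of CDFs. The sketch below assumes that Eq. (1) weights each model/scenario predictive distribution by p(M_k|D) p(S_i), with model weights that do not depend on the scenario; the uniform CDFs, the equal weights and the function names are hypothetical stand-ins, and the P50 is recovered by simple bisection:

```python
def bma_cdf(x, cdfs, p_model, p_scenario):
    """Full BMA CDF at x: sum_i sum_k p(S_i) * p(M_k | D) * F_ik(x) (assumed form of Eq. (1))."""
    return sum(
        p_s * p_m * cdfs[i][k](x)
        for i, p_s in enumerate(p_scenario)
        for k, p_m in enumerate(p_model)
    )


def invert_cdf(p, cdf, lo, hi, tol=1e-9):
    """Find x with cdf(x) = p by bisection on [lo, hi]; cdf must be non-decreasing."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def uniform_cdf(a, b):
    """Hypothetical stand-in for one model/scenario predictive CDF."""
    return lambda x: min(max((x - a) / (b - a), 0.0), 1.0)


# cdfs[i][k]: scenario i, model k (two scenarios x two models, all hypothetical)
cdfs = [[uniform_cdf(0.0, 2.0), uniform_cdf(1.0, 3.0)],
        [uniform_cdf(2.0, 4.0), uniform_cdf(3.0, 5.0)]]


def full_bma(x):
    # equal model and scenario weights, chosen only for illustration
    return bma_cdf(x, cdfs, p_model=[0.5, 0.5], p_scenario=[0.5, 0.5])


p50 = invert_cdf(0.5, full_bma, lo=0.0, hi=5.0)
print(round(p50, 6))
```

Because the mixture is built from CDFs rather than from samples, the same function also returns P2.5 and P97.5 by changing the target probability passed to `invert_cdf`.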

5.5. Contribution to predictive variance

As presented in Eq. (4), the predictive variance can be subdivided into three sources, namely (I) within-models and within-scenarios (forcing data + parameter uncertainty), (II) between-models and within-scenarios (conceptual model uncertainty) and (III) between-scenarios (scenario uncertainty). Fig. 10 shows the predictive variance for the groundwater budget components described in the previous paragraphs, with each source of variance expressed as a percentage of the predictive variance. The within-models contribution is largest for river losses from the Velp (67%) and river gains from the Demer (66%). The between-models contribution is largest for the

[Figure 9 panels: cumulative probability (prob [-]) versus Demer losses, Demer gains, Velp losses, Velp gains, Drain outflows (×10⁴), Walenbos outflows and Walenbos inflows, in m³ d⁻¹ (×10³ unless noted); curves for BMA S1, BMA S2, BMA S3 and Full BMA.]

Fig. 9. BMA cumulative predictive distributions for groundwater budget components for alternative scenarios S1, S2 and S3 and the Full BMA cumulative prediction accounting for conceptual model and scenario uncertainties.

groundwater outflows from Walenbos (75%) and for river losses from the Demer (69%). The between-scenarios term contributes ca. 100% of the predictive variance for the drain outflows and 78% for the groundwater inflows to the Walenbos area.
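The three-way split can be illustrated with a short function. This sketch assumes that Eq. (4) is the standard three-level law of total variance with weights p(M_k|D) and p(S_i), and that the model weights do not depend on the scenario; the function name and the 2 × 2 example numbers are hypothetical:

```python
def decompose_predictive_variance(means, variances, p_model, p_scenario):
    """Split a BMA predictive variance into three sources (assumed form of Eq. (4)).

    means[i][k], variances[i][k]: predictive mean and variance of model k under scenario i
    p_model[k]: posterior model probability p(M_k | D), assumed scenario-independent
    p_scenario[i]: scenario probability p(S_i)
    """
    # Scenario-conditional BMA means E[x | D, S_i] and the overall mean E[x | D]
    scen_means = [sum(pm * m for pm, m in zip(p_model, row)) for row in means]
    overall = sum(ps * e for ps, e in zip(p_scenario, scen_means))

    # (I) within-models & within-scenarios: weighted model variances
    within = sum(ps * sum(pm * v for pm, v in zip(p_model, row))
                 for ps, row in zip(p_scenario, variances))
    # (II) between-models & within-scenarios: model means around each scenario mean
    between_models = sum(ps * sum(pm * (m - e) ** 2 for pm, m in zip(p_model, row))
                         for ps, row, e in zip(p_scenario, means, scen_means))
    # (III) between-scenarios: scenario means around the overall mean
    between_scenarios = sum(ps * (e - overall) ** 2 for ps, e in zip(p_scenario, scen_means))
    return within, between_models, between_scenarios


# Hypothetical example: 2 scenarios x 2 models
means = [[1.0, 3.0], [2.0, 6.0]]
variances = [[0.5, 0.5], [1.0, 2.0]]
parts = decompose_predictive_variance(means, variances, [0.5, 0.5], [0.5, 0.5])
total = sum(parts)
shares = [round(100 * p / total, 1) for p in parts]   # percentages, as plotted in Fig. 10
print(parts, shares)
```

A useful sanity check when implementing such a split is that the three parts must add up to the variance of the pooled mixture (4.5 in this example).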

These results clearly show that considering fairly reasonable and observable recharge conditions has a considerable impact on the estimated predictive variance. However, because future scenarios are driven by unpredictable future conditions, it is particularly difficult to implement strategies that reduce their contribution to the predictive variance. In contrast, when alternative scenarios are linked to fully or partially known future conditions, e.g., groundwater abstraction scenarios (Rojas and Dassargues, 2007), prior scenario probabilities could be defined based on expert judgement or following an approach similar to that described in Ye et al. (2008b). In the case of within- and between-models variance, it is likely that



[Figure 10: horizontal bars of Variance [%] (0–100) for Demer (L), Demer (G), Velp (L), Velp (G), Drain (O), Walenbos (O) and Walenbos (I), split into within-models & within-scenarios, between-models & within-scenarios, and between-scenarios.]

Fig. 10. Sources of variance expressed as a percentage of the predictive variance calculated using Eq. (4) for groundwater flow components (L stands for losses, G stands for gains, I stands for inflows and O stands for outflows).

Table 6
Summary of posterior model probabilities for different model selection criteria for models M1, M2 and M3.

                         Conceptual models
                         M1         M2         M3
Nr. of observations      51         51         51
SWSR¹                    180.95     182.18     158.18
MLOFO²                   64.59      64.93      57.73
ln|F|³                   −122.75    −117.88    −102.18
p(Mk)                    1/3        1/3        1/3

AIC                      74.59      78.93      75.73
  Rank                   1          3          2
  p(Mk|D)                0.596      0.068      0.337

AICc                     75.92      81.54      80.12
  Rank                   1          3          2
  p(Mk|D)                0.845      0.051      0.104

BIC                      84.25      92.46      93.11
  Rank                   1          2          3
  p(Mk|D)                0.972      0.016      0.012

KIC                      −5.99      −6.68      −10.48
  Rank                   3          2          1
  p(Mk|D)                0.085      0.119      0.796

¹ SWSR: Sum of weighted squared residuals.
² MLOFO: Maximum likelihood objective function (observations) obtained from UCODE-2005 (Poeter et al., 2005).
³ ln|F|: Natural log of the determinant of the Fisher matrix.


newly collected information/data may help decrease their corresponding uncertainty contributions. For the within-models variance, it would be particularly useful to collect data on the river dynamics in order to reduce the uncertainties in the model predictions for the river gains from the Demer and the river losses from the Velp. For the between-models variance, new information/data on river dynamics, together with a better understanding of the groundwater flow dynamics in the Walenbos area, would help decrease the contribution of conceptual model uncertainty to the predictive variance.

5.6. Criteria-based multimodel methodologies

Alternatively, models M1, M2 and M3 were calibrated using the weighted least squares (WLS) method included in UCODE-2005 (Poeter et al., 2005). Parametric uncertainty for each model was assessed using Monte Carlo simulation in a way similar to that described in Ye et al. (2006). The UCODE-2005 results were used to approximate the posterior model probabilities with Eq. (5) for four model selection criteria, namely AIC, AICc, BIC and KIC (see Section 2.5). These posterior model probabilities were then used to estimate the full BMA prediction (Eq. (1)), its leading moments (Eqs. (3) and (4)) and the contributions to the predictive variance in the same fashion as for GLUE-BMA.
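Eq. (5) is not reproduced in this section. As a sketch, assuming it takes the common form p(M_k|D) = exp(−Δ_k/2) p(M_k) / Σ_l exp(−Δ_l/2) p(M_l), with Δ_k = IC_k − IC_min, the posterior weights follow directly from the criterion values. With equal priors of 1/3, this reproduces the p(Mk|D) rows of Table 6 to within about 0.001 (the residual comes from the rounded table entries). The function name is illustrative:

```python
import math


def posterior_model_weights(ic_values, priors=None):
    """Approximate p(M_k | D) from information-criterion values (lower is better).

    Assumed form of Eq. (5): weight_k ~ exp(-delta_k / 2) * p(M_k), delta_k = IC_k - IC_min.
    """
    n_models = len(ic_values)
    if priors is None:
        priors = [1.0 / n_models] * n_models          # equal prior model probabilities
    ic_min = min(ic_values)
    raw = [math.exp(-0.5 * (ic - ic_min)) * p for ic, p in zip(ic_values, priors)]
    total = sum(raw)
    return [r / total for r in raw]


# Criterion values for M1, M2 and M3 as listed in Table 6
table6 = {
    "AIC": [74.59, 78.93, 75.73],
    "AICc": [75.92, 81.54, 80.12],
    "BIC": [84.25, 92.46, 93.11],
    "KIC": [-5.99, -6.68, -10.48],
}
weights = {name: posterior_model_weights(ic) for name, ic in table6.items()}
for name, w in weights.items():
    print(name, [round(x, 3) for x in w])
```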

Table 6 summarizes the results of the least squares calibration using UCODE-2005. Models M1, M2 and M3 are ranked differently depending on the model selection criterion used, in full agreement with the results obtained by Ye et al. (2008a). Whereas AIC and AICc rank the models identically, the posterior model probabilities obtained with Eq. (5) are rather different for these two criteria. In the case of BIC, most of the posterior weight is assigned to model M1 (97%), indicating that models M2 and M3 will contribute only marginally to the full BMA predictive distribution. Additionally, BIC ranks models M2 and M3 differently from AIC and AICc. The reason is that BIC penalizes each additional parameter by ln n instead of the constant 2 used by AIC, which makes BIC the more severe criterion whenever the number of observations n exceeds e² ≈ 7.4; with the n = 51 observations used here, BIC puts considerably more importance on parsimony. KIC yields a completely different ranking, preferring M3, which receives a posterior weight of ca. 80%. Remarkably, under KIC the model ranked first by AIC and AICc (M1) is ranked last. Ye et al. (2008a) argue that the presence of the Fisher information term strongly influences the results of KIC. This sometimes allows KIC to prefer more complex models based not only on goodness of fit and number of parameters but also on the quality of the available dataset D. This property is not shared by AIC, AICc or BIC, since the Fisher information term is not present in their definitions. Although Ye et al. (2008a) appear to have settled the controversy about the use of alternative model selection criteria in multimodel methodologies, different model selection criteria will rank alternative conceptual models differently and, consequently, will assign them different posterior model probabilities through the approximation in Eq. (5). In a multimodel approach, this is critical.
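The AIC, AICc and BIC entries of Table 6 can be checked with the standard definitions AIC = MLOFO + 2k, AICc = AIC + 2k(k + 1)/(n − k − 1) and BIC = MLOFO + k ln n. Two assumptions are involved: that MLOFO plays the role of the −2 ln L term, and the parameter counts k = 5, 7 and 9 for M1, M2 and M3, which are not reported in this section but were back-calculated from the tabulated AIC and MLOFO values (they are consistent with the statement that M2 has two more parameters than M1):

```python
import math


def information_criteria(mlofo, k, n):
    """Return (AIC, AICc, BIC), treating MLOFO as -2 ln L (assumption)."""
    aic = mlofo + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)         # small-sample correction
    bic = mlofo + k * math.log(n)                      # penalty ln(n) per parameter
    return aic, aicc, bic


n_obs = 51
mlofo = {"M1": 64.59, "M2": 64.93, "M3": 57.73}        # from Table 6
n_params = {"M1": 5, "M2": 7, "M3": 9}                 # back-calculated, not reported

criteria = {m: information_criteria(mlofo[m], n_params[m], n_obs) for m in mlofo}
for m, (aic, aicc, bic) in criteria.items():
    print(f"{m}: AIC={aic:.2f} AICc={aicc:.2f} BIC={bic:.2f}")
```

Under these inferred parameter counts, the per-parameter BIC penalty is ln 51 ≈ 3.93 versus 2 for AIC, which is enough to reverse the order of M2 and M3 between the two criteria.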

Results from Table 6 also confirm the nature of the dataset D used to assess model performance. As discussed earlier, D covered only phreatic conditions (head measurements from locally confined areas were discarded) in order to assess the meso-scale groundwater flows to the Walenbos Nature Reserve. Table 6 shows that the sum of weighted squared residuals (SWSR) of model M2 is larger than that of M1, although model M2 has two more parameters. In addition, the calibrated values of HK-1 for models M1, M2 and M3 are rather similar: 2.8 m d⁻¹, 2.9 m d⁻¹ and 2.6 m d⁻¹, respectively (see e.g. Figs. 4 and 5). This is consistent with the type of information conveyed by the dataset D (phreatic/shallow groundwater not affected by deep aquifers or locally confined conditions).

The posterior model probabilities in Table 6 also differ significantly from those in Table 4. These differences are explained by the way the posterior model probabilities are estimated. The values reported in Table 4 are calculated by summing the individual likelihood values obtained from sampling the full hyperspace spanned by model structures, forcing data (inputs) and parameter vectors. In contrast, the values reported in Table 6 are approximated using an exponential-type formula (Eq. (5)). Thus, small fluctuations in the model selection criterion and, as a consequence, in the delta terms used in Eq. (5), have a large influence on the resulting posterior model weights.
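This sensitivity can be made explicit. Assuming Eq. (5) has the common exponential form, the ratio of two posterior weights depends only on the difference between their criterion values, so a shift δ in one criterion multiplies the posterior odds by exp(−δ/2). With the AIC values and equal priors of Table 6:

```latex
\frac{p(M_2 \mid D)}{p(M_1 \mid D)}
  = \exp\!\left(-\tfrac{1}{2}\,(\mathrm{AIC}_2 - \mathrm{AIC}_1)\right)\frac{p(M_2)}{p(M_1)}
  = \exp\!\left(-\tfrac{1}{2}\,(78.93 - 74.59)\right)
  = e^{-2.17} \approx 0.114
```

which matches 0.068/0.596 in Table 6. A change of only δ = 2 units in one criterion value, smaller than the spread of the AIC values in Table 6, rescales the odds by e^{-1} ≈ 0.37.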

Fig. 11 shows the full BMA predictive distributions for groundwater budget terms obtained from criteria-based multimodel


[Figure 11 panels: cumulative probability (prob [-]) versus Demer losses, Demer gains, Velp losses, Velp gains, Drain outflows (×10⁴), Walenbos outflows and Walenbos inflows, in m³ d⁻¹ (×10³ unless noted); curves for AIC, AICc, BIC, KIC and GLUE-BMA.]

Fig. 11. Comparison of full BMA cumulative predictive distributions for groundwater budget components between criteria-based multimodel methodologies and the GLUE-BMA method.

Table 7
Predictive variance estimated using posterior model probabilities based on alternative model selection criteria (AIC, AICc, BIC, KIC) and the GLUE-BMA methodology. All values expressed in (m³ d⁻¹)².

                     AIC            AICc           BIC            KIC            GLUE-BMA
Demer losses         2.606 × 10⁶    1.485 × 10⁶    6.822 × 10⁵    1.632 × 10⁶    1.588 × 10⁶
Demer gains          5.636 × 10⁶    3.242 × 10⁶    1.596 × 10⁶    5.743 × 10⁶    3.751 × 10⁶
Velp losses          2.056 × 10⁵    1.408 × 10⁵    9.892 × 10⁴    2.556 × 10⁵    2.610 × 10⁵
Velp gains           1.556 × 10⁶    1.191 × 10⁶    9.761 × 10⁵    2.257 × 10⁶    1.574 × 10⁶
Drain outflows       1.924 × 10⁹    1.905 × 10⁹    1.898 × 10⁹    1.961 × 10⁹    1.912 × 10⁹
Walenbos outflows    1.264 × 10⁵    6.679 × 10⁴    2.261 × 10⁴    6.868 × 10⁴    7.300 × 10⁴
Walenbos inflows     1.393 × 10⁶    1.258 × 10⁶    1.157 × 10⁶    9.235 × 10⁵    1.151 × 10⁵


methodologies and the GLUE-BMA methodology. As expected, the BMA predictive distributions obtained with alternative model selection criteria differ somewhat from one another. In general, the differences in the most likely predicted values are largest between KIC and BIC. This is expected, since these two criteria assign most of the posterior weight to different single models: whereas BIC favours M1, KIC prefers M3 (Table 6). This reaffirms the idea that relying on a single conceptual model is likely to produce biased predictions. For the drain outflows, the differences between the most likely predicted values obtained from the alternative multimodel methodologies are minimal.

The most significant impact of using alternative model selection criteria to approximate posterior model probabilities is on the estimation of the predictive variance and of the corresponding contributions from parameters, conceptual models and scenarios. Since these contributions are weighted by the corresponding posterior model probabilities, the three components are expected to differ between results obtained with different model selection criteria.

Table 7 summarizes the predictive variance obtained using different model selection criteria. This table shows that when the posterior weight assigned to the same conceptual model increases, which is equivalent to favouring a single conceptual model over the others, the predictive variance decreases. This is because conceptual model uncertainty is then neglected and, as a consequence, deviations from the average estimations, as expressed by the second term of Eq.



(4), are not taken into account. For example, AIC, AICc and BIC assign model M1 posterior weights of 0.596, 0.845 and 0.972, respectively, showing an increasing preference for M1. For the river losses from the Velp, the predictive variances estimated with these posterior model probabilities are 2.1 × 10⁵ (m³ d⁻¹)², 1.4 × 10⁵ (m³ d⁻¹)² and 9.9 × 10⁴ (m³ d⁻¹)², respectively. This reaffirms the idea that when a single conceptual model is preferred over the others, the predictive uncertainty is underestimated. This is in full agreement with the results obtained by Rojas et al. (2009) for a synthetic case study.
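A two-model, single-scenario special case makes the mechanism explicit, assuming Eq. (4) is the usual law of total variance. With weight w on model 1, model means μ₁, μ₂ and model variances σ₁², σ₂²:

```latex
\mu = w\,\mu_1 + (1-w)\,\mu_2, \qquad
\operatorname{Var} = \underbrace{w\,\sigma_1^2 + (1-w)\,\sigma_2^2}_{\text{within-models}}
  + \underbrace{w\,(1-w)\,(\mu_1-\mu_2)^2}_{\text{between-models}}
```

As w → 1, the between-models term vanishes no matter how far apart μ₁ and μ₂ are. At w = 0.97 its coefficient is still w(1 − w) ≈ 0.029, so a large difference between model means can keep this term noticeable even when one model receives almost all of the posterior weight.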

Additionally, Fig. 12 shows the predictive variance estimated using posterior model probabilities obtained from AIC (Fig. 12a), AICc (Fig. 12b), BIC (Fig. 12c) and KIC (Fig. 12d). The predictive variance has been subdivided per source of variance and expressed as a percentage of the total predictive variance. It is worth noting that for BIC, which assigns 97% of the posterior weight to model M1 and thus shows a considerable preference for M1, 36% of the predictive variance of the groundwater outflows from Walenbos comes from conceptual model uncertainty, whereas for the river losses from the Demer this contribution reaches 20%. The same two groundwater budget terms show the largest contributions of conceptual model uncertainty for AIC (Fig. 12a), AICc (Fig. 12b) and the GLUE-BMA method (Fig. 10). For KIC (Fig. 12d), in contrast, river gains from the Demer and groundwater outflows from Walenbos show the largest contributions of conceptual model uncertainty. Although the patterns of the largest conceptual model contributions are rather similar for the different model selection criteria, the magnitudes of these contributions differ substantially. For example, the contribution of conceptual model uncertainty to the predictive variance of the Walenbos outflows ranges between 36% and 85%, whereas for the river losses from the Demer it varies between 20% and 76%. This clearly shows that using different model selection criteria may produce misleading and conflicting results.

[Figure 12 panels (a)–(d): horizontal bars of Variance [%] (0–100) for Demer (L), Demer (G), Velp (L), Velp (G), Drain (O), Walenbos (O) and Walenbos (I), split into within-models & within-scenarios, between-models & within-scenarios, and between-scenarios.]

Fig. 12. Sources of variance expressed as a percentage of the predictive variance calculated using Eq. (4) for groundwater flow components for criteria-based multimodel methodologies: (a) AIC-based, (b) AICc-based, (c) BIC-based and (d) KIC-based (L stands for losses, G stands for gains, I stands for inflows and O stands for outflows).

A comparison of the capture zones obtained using the calibrated parameters from UCODE-2005 and the highest likelihood points from GLUE-BMA (Fig. 13) illustrates a relevant point. Capture zones are obtained with MODPATH (Pollock, 1994) using forward particle tracking based on estimates of the average linear velocity with a constant effective porosity ne of 0.1. These velocities are estimated from the heads simulated with MODFLOW-2005. In general, the simulated flow fields obtained either with the calibrated parameters from UCODE-2005 or with the parameter set with the highest likelihood value from GLUE-BMA are rather similar. As a result, the capture zones are fairly similar between these approaches and between models M1, M2 and M3, even though the posterior model probabilities may differ significantly between models. This is explained by the fact that the dataset D used to calibrate the alternative conceptual models is based on the same head observations (Fig. 1); consequently, predictions of any variable closely linked to (or contained in) the calibration data will have a relatively low contribution of conceptual model uncertainty to the predictive variance. However, as the previous results show, predicted variables not included in the calibration dataset are likely to have a significant contribution of conceptual model uncertainty. This is the case for variables such as the river gains and river losses from the Demer or the Velp. These results are in full agreement with Harrar et al. (2003), Højberg and Refsgaard (2005) and Troldborg et al. (2007), whose results show that the relevance of conceptual model uncertainty increases when predicted variables are not included in the dataset used for calibration.
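The velocity step behind the capture zones can be sketched as follows: the Darcy flux q = −K∇h is divided by the effective porosity to give the average linear velocity v = q/ne, and particles are moved along v. This is a plain explicit-Euler tracker on an analytically specified head gradient, not MODPATH's semi-analytical cell-by-cell scheme; the uniform gradient and function names are hypothetical, while K = 2.8 m d⁻¹ (the calibrated HK-1 of M1) and ne = 0.1 are taken from the text:

```python
def average_linear_velocity(k_h, grad_h, n_e):
    """Seepage velocity v = q / n_e, with Darcy flux q = -K * grad(h) (isotropic K)."""
    return tuple(-k_h * g / n_e for g in grad_h)


def track_forward(start, grad_h_at, k_h, n_e, dt, n_steps):
    """Forward particle path using explicit Euler steps (dt in days, lengths in m)."""
    x, y = start
    path = [(x, y)]
    for _ in range(n_steps):
        vx, vy = average_linear_velocity(k_h, grad_h_at(x, y), n_e)
        x, y = x + vx * dt, y + vy * dt
        path.append((x, y))
    return path


def uniform_gradient(x, y):
    """Hypothetical head field: head drops 1 m per km towards the east."""
    return (-0.001, 0.0)


path = track_forward((0.0, 0.0), uniform_gradient, k_h=2.8, n_e=0.1, dt=10.0, n_steps=100)
print(path[-1])
```

Here the particle moves east at 0.028 m d⁻¹, i.e. about 28 m in 1000 days; in a real model the gradient would be interpolated from the simulated MODFLOW heads.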

6. Summary and conclusions

In this work, we presented a multimodel approach to estimate the contributions to the predictive uncertainty arising from the definition of alternative conceptual models and optional recharge conditions. The proposed multimodel approach combines the GLUE and BMA methods and is an improved version of the approach originally developed by Rojas et al. (2008). The improvement consisted in replacing the traditional Latin Hypercube Sampling scheme of GLUE by an MCMC sampling scheme, which significantly reduced computational times and increased the efficiency of the approach. We accounted for conceptual model and scenario (recharge) uncertainties in the modelling of several groundwater budget terms in the groundwater system of the Walenbos Nature Reserve in Belgium. For that purpose, three conceptual models were proposed based on different levels of geological knowledge, and two additional recharge settings accounting for deviations from average recharge conditions were used.
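The MCMC sampler itself is not specified in this section. As a generic illustration only, a random-walk Metropolis sampler concentrates parameter draws where the (informal) GLUE likelihood is high, instead of spreading them uniformly over the parameter space as Latin Hypercube Sampling does. The one-parameter log-likelihood, step size, chain length and burn-in below are hypothetical:

```python
import math
import random


def metropolis(log_like, theta0, step, n_iter, seed=0):
    """Random-walk Metropolis sampler; returns the chain of parameter vectors."""
    rng = random.Random(seed)
    theta = list(theta0)
    ll = log_like(theta)
    chain = []
    for _ in range(n_iter):
        proposal = [t + rng.gauss(0.0, s) for t, s in zip(theta, step)]
        ll_prop = log_like(proposal)
        # accept with probability min(1, L(proposal) / L(theta))
        if ll_prop >= ll or rng.random() < math.exp(ll_prop - ll):
            theta, ll = proposal, ll_prop
        chain.append(list(theta))
    return chain


def log_like(theta):
    """Hypothetical Gaussian-shaped likelihood for a single parameter, centred on 0.45."""
    return -0.5 * ((theta[0] - 0.45) / 0.1) ** 2


chain = metropolis(log_like, theta0=[0.0], step=[0.1], n_iter=20000, seed=42)
samples = [c[0] for c in chain[2000:]]        # discard burn-in
mean_estimate = sum(samples) / len(samples)
print(round(mean_estimate, 3))
```

The retained samples can then be treated like GLUE realizations, for instance by running the groundwater model for each one and weighting the predictions as in the cumulative distributions discussed earlier.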

The study area is a hydrogeologically distinctive setting, with deeply incised valleys promoting contact between alternating aquifers and different hydrostratigraphic units. The fact that the wetness and the surface waters of the Walenbos Nature Reserve are due solely to groundwater discharge (see e.g. Batelaan et al., 1998) is of vital importance and makes the area an ecologically valued zone. Although we worked with relatively similar conceptual models, the predictive uncertainties in these essential groundwater flows proved to be very important for the Walenbos area. Therefore, whether the impacts of the differences between the alternative conceptual models are significant or not should be judged in the context of the present application.

The main findings of this work can be summarized as follows:

1. The adopted approach is flexible since (i) there is no limitation on the number or complexity of the conceptual models that can be included, or on the degree to which input and parameter uncertainty can be incorporated, (ii) quantitative or


[Figure 13 panels (a)–(f): capture-zone maps, East [m] (181000–190000) versus North [m] (168000–188000), showing the Walenbos Nature Reserve, the Demer River and the Velp River.]

Fig. 13. Forward particle tracking defining the capture zone for steady-state (calibrated) results obtained from UCODE-2005 (first row) and the highest likelihood point in GLUE-BMA (second row) for models M1 (a, d), M2 (b, e) and M3 (c, f).


qualitative information about the system can be used to distinguish between different simulators, (iii) the closeness between the predictions and system observations can be defined in a variety of ways and (iv) likelihoods, model probabilities and predictive distributions can be easily updated when new information becomes available. By definition, the results of the proposed methodology are conditional on the nature and the number of members comprising the ensemble M. To guarantee a robust quantification of uncertainty, the members of M should cover a meaningful range of potential system representations while being different enough to reasonably explore the prior model space. The latter may cause

the number of potential models to become exceedingly large, thus rendering their inclusion in M infeasible. For practical applications, however, there is no guarantee that the alternative conceptualizations are completely independent or that the members of M closely represent the “unknown” groundwater system. In that case, the alternative conceptualizations most representative of the system dynamics should be selected to achieve a trade-off between the applicability of the GLUE-BMA method and the robustness of the uncertainty assessment. In this context, the nature and number of the sampled members of M will determine the “quality” of the uncertainty assessment (see e.g. Neuman, 2003).



2. For this specific study site, a set of 51 head observations did not allow further discrimination between the three proposed conceptual models, resulting in small differences in posterior model probabilities. This indicates that the head observations had a rather limited information content for differentiating among alternative conceptualizations and that, in this case, the prior model probabilities may play an important role if they are not all taken equal. These prior model probabilities should be considered as the analyst’s prior perception of the plausibility of the alternative conceptual models and could potentially be used as penalizing terms to enforce the principle of parsimony. In this context, combining prior expert knowledge about the conceptual models with the information given by the data will produce a better distinction between alternative conceptualizations. As shown by Rojas et al. (2009), the inclusion of proper and correct prior knowledge about the alternative conceptualizations will reduce the predictive uncertainty. In addition, complementing the information content of the head observations with flow-related measurements will improve the discrimination among the alternative conceptualizations (see e.g. Rojas et al., 2010b).

3. Despite the small differences in posterior model probabilities, the predictive distributions were considerably different in shape, central moment and spread among the alternative conceptualizations and scenarios analysed. This reaffirms the idea that relying on a single conceptual model driven by a particular scenario will likely produce biased and under-dispersive estimates of the predictive uncertainty.

4. The contribution of conceptual model uncertainty varied between 1% and 75% of the predictive uncertainty depending on the groundwater budget term. Additionally, the contribution of scenario uncertainty varied between 5% and ca. 100% of the predictive uncertainty depending on the budget term. The relative contribution of conceptual model uncertainty for the different groundwater budget components provides useful information for updating the model concept or guiding data collection to optimally reduce conceptual uncertainty. If better data had been available (e.g. dynamic heads, discharge values, travel times or hydraulic conductivity measurements), a better discrimination between alternative conceptual models would likely have been obtained. For scenario uncertainty, on the other hand, information useful for reducing its contribution may be difficult to collect because future conditions are unknown and unpredictable. However, if future scenarios are linked to potential groundwater abstraction policies (see e.g. Rojas and Dassargues, 2007), expert knowledge about the scenarios, in the form of prior scenario probabilities, could be included to optimally reduce the contribution of scenario uncertainty to the predictive uncertainty.

5. Critical differences between the GLUE-BMA approach and a traditional least squares calibration method were observed. The proposed approach successfully identified attraction zones (and the highest likelihood points) for all parameters, which were contained within feasible and meaningful ranges. In contrast, for parameters that were relatively insensitive across the three alternative conceptual models, the least squares method did not succeed in locating the highest likelihood point and, in the most critical case, the calibrated value was found outside the attraction zone defined by the proposed approach. This is due to equifinality and to the fact that the dataset D did not contain enough information to identify unique parameter values.

6. The use of different model selection criteria to approximate posterior model probabilities in a multimodel methodology resulted in alternative conceptual models being ranked differently, in dissimilar posterior model probabilities, in different estimates of the predictive uncertainty and in different estimates of the corresponding contributions to the predictive uncertainty from conceptual models and optional scenarios. In a multimodel approach, these issues are critical and cannot be neglected. In addition, GLUE-based posterior model weights are more evenly distributed among alternative conceptualizations than model weights obtained from criteria-based methodologies. This result is in full agreement with the findings of Singh et al. (2010) and Ye et al. (2010).

7. Interestingly, for the extreme case in which a single model was preferred over the others, a rather significant contribution of conceptual model uncertainty (36%) to the predictive uncertainty was observed for the groundwater outflows from the Walenbos area. This shows that even when the alternative models receive only a small share of the posterior weight, in this case 3% for models M2 and M3 combined, conceptual model uncertainty may play an important role and cannot be neglected.

8. Results obtained from criteria-based multimodel methodologies reaffirm the idea that relying on predictions obtained using a single conceptual model is likely to produce biased estimates of the predictive uncertainty. Additionally, results obtained from alternative model selection criteria may be ambiguous in indicating the contributions of conceptual model and scenario uncertainties, with serious implications for planning future data collection campaigns.

9. Results from the proposed methodology as well as from traditional parameter calibration show that the relevance of conceptual model uncertainty increases when predicted variables are not included in the data used for calibration. This result is in full agreement with the results of Harrar et al. (2003), Højberg and Refsgaard (2005) and Troldborg et al. (2007).

10. The results of this study strongly advocate addressing conceptual model uncertainty in the practice of groundwater modelling. Additionally, to account for unforeseen future circumstances, including scenario uncertainty yields more realistic and possibly more reliable estimates of the predictive uncertainty. The use of a single model may result in smaller uncertainty intervals, and hence increased confidence in the model simulations, but is very likely prone to statistical bias. Also, in the presence of conceptual model uncertainty, which by definition cannot be excluded, this apparent gain in precision in the short term may have serious implications when the model is used for long-term predictions in which the system is subject to new stresses. It is therefore advisable to explore a number of alternative conceptual models and scenarios to obtain predictions that are more realistic and hence more likely to include the unknown true system responses.

Acknowledgements

The first author thanks the Katholieke Universiteit Leuven (K.U.Leuven) for providing financial support in the framework of PhD IRO-scholarships. We also wish to thank Roberta-Serena Blasone for helpful comments on implementing MCMC within the GLUE methodology. Finally, we acknowledge the helpful comments of three anonymous reviewers.


References

Ajami, N., Duan, Q., Sorooshian, S., 2007. An integrated hydrologic Bayesian multimodel combination framework: confronting input, parameter, and model structural uncertainty in hydrologic prediction. Water Resources Research 43, W01403. doi:10.1029/2005WR004745.

Akaike, H., 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19, 716–723.

Anderson, M., Woessner, W., 1992. Applied Groundwater Modelling – Simulation of Flow and Advective Transport, first ed. Academic Press, San Diego, California.

Batelaan, O., De Smedt, F., 2004. SEEPAGE, a new MODFLOW DRAIN Package. Ground Water 42, 576–588.

Batelaan, O., De Smedt, F., 2007. GIS-based recharge estimation by coupling surface–subsurface water balances. Journal of Hydrology 337, 337–355.

Batelaan, O., De Smedt, F., Otero Valle, M., Huybrechts, W., 1993. Development and application of a groundwater model integrated in the GIS GRASS. In: Kovar, K., Nachtnebel, H. (Eds.), Application of Geographic Information Systems in Hydrology and Water Resources Management. IAHS Publ. No. 211, Vienna, Austria, pp. 581–590.

Batelaan, O., De Smedt, F., De Becker, P., Huybrechts, W., 1998. Characterization of a regional groundwater discharge area by combined analysis of hydrochemistry, remote sensing and groundwater modelling. In: Dillon, P., Simmers, I. (Eds.), Shallow Groundwater Systems. International Contributions to Hydrogeology, vol. 18. A.A. Balkema, Rotterdam, pp. 75–86.

Batelaan, O., Meyus, Y., De Smedt, F., 2007. De grondwatervoeding van Vlaanderen. Water 28, 64–71.

Beven, K., 1993. Prophecy, reality and uncertainty in distributed hydrological modelling. Advances in Water Resources 16, 41–51.

Beven, K., 2006. A manifesto for the equifinality thesis. Journal of Hydrology 320, 18–36.

Beven, K., Binley, A., 1992. The future of distributed models: model calibration and uncertainty prediction. Hydrological Processes 6, 279–298.

Beven, K., Freer, J., 2001. Equifinality, data assimilation, and uncertainty estimation in mechanistic modelling of complex environmental systems using the GLUE methodology. Journal of Hydrology 249, 11–29.

Binley, A., Beven, K., 2003. Vadose zone flow model uncertainty as conditioned on geophysical data. Ground Water 41, 119–127.

Blasone, R.-S., Madsen, H., Rosbjerg, D., 2008a. Uncertainty assessment of integrated distributed hydrological models using GLUE with Markov chain Monte Carlo sampling. Journal of Hydrology 353, 18–32.

Blasone, R.-S., Vrugt, J., Madsen, H., Rosbjerg, D., Robinson, B., Zyvoloski, G., 2008b. Generalized likelihood uncertainty estimation (GLUE) using adaptive Markov chain Monte Carlo sampling. Advances in Water Resources 31, 630–648.

Bredehoeft, J., 2003. From models to performance assessment: the conceptualization problem. Ground Water 41, 571–577.

Bredehoeft, J., 2005. The conceptualization model problem – surprise. Hydrogeology Journal 13, 37–46.

Brooks, S., Gelman, A., 1998. General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics 7, 434–455.

Chib, S., Greenberg, E., 1995. Understanding the Metropolis–Hastings algorithm. The American Statistician 49, 327–335.

Cools, J., Meyus, Y., Woldeamlak, S., Batelaan, O., De Smedt, F., 2006. Large-scale GIS-based hydrogeological modeling of Flanders: a tool for groundwater management. Environmental Geology 50, 1201–1209.

Cowles, M., Carlin, B., 1996. Markov chain Monte Carlo convergence diagnostics: a comparative review. Journal of the American Statistical Association 91, 883–904.

De Becker, P., Huybrechts, W., 1997. Het Walenbos – Ecohydrologische Atlas. Rep. No. 97/03, Instituut voor Natuurbehoud, Brussels, Belgium.

DOV, Databank Ondergrond Vlaanderen, 2008.

Draper, D., 1995. Assessment and propagation of model uncertainty. Journal of the Royal Statistical Society Series B 57, 45–97.

Feyen, L., Beven, K., De Smedt, F., Freer, J., 2001. Stochastic capture zone delineation within the GLUE-methodology: conditioning on head observations. Water Resources Research 37, 625–638.

Gaganis, P., Smith, L., 2006. Evaluation of the uncertainty of groundwater model predictions associated with conceptual errors: a per-datum approach to model calibration. Advances in Water Resources 29, 503–514.

Gallagher, M., Doherty, J., 2007. Parameter estimation and uncertainty analysis for a watershed model. Environmental Modelling & Software 22, 1000–1020.

Gelman, A., Carlin, J., Stern, H., Rubin, D., 2004. Bayesian Data Analysis, second ed. Chapman & Hall/CRC, New York.

Geyer, C., 1992. Practical Markov chain Monte Carlo. Statistical Science 7, 473–483.

Ghosh, J., Delampady, M., Samanta, T., 2006. An Introduction to Bayesian Analysis – Theory and Methods, first ed. Springer, New York.

Gilks, W., Richardson, S., Spiegelhalter, D., 1995. Markov Chain Monte Carlo in Practice, first ed. Chapman & Hall/CRC, Boca Raton, Florida, USA.

Gómez-Hernández, J., 2006. Complexity. Ground Water 44, 782–785.

Gullentops, F., Bogemans, F., De Moor, G., Paulissen, E., Pissart, A., 2001. Quaternary lithostratigraphic units (Belgium). Geologica Belgica 4, 153–164.

HAECON, Witteveen+Bos, 2004. Ontwikkelen van regionale modellen ten behoeve van het Vlaams Grondwater Model (VGM) in GMS/MODFLOW: Perceel No. 3 Brulandkrijtmodel (Development of regional models for the Flemish Groundwater Model (VGM) in GMS/MODFLOW) [In Dutch]. Tech. rep., AMINAL, afdeling WATER, Brussels, Belgium.

Harbaugh, A., 2005. MODFLOW-2005, the US Geological Survey modular ground-water model – the Ground-Water Flow Process. Techniques and Methods 6-A16, United States Geological Survey, Reston, Virginia, USA.

Harrar, W., Sonnenberg, T., Henriksen, H., 2003. Capture zone, travel time, and solute transport predictions using inverse modelling and different geological models. Hydrogeology Journal 11, 536–548.

Hassan, A., 2004a. A methodology for validating numerical ground water models. Ground Water 42, 347–362.

Hassan, A., 2004b. Validation of numerical ground water models used to guide decision making. Ground Water 42, 277–290.

Hastings, W., 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109.

Hill, M., 2006. The practical use of simplicity in developing ground water models. Ground Water 44, 775–781.

Hill, M., Tiedeman, C., 2007. Effective Groundwater Model Calibration: with Analysis of Data, Sensitivities, Predictions and Uncertainty, first ed. John Wiley & Sons, Inc., New Jersey.

Hoeting, J., 2002. Methodology for Bayesian model averaging: an update. In: International Biometric Conference. International Biometric Society, Freiburg, Germany, pp. 231–240.

Hoeting, J., Madigan, D., Raftery, A., Volinsky, C., 1999. Bayesian model averaging: a tutorial. Statistical Science 14, 382–401.

Højberg, A., Refsgaard, J., 2005. Model uncertainty – parameter uncertainty versus conceptual models. Water Science & Technology 52, 177–186.

Hunt, R., Doherty, J., Tonkin, M., 2007. Are models too simple? Arguments for increased parameterization. Ground Water 45, 254–262.

Hurvich, C., Tsai, C., 1989. Regression and time series model selection in small samples. Biometrika 76, 297–307.

Ijiri, Y., Saegusa, H., Sawada, A., Ono, M., Watanabe, K., Karasaki, K., Doughty, C., Shimo, M., Fumimura, K., 2009. Evaluation of uncertainties originating from the different modeling approaches applied to analyze regional groundwater flow in the Tono area of Japan. Journal of Contaminant Hydrology 103, 168–181.

Jensen, J., 2003. Parameter and uncertainty estimation in groundwater modelling. Ph.D. thesis, Department of Civil Engineering, Aalborg University, Denmark, series Paper No. 23.

Kashyap, R., 1982. Optimal choice of AR and MA parts in autoregressive moving average models. IEEE Transactions on Pattern Analysis and Machine Intelligence 42, 99–104.

Kass, R., Raftery, A., 1995. Bayes factors. Journal of the American Statistical Association 90, 773–795.

Laga, P., Louwye, S., Geets, S., 2001. Paleogene and Neogene lithostratigraphic units (Belgium). Geologica Belgica 4, 135–152.

Makowski, D., Wallach, D., Tremblay, M., 2002. Using a Bayesian approach to parameter estimation; comparison of the GLUE and MCMC methods. Agronomie 22, 191–203.

McKay, D., Beckman, R., Conover, W., 1979. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21, 239–245.

Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., Teller, E., 1953. Equation of state calculations by fast computing machines. The Journal of Chemical Physics 21, 1087–1092.

Meyer, P., Ye, M., Neuman, S., Cantrell, K., 2004. Combined estimation of hydrogeologic conceptual model and parameter uncertainty. Report NUREG/CR-6843, PNNL-14534, US Nuclear Regulatory Commission, Washington, US.

Meyer, P., Ye, M., Rockhold, M., Neuman, S., Cantrell, K., 2007. Combined estimation of hydrogeologic conceptual model, parameter, and scenario uncertainty with application to uranium transport at the Hanford Site 300 Area. Report NUREG/CR-6940, PNNL-16396, US Nuclear Regulatory Commission, Washington, US.

Meyus, Y., Batelaan, O., De Smedt, F., 2000. Concept Vlaams Grondwater Model (VGM): Technisch concept van het VGM; deelrapport 1: Hydrogeologische codering van de ondergrond van Vlaanderen (HCOV) (Technical concept of the Flemish groundwater model: Report 1: Hydrogeological coding of the subsoil of Flanders) [In Dutch]. Tech. rep., AMINAL, afdeling WATER, Belgium.

Neuman, S., 2003. Maximum likelihood Bayesian averaging of uncertain model predictions. Stochastic Environmental Research and Risk Assessment 17, 291–305.

Neuman, S., Wierenga, P., 2003. A comprehensive strategy of hydrogeologic modeling and uncertainty analysis for nuclear facilities and sites. Report NUREG/CR-6805, US Nuclear Regulatory Commission, Washington, USA.

Poeter, E., Anderson, D., 2005. Multimodel ranking and inference in ground water modelling. Ground Water 43, 597–605.

Poeter, E., Hill, M., Banta, E., Mehl, S., Christensen, S., 2005. UCODE_2005 and six other computer codes for universal sensitivity analysis, calibration, and uncertainty evaluation. Techniques and Methods 6-A11, United States Geological Survey, Reston, Virginia, USA.

Pollock, D., 1994. User's guide for MODPATH/MODPATH-PLOT, version 3: a particle tracking post-processing package for MODFLOW, the US Geological Survey finite-difference ground-water flow model. Open-File Report 94-464, United States Geological Survey, Reston, Virginia, USA.

Refsgaard, J., Van der Sluijs, J., Brown, J., Van der Keur, P., 2006. A framework for dealing with uncertainty due to model structure error. Advances in Water Resources 29, 1586–1597.

Refsgaard, J., Van der Sluijs, J., Højberg, A., Vanrolleghem, P., 2007. Uncertainty in the environmental modelling process – a framework and guidance. Environmental Modelling & Software 22, 1543–1556.


Renard, P., 2007. Stochastic hydrogeology: what professionals really need? Ground Water 45, 531–541.

Robert, C., 2007. The Bayesian Choice – From Decision-Theoretic Foundations to Computational Implementation, second ed. Springer-Verlag, New York.

Rojas, R., Dassargues, A., 2007. Groundwater flow modelling of the regional aquifer of the Pampa del Tamarugal, northern Chile. Hydrogeology Journal 15, 537–551.

Rojas, R., Feyen, L., Dassargues, A., 2008. Conceptual model uncertainty in groundwater modeling: combining generalized likelihood uncertainty estimation and Bayesian model averaging. Water Resources Research 44, W12418. doi:10.1029/2008WR006908.

Rojas, R., Feyen, L., Dassargues, A., 2009. Sensitivity analysis of prior model probabilities and the value of prior knowledge in the assessment of conceptual model uncertainty in groundwater modelling. Hydrological Processes 23, 1131–1146.

Rojas, R., Batelaan, O., Feyen, L., Dassargues, A., 2010a. Assessment of conceptual model uncertainty for the regional aquifer Pampa del Tamarugal – North Chile. Hydrology and Earth System Sciences 14, 171–192.

Rojas, R., Feyen, L., Batelaan, O., Dassargues, A., 2010b. On the value of conditioning data to reduce conceptual model uncertainty in groundwater modelling. Water Resources Research 46.

Romanowicz, R., Beven, K., Tawn, J., 1994. Evaluation of prediction uncertainty in non-linear hydrological models using a Bayesian approach. In: Barnett, V., Turkman, F. (Eds.), Statistics for the Environment II – Water Related Issues. John Wiley & Sons, Inc., Chichester.

Rubin, Y., 2003. Applied Stochastic Hydrogeology, first ed. Oxford University Press, New York.

Schwarz, G., 1978. Estimating the dimension of a model. Annals of Statistics 6, 461–464.

Seifert, D., Sonnenberg, T., Scharling, P., Hinsby, K., 2008. Use of alternative conceptual models to assess the impact of a buried valley on groundwater vulnerability. Hydrogeology Journal 16, 659–674.

Singh, A., Mishra, S., Ruskauff, G., 2010. Model averaging techniques for quantifying conceptual model uncertainty. Ground Water 48, 701–715.

Sorensen, D., Gianola, D., 2002. Likelihood, Bayesian, and MCMC Methods in Quantitative Genetics, first ed. Springer-Verlag, New York. vol. I.

Tierney, L., 1994. Markov chains for exploring posterior distributions. The Annals of Statistics 22, 1701–1728.

Troldborg, L., Refsgaard, J., Jensen, K., Engesgaard, P., 2007. The importance of alternative conceptual models for simulation of concentrations in a multi-aquifer system. Hydrogeology Journal 15, 843–860.

Wasserman, L., 2000. Bayesian model selection and model averaging. Journal of Mathematical Psychology 44, 92–107.

Ye, M., Neuman, S., Meyer, P., 2004. Maximum likelihood Bayesian averaging of spatial variability models in unsaturated fractured tuff. Water Resources Research 40, W05113. doi:10.1029/2003WR002557.

Ye, M., Neuman, S., Meyer, P., Pohlmann, K., 2005. Sensitivity analysis and assessment of prior model probabilities in MLBMA with application to unsaturated fractured tuff. Water Resources Research 41, W12429. doi:10.1029/2005WR004260.

Ye, M., Pohlmann, K., Chapman, J., Shafer, D., 2006. On evaluation of recharge model uncertainty: a priori and a posteriori. In: 2006 International High Level Radioactive Waste Management Conference. American Nuclear Society, Las Vegas, Nevada, US.

Ye, M., Meyer, P., Neuman, S., 2008a. On model selection criteria in multimodel analysis. Water Resources Research 44, W03428. doi:10.1029/2008WR006803.

Ye, M., Pohlmann, K., Chapman, J., 2008b. Expert elicitation of recharge model probabilities for the Death Valley regional flow system. Journal of Hydrology 354, 102–115.

Ye, M., Pohlmann, K., Chapman, J., Pohll, G., Reeves, D., 2010. A model-averaging method for assessing groundwater conceptual model uncertainty. Ground Water 48, 716–728.