
Correcting Errors in Streamflow Forecast Ensemble Mean and Spread

ANDREW W. WOOD AND JOHN C. SCHAAKE*

Department of Civil and Environmental Engineering, University of Washington, Seattle, Washington

(Manuscript received 19 December 2006, in final form 23 July 2007)

ABSTRACT

When hydrological models are used for probabilistic streamflow forecasting in the Ensemble Streamflow Prediction (ESP) framework, the deterministic components of the approach can lead to errors in the estimation of forecast uncertainty, as represented by the spread of the forecast ensemble. One avenue for correcting the resulting forecast reliability errors is to calibrate the streamflow forecast ensemble to match observed error characteristics. This paper outlines and evaluates a method for forecast calibration as applied to seasonal streamflow prediction. The approach uses the correlation of forecast ensemble means with observations to generate a conditional forecast mean and spread that lie between the climatological mean and spread (when the forecast has no skill) and the raw forecast mean with zero spread (when the forecast is perfect). Retrospective forecasts of summer period runoff in the Feather River basin, California, are used to demonstrate that the approach improves upon the performance of traditional ESP forecasts by reducing errors in forecast mean and improving spread estimates, thereby increasing forecast reliability and skill.

1. Introduction

Operational streamflow prediction for water resources management in the western United States depends on winter and spring forecasts of runoff volumes for the relatively dry late-spring and summer period. Target forecast periods vary for different end use applications, and also regionally, with April–July common to the southern half of the domain and April–September more common to the northern half (where the peak runoff, a response to melting snowpack, occurs later), although other periods are also used. The primary operational methods for seasonal streamflow forecasting are linear regression (e.g., Garen 1992) and Ensemble Streamflow Prediction (ESP), a technique based on hydrologic modeling (Twedt et al. 1977). The former has been (for most of the last century) and is the standard approach (Wood and Lettenmaier 2006), but the latter method is rapidly being brought to the forefront, a shift enabled by advances in computing, digital data access, and model calibration approaches. ESP-based methods are also motivated by concerns that regression-based approaches may be unsuitable in the face of nonstationarities associated with climate change and variability (e.g., Hamlet et al. 2005; Mote et al. 2005; Cayan et al. 2001; Pagano and Garen 2005). Although both ESP and regression can be applied in ways that emphasize parts of the training period deemed most relevant to current climate, the ESP's model basis imposes physical constraints that may help guide hydrologic responses in unfamiliar climate situations. The operational adoption of model-based ensemble forecasts is complicated, however, by a number of issues relating in general to hydrologic model uncertainty, which is often categorized into component uncertainties associated with parameter estimation, meteorological forcings, or model structure, that is, whether the model physics are realistic (Beven 1993; Sorooshian et al. 1993; Wagener et al. 2003). When a model is used for ESP streamflow forecasting, these uncertainties typically are manifested as biases in the mean and spread of forecast ensembles.

The ESP approach couples a deterministic (single valued) simulation of the hydrologic state during a model spinup period leading up to the forecast start date with an ensemble of historical sequences of meteorological model inputs (e.g., temperature and precipitation) that simulate weather in the future (or forecast) period.

* Additional affiliation: Office of Hydrologic Development, National Weather Service, Silver Spring, Maryland.

Corresponding author address: Andrew W. Wood, Dept. of Civil and Environmental Engineering, University of Washington, Seattle, WA 98115. E-mail: [email protected]


DOI: 10.1175/2007JHM862.1

© 2008 American Meteorological Society


One strength of the ESP approach is that it accounts for uncertainty in future climate, which in some seasons is the major component of forecast uncertainty, by assuming that historical climate variability is a good estimate of current climate uncertainty. A weakness of the approach, however, is that when the uncertainty of the current ("initial") hydrologic state is a significant component of the overall forecast uncertainty (e.g., during late spring in the western United States), the deterministic estimate of the forecast ensemble's initial hydrologic state leads to an overconfident forecast, that is, one having a spread that is narrower than the total forecast uncertainties warrant. Figure 1 illustrates this problem using six ESP forecast ensembles made on the first of each month from January to June, for April–July streamflow volume. Particularly for "late season" forecasts, that is, those made after the start of the target period, the observation (thick line) is higher than the most extreme ensemble member, and hence is not enclosed by the forecasts' uncertainty range. Simulation model deficiencies that may also contribute to this problem include errors in model structure or parameterization (also specified deterministically) that could limit the ability of the model to simulate a full range of hydrologic response (Krzysztofowicz 1999).

Late-season forecasts tend to expose model biases more readily than do the early-season forecasts because, for the part of the target period that has passed, they are partially driven by observed (rather than ensemble forecast) forcings, yielding a deterministic flow estimate that is averaged with the ensemble estimates that span the future portion of the target period. The biases would be greatly alleviated if observed streamflows were used for the deterministic (past) part of the forecast. Unfortunately, records of observed, naturalized flow (i.e., unimpaired by human influences such as reservoir regulation or groundwater pumping) at many locations are unavailable for months to years after the forecast date.

Efforts to reduce hydrologic simulation errors have been pursued in many areas since the inception of hydrologic modeling. Work on characterization of parameter error and its effects on simulation uncertainty is arguably the most extensive area [Wagener and Gupta (2005) is but one article from a rich literature] and has yielded advances both in model calibration and in the understanding of the model calibration problem from a theoretical perspective. Multiple-algorithm or modular strategies, in which modelers can mix and match the methods used to simulate different components of the hydrologic cycle within one nominal model (e.g., Leavesley et al. 2002), have been investigated to compensate for inadequacies of individual model physics. Models have been distributed over ever-finer subarea resolutions in an effort to account for terrain and climate inhomogeneity (e.g., Boyle et al. 2001), and physical algorithms have been used in place of conceptual ones (Wigmosta et al. 1994, among others). Various approaches for improving model meteorological inputs, especially where distributed forcings must be derived from point observations, have been explored (Liston and Elder 2006; Clark and Slater 2006). Data assimilation techniques using ancillary observations (both remotely sensed and in situ) on the fly to adjust model state variables such as soil moisture and snow water equivalent (SWE) are an area of active and promising research (e.g., Andreadis and Lettenmaier 2006; Reichle et al. 2002; Vrugt et al. 2005; Seo et al. 2003). And where efforts to improve the model physics, calibration, or inputs fall short, relatively simple bias-correction procedures applied to the model outputs have been shown to reduce streamflow simulation errors (Hashino et al. 2007). More elaborate statistical techniques for multimodel combination (merging outputs from several models) are now also being tested, both on long-term simulations (Ajami et al. 2006) and on short-lead flow forecasts. Vrugt et al. (2006) and Vrugt and Robinson (2007), for example, evaluate strategies for applying a Bayesian model averaging technique to combine outputs from several models into forecasts that have optimal statistical properties.

FIG. 1. ESP forecasts of April–July streamflow volume for the Feather River basin, made on the first of each month from January to June 1995. The box-and-whisker symbols show the 10th, 30th, median, 70th, and 90th percentiles of the raw forecast ensemble, and the points are ensemble members. MCM = million cubic meters.

Bias correction and multimodel combination methods involve the statistical postprocessing of dynamical model outputs for prediction purposes, a practice that is more commonly found in the field of atmospheric science than in hydrology. Methods such as model output statistics (Glahn and Lowry 1972) have been used for decades to derive forecasts of variables (such as surface precipitation) that are not well simulated by numerical weather prediction (NWP) models from better-simulated NWP model variables, for example, pressure fields. One concept employed widely in NWP, but infrequently in operational hydrologic prediction, is the calibration of forecasts (Hamill and Colucci 1997; Hamill et al. 2004; Toth et al. 2005; Atger 2003), rather than of the dynamical models that produce them. Forecast calibration adjusts raw ensemble forecast probabilities to match their observed error characteristics from a retrospective verification period, and differs from model calibration by addressing the symptoms of prediction error rather than its causes (e.g., physics or parameter misspecification, forcing biases, and so forth). The popularity of postprocessing adjustment approaches in meteorology is likely due to the computational intensiveness of atmospheric models and their high dimensionality, which prohibit model parameter calibration schemes of the types traditionally found in hydrology.

Forecast calibration in hydrology, as in atmospheric science, has focused mostly on short-lead predictions. In addition to the work by Vrugt et al. (2006) and Vrugt and Robinson (2007), Krzysztofowicz and Herr (2001), for example, used forecast calibration to improve the performance of river flood stage forecasts. Seo et al. (2006) applied an autoregressive-1 model to ESP-based predictions of streamflow at short (1–5 day) lead times, to correct both mean and probabilistic deficiencies in raw ESP forecast ensemble outputs. A drawback of the autoregressive calibration approach, however, is that it depends on the availability of observed streamflow estimates immediately prior to the time of forecast, which, as discussed earlier, precludes its use for many seasonal streamflow prediction applications. The Bayesian uncertainty processing example of Krzysztofowicz (1999) suggests using such observed antecedent runoff as an input, but the general theory presented in that work does not require it.

This paper considers the application of a forecast calibration approach to the challenge of summer streamflow prediction at monthly to seasonal lead times. Rather than assume the availability of prior observed flow estimates at the time of forecast, the proposed framework uses retrospective ensemble forecasts of summer streamflow from the hydrologic prediction model to determine calibration adjustment parameters.

2. Approach

An evaluation of three statistical techniques for correcting raw ensemble forecast errors was conducted in the Feather River basin of California using a hydrology model implementation that is described in section 2a. The first technique is a bias-correction approach (section 2b) that is applied to individual members of a streamflow forecast ensemble. The second technique is a calibration approach (section 2c) that operates on the ensemble mean and generates adjusted mean and spread statistics, but not individual traces. The third technique, discussed in section 2d, is a flow-state-dependent variation of the second technique. The skill and accuracy metrics used to assess the performance of the statistical techniques are described in section 2e.

a. Hydrologic model implementation and ESP forecast generation

The Feather River drainage basin (an area of over 7680 km2) ranges in elevation from over 2900 m to about 275 m at Lake Oroville (Fig. 2). Annual precipitation varies from more than 3 m to less than 30 cm, with snowmelt runoff contributing approximately 40% of the annual average flow in the basin. Because almost 60% of the basin lies below the mean snow line (1675 m) but above 600 m, winter streamflow is strongly influenced by temperature variations (Koczot and Dettinger 2003). Currently, operational water supply forecasts in the basin are regression-based estimates of April–July runoff that are disaggregated to monthly volumes using historical relationships and forecaster judgment.

FIG. 2. The Feather River, CA, drainage area for the ESP forecasts, delineated against the 1/8-degree VIC hydrology model's mean grid cell elevations.

Streamflow for the Feather River at Oroville Reservoir was simulated using the Variable Infiltration Capacity (VIC) hydrologic model (Liang et al. 1994), applied at a daily time step (hourly for the embedded snow accumulation and ablation model). For each grid cell in the simulation domain, the simulation produces daily estimates of surface runoff and baseflow, which are routed through a grid-based stream network (using the routing scheme of Lohmann et al. 1998) to produce streamflow, also at a daily time step. In this study, the VIC simulation was calibrated and validated via a comparison of monthly simulated streamflow with observed "full natural flow" (diversion and regulation effects removed) taken from the California Data Exchange Center (CDEC; the Feather River site identification is "ORO"). The monthly streamflow simulations are shown in Fig. 3, and summary statistics are given in Table 1. The validation period simulation (1950–69) shows a slight decrease in the performance of some statistics, for example, the ratio of simulated to observed mean and standard deviation of flow, relative to the calibration period (1970–89), but the statistics for the entire period used for the analysis, 1950–2005, describe a model that accurately reproduces the observed interannual flow variability, seasonal timing, and monthly to annual volumes.
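The statistics in Table 1 are standard comparisons of matching simulated and observed monthly flow series. As a minimal sketch of how they can be computed (assuming numpy arrays of monthly values; the function name is illustrative, not from the paper):

```python
import numpy as np

def simulation_stats(sim, obs):
    """Table 1-style comparison statistics for matching monthly flow
    series: ratios of simulated to observed mean and std dev,
    correlation, Nash-Sutcliffe efficiencies (raw and log flows),
    and errors normalized by the observed mean and variance."""
    err = sim - obs
    ls, lo = np.log(sim), np.log(obs)
    return {
        "mean_ratio": sim.mean() / obs.mean(),
        "std_ratio": sim.std(ddof=1) / obs.std(ddof=1),
        "correlation": np.corrcoef(sim, obs)[0, 1],
        "nse": 1.0 - np.sum(err**2) / np.sum((obs - obs.mean())**2),
        "nse_log": 1.0 - np.sum((ls - lo)**2) / np.sum((lo - lo.mean())**2),
        "rmse_over_obs_mean": np.sqrt(np.mean(err**2)) / obs.mean(),
        "mse_over_obs_var": np.mean(err**2) / np.var(obs, ddof=1),
    }
```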

FIG. 3. The time series and the mean monthly hydrograph of monthly simulated and observed flow for the Feather River at Oroville, CA.

A simulation starting in January 1949 was used to generate hydrologic states for forecast initialization on the first day of the months January through June in every year from 1950 to 2005, after a 1-yr spinup period. These states (containing, most importantly, soil moisture and snow variables) were used in turn to initialize ESP forecasts based on a 30-yr climate ensemble period (1975–2004). For example, the initial hydrologic state for 1 March was used to initialize 30 forecasts, the first of which employed the temperature and precipitation forcings from 1 March 1975 to 29 February 1976. Although 1-yr lead forecast ensembles were initialized on each of the six start times each year, each forecast ensemble member was used to calculate only a single predictand: the total aggregate streamflow for the April–July period (used for water supply management). This 56-yr set of ensemble forecasts for April–July flow for six start dates is termed the "raw" forecast set. The 10th, 30th, 50th, 70th, and 90th percentiles of each ESP forecast are used here to summarize each ensemble forecast. These particular percentiles are featured in official agency water supply forecasts, and for that reason are used here for evaluation. In an operational setting, predictands other than the April–July flow, and periods and model calibration strategies other than those chosen here, would no doubt be of interest; nonetheless, this limited set of forecasts is sufficient for demonstration purposes.

b. Bias correction approach

A variety of approaches used in research and practice are effective in correcting for systematic model simulation errors in a deterministic context, and these are necessary when the outputs from the model are then used in another model or analysis that requires unbiased inputs. A simple approach is to apply a constant or perhaps cyclic (monthly or seasonal) correction calculated from the ratio of (or difference between) observed and simulated means. A more elaborate correction is a "percentile mapping" approach (e.g., Wood et al. 2002; Wood and Lettenmaier 2006; Hashino et al. 2007), in which the percentile (nonexceedence probability) of a simulated flow, $F_s(Q)$, where $F_s$ is the cumulative distribution function (CDF) of the simulated flow $Q$, is used to extract a corresponding flow $Q_{bc}$ from the inverse CDF of the observations, $F_o^{-1}$:

$Q_{bc} = F_o^{-1}(F_s(Q)).$

This correction adjusts both the mean and variance (and higher moments) of the simulation outputs to match those of the observed climatology. The simulated and observed CDFs are ideally based on the same period. Either fitted or empirical CDFs may be used, but in the latter case a fitted distribution may still be necessary where either the simulated flows exceed the bounds of the simulated climatology, or their percentiles exceed the range of the observed inverse CDF.
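A minimal sketch of the percentile-mapping equation, assuming numpy and empirical CDFs built from Weibull plotting positions (the function name and plotting-position choice are illustrative; as noted above, a fitted distribution would be substituted where values fall outside the climatological bounds):

```python
import numpy as np

def quantile_map(q, sim_clim, obs_clim):
    """Percentile-mapping bias correction, Q_bc = F_o^{-1}(F_s(Q)),
    using empirical CDFs with Weibull plotting positions.
    `q` may be a scalar or an array of raw flow values."""
    sim_sorted = np.sort(sim_clim)
    obs_sorted = np.sort(obs_clim)
    p_sim = np.arange(1, sim_sorted.size + 1) / (sim_sorted.size + 1.0)
    p_obs = np.arange(1, obs_sorted.size + 1) / (obs_sorted.size + 1.0)
    prob = np.interp(q, sim_sorted, p_sim)      # F_s(Q)
    return np.interp(prob, p_obs, obs_sorted)   # F_o^{-1}(prob)

# e.g., bias-correct every member of a raw ESP ensemble (names illustrative):
# bc_members = quantile_map(raw_members, sim_apr_jul, obs_apr_jul)
```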

In this study, the continuous simulation from 1950 to 2005 described in section 2a defined the simulated climatology, and the CDFs for the simulated and observed April–July flows were used as the bias-correction "mapping" distributions in the equation above. Each of the raw ESP ensemble forecast member values (April–July streamflow) was input to the equation as $Q$, generating bias-corrected forecast values $Q_{bc}$ that made up the forecast dataset termed "bias corrected."

c. Approach for calibrating raw forecast ensembles

The common application of ESP forecasts is to use the raw ensemble of hydrologic model flow outputs to determine both the central tendency and the range of uncertainty in the prediction. An alternate approach, using the algorithm applied in Schaake et al. (2007) to ensemble temperature forecasts, is to view the raw forecast ensemble mean and uncertainty as biased and unreliable, and instead to condition the forecast's mean and spread on the raw forecast ensemble mean (as a single-valued, deterministic forecast) and its correlation with observations. A retrospective series of forecast ensemble means (each mean denoted by $f$) is paired with corresponding observations $o$ and used to determine the parameters of the two relationships below that generate the conditional forecast mean and spread of each calibrated forecast. These parameters are $\mu_f$ and $\sigma_f$, the grand mean and standard deviation of the retrospective series of forecast ensemble means; $\mu_o$ and $\sigma_o$, the mean and standard deviation of the retrospective series of matching observations; and $\rho_{of}$, the correlation between the retrospective forecast ensemble means and observations. Given these parameters and a new raw forecast ensemble mean $f$, a calibrated forecast mean $c$ can be calculated using

$c = \mu_o + \rho_{of} (\sigma_o / \sigma_f)(f - \mu_f).$

TABLE 1. Calibration and validation statistics for the simulated monthly flow of the Feather River at Oroville.

Statistic                                      Calibration (1970–89)   Validation (1950–69)   All (1950–2005)
Simulated/obs avg flow                         1.00                    0.93                   0.99
Simulated/obs std dev of flow                  1.01                    0.95                   1.00
Correlation                                    0.97                    0.96                   0.96
Nash–Sutcliffe efficiency                      0.94                    0.91                   0.93
Nash–Sutcliffe efficiency, natural-log flows   0.88                    0.88                   0.89
RMSE/obs mean                                  0.26                    0.29                   0.28
MSE/obs variation                              0.06                    0.09                   0.07


This equation, incidentally, arises from the normal-equation solution of a simple linear regression relating a deterministic forecast and observation series, as in the streamflow extension work of Hirsch (1982). Note that if the correlation is zero, the forecast mean defaults to the climatological mean of the retrospective observations. The conditional forecast variance, $\sigma_c^2$, is given by

$\sigma_c^2 = \sigma_o^2 (1 - \rho_{of}^2)$

and yields the observed climatological variance in the case of zero correlation between the retrospective forecast means and observations, that is, a lack of any forecast skill. For a correlation of unity, the calibrated forecast spread equals zero, which would be justified if the forecast means indeed matched the observations perfectly. Krzysztofowicz (1999) holds that this "coherence" property (i.e., the forecast defaulting to climatology in the absence of forecast skill) is essential for the use of forecasts in rational decision making. Note that the correlations need not be positive to apply the conditioning equations. A negative correlation, indicating the perverse case of a forecast system that tends to predict the opposite of what will occur, would scale the calibrated forecast anomaly in the opposite direction from the raw forecast anomaly.
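The two conditioning equations reduce to a few lines of code. A sketch, assuming numpy and inputs already transformed to normal space as described below (function and variable names are illustrative):

```python
import numpy as np

def condition_mean_spread(f, f_retro, o_retro):
    """Conditional (calibrated) forecast mean and spread from the two
    conditioning equations. `f` is the new raw ensemble mean; `f_retro`
    and `o_retro` are numpy arrays of the retrospective ensemble means
    and matching observations, already in normal space."""
    mu_f, sd_f = f_retro.mean(), f_retro.std(ddof=1)
    mu_o, sd_o = o_retro.mean(), o_retro.std(ddof=1)
    rho = np.corrcoef(f_retro, o_retro)[0, 1]
    c = mu_o + rho * (sd_o / sd_f) * (f - mu_f)   # calibrated mean
    sd_c = sd_o * np.sqrt(1.0 - rho**2)           # calibrated spread
    return c, sd_c
```

Note that with rho = 0 the function returns the climatological mean and spread, and with rho = 1 it returns the regression-scaled forecast mean with zero spread, matching the coherence behavior described above.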

This parametric approach for generating conditional ensemble statistics assumes that the forecasts and observations are normally distributed, which is not the case for most streamflow datasets. To meet this constraint, the observed and forecast datasets can be transformed before calculating and applying the parameters of the conditioning equations. The percentiles of the calibrated, transformed flows are then estimated in normal space and later back-transformed to the original flow unit scale.
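For datasets that are not well fit by a simple distribution, an empirical normal quantile transform is one such normalizing option. A sketch (assuming numpy and scipy; ties are broken arbitrarily, and the name is illustrative):

```python
import numpy as np
from scipy.stats import norm

def normal_quantile_transform(x):
    """Map each value to the standard normal deviate of its
    Weibull plotting-position percentile (an empirical NQT)."""
    ranks = np.argsort(np.argsort(x)) + 1        # 1..n, smallest first
    return norm.ppf(ranks / (len(x) + 1.0))
```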

The steps taken in calibrating one ensemble forecast (for example, an ensemble forecast of April–July flow made on 1 February in the current year) are summarized as follows:

1) A retrospective series of ensemble forecasts of April–July flow made on 1 February and the observed flows for a matching period must exist. If these series are not normally distributed, they are transformed via distribution fitting and application of a normal quantile transform, or other means. The subsequent steps are performed using the normalized values.

2) The mean of each retrospective ESP forecast ensemble is calculated.

3) The series of retrospective 1 February forecast ensemble means and the observations are used to calculate the five parameters needed for the calibration: $\mu_o$, $\sigma_o$, $\mu_f$, $\sigma_f$, and $\rho_{of}$.

4) The calibrated forecast mean and spread are calculated using the conditioning equations and the mean of the current, transformed forecast ensemble, $f$.

5) Percentiles of interest of the current calibrated forecast are calculated using the conditional mean and standard deviation, and transformed back to the flow domain.

The April–July streamflows and observations in the case study of this paper are lognormally distributed, so the natural logarithms of all forecast ensemble members (flow values) and observations are calculated as the first step in the calibration process. The log flows, which are normally distributed, are used in steps 2–4 above, and then the conditional means and variances are used to generate log flow values at the five percentiles of interest. Finally, these five log flow values for each calibrated forecast are exponentiated back to the flow domain. Scatterplots of the raw forecast ensemble means versus observations are shown in Fig. 4, with their correlation in log space (one of the parameters used in the conditioning equations) inset.
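Putting the five steps together for this lognormal case, a sketch of the calibration of a single forecast (assuming numpy and scipy; array names and shapes are illustrative, not from the paper):

```python
import numpy as np
from scipy.stats import norm

def calibrate_forecast(members, retro_members, retro_obs,
                       pctiles=(0.10, 0.30, 0.50, 0.70, 0.90)):
    """Calibrate one ESP forecast (steps 1-5) for a lognormal
    predictand. `members`: current raw ensemble, shape (n_members,);
    `retro_members`: (n_years, n_members); `retro_obs`: (n_years,)."""
    # Step 1: normalize via natural logs (the case-study transform)
    o_retro = np.log(retro_obs)
    # Step 2: means of the retrospective forecast ensembles
    f_retro = np.log(retro_members).mean(axis=1)
    # Step 3: the five calibration parameters
    mu_f, sd_f = f_retro.mean(), f_retro.std(ddof=1)
    mu_o, sd_o = o_retro.mean(), o_retro.std(ddof=1)
    rho = np.corrcoef(f_retro, o_retro)[0, 1]
    # Step 4: conditional mean and spread for the current forecast
    f = np.log(members).mean()
    c = mu_o + rho * (sd_o / sd_f) * (f - mu_f)
    sd_c = sd_o * np.sqrt(1.0 - rho**2)
    # Step 5: percentiles in log space, exponentiated back to flows
    return np.exp(c + sd_c * norm.ppf(pctiles))
```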

The bias-correction (section 2b) and ensemble calibration approaches can be combined in sequence, in which case the bias correction could be applied to the flows before any normalizing transformation, if one were needed. The tables of section 3 include results from such a combination.

d. "Sliding window" state-dependent variation on ensemble calibration approach

A variation on the forecast calibration approach was also investigated. The scatterplots (Fig. 4) suggest that the relationship between forecasts and observations varies depending on the state of the hydrologic system (note, e.g., the stronger tendency toward underprediction for the high flows). If the correlation between forecasts and observations depends on forecast state, it would be desirable for the calibration approach to weight more heavily the raw information from the forecast in well-correlated system states than in poorly correlated states. The calibration approach described above used all 56 of the retrospective ESP forecasts and did not differentiate based on the state of the forecast. Dividing the retrospective forecast dataset into categories, however, leads to insufficient sample sizes for calculating the necessary statistics; in this case, for example, a subsetting of the retrospective forecasts into terciles would result in categories having fewer than 20 members. Seo et al. (2006) found that the sample sizes resulting from quartering a 47-yr retrospective forecast dataset were insufficient for implementing their calibration method, and instead resorted to halving the retrospective forecast dataset in an effort to create a state-dependent calibration. Even with the larger samples, significant sampling uncertainties in the key calibration parameter were noted. For further insight into the challenges that state-dependent forecast performance offers to skill quantification efforts, and on the rationale for state-dependent interpretation of forecasts, Kumar (2007) provides a thoughtful discussion.

To circumvent the sample size difficulty, a "sliding window" approach for generating state-dependent calibration parameters (correlation, means, and variances) is proposed. The retrospective forecasts from each year are ranked from low to high flow state, with rank defined by the rank of the matching observations (April–July flow). Calibration statistics are then calculated for overlapping subsets of varying sizes of the ranked retrospective ensemble forecasts and observations. For example, choosing a subset (sample) size of 30 allows statistics for 27 subsets to be calculated from the 56 retrospective forecast years. The first of these subsets contains the retrospective forecast years with the lowest 30 observed flows, and would be suitable for calculating calibration parameters when a forecast ensemble mean is very low relative to forecast ensemble means from other years.

A method for selecting which sliding-window subset should be used in calibrating a particular forecast is needed. To this end, the subsets are ranked according to their mean observed (April–July) flow. The first subset has the lowest mean flow; the second subset, containing the retrospective forecast years having the 2nd lowest observed flow to the 31st lowest, has the second lowest mean flow; and so forth. The state of an ESP forecast can be defined by the rank of its ensemble mean relative to the forecast ensemble means of all retrospective ensemble forecasts. This rank can be expressed as a percentile, and the subset having the nearest matching percentile, ranked among the other subsets, is selected for generating calibration parameters for the particular forecast.

FIG. 4. April–July flow forecast ensemble means vs observations, based on retrospective forecasts from 1950 to 2005, with the correlation (in natural log space) shown at bottom right.

The sliding-window variation thus expands the second and third steps of the procedure described in section 2c. At the end of the second step, the percentile of the forecast ensemble mean is also calculated. The third step then becomes: 3) The retrospective data series of forecast ensemble means and matching observations are divided into overlapping subsets of a chosen size (e.g., 30 yr), and the subsets are ranked from low to high by the mean of the observed flow subset. The percentile of the current forecast ensemble mean flow is used to select the subset for generation of the five calibration parameters (listed previously).
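A sketch of the subset selection (assuming numpy; the proportional percentile matching used to pick the nearest-ranked subset is one plausible reading of the selection rule, and all names are illustrative):

```python
import numpy as np

def window_parameters(f, f_retro, o_retro, window=35):
    """Select the sliding-window subset for a forecast whose
    (normalized) ensemble mean is `f`, and return the five
    calibration parameters computed from that subset.
    `f_retro`, `o_retro`: numpy arrays over retrospective years."""
    order = np.argsort(o_retro)          # rank years by observed flow
    f_ranked, o_ranked = f_retro[order], o_retro[order]
    n = len(o_retro)
    n_sub = n - window + 1               # e.g., 56 - 30 + 1 = 27 subsets
    # percentile rank of the current ensemble mean among retro means
    pct = np.searchsorted(np.sort(f_retro), f) / float(n)
    i = min(int(round(pct * (n_sub - 1))), n_sub - 1)
    f_sub, o_sub = f_ranked[i:i + window], o_ranked[i:i + window]
    return (o_sub.mean(), o_sub.std(ddof=1),
            f_sub.mean(), f_sub.std(ddof=1),
            np.corrcoef(f_sub, o_sub)[0, 1])
```

Because the years are ranked by observed flow, consecutive windows ascend in mean observed flow automatically, so selecting by index is equivalent to selecting by subset rank.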

In this study, calibrated forecasts based on subset sizes of 20, 30, 35, 40, 45, and 50 were evaluated, and from these, the subset size of 35 was selected to illustrate the approach. It is termed "Calib_35" to indicate the size of the subset, and for consistency, the non-state-dependent calibration approach from section 2c is termed "Calib_56," recognizing that it uses forecasts from all 56 retrospective years.

The variation in sample statistics for a sample size of N = 35 retrospective forecast years is shown in Fig. 5. For the early-season forecasts (January to March), the forecast system sample means and variances have a large bias relative to the corresponding observed sample means and variances, and the correlation between the forecasts and observations is low. For the subsequent late-season forecasts, the forecast system sample statistics increasingly match the observed sample statistics, and the correlation increases. In all forecast months, the correlation is higher for more extreme system states than for states closer to the median of the predictand's distribution. Because the correlation determines the relative weighting of forecast signal and climatology in calibrating the raw ESP forecast, this indicates that the state-dependent calibration will rely more heavily on the raw ESP signal in extreme years than in near-normal ones.

FIG. 5. April–July flow forecast sample statistics (in natural log space) for a sampling window of size N = 35, in which the 27 samples are ranked from low to high mean. The mean (left y axis) and std dev (Std, right y axis) are plotted for both observations and retrospective forecasts, and the correlation (Corr, right y axis) between observations and retrospective forecasts is also shown.

e. Evaluation metrics

The performance of the forecasts is measured using the Pearson correlation and the mean absolute error (MAE), both applied to the forecast distribution median. A measure for evaluating probabilistic forecast performance is also used: the continuous ranked probability score (CRPS; Hersbach 2000). The CRPS is here calculated by comparing the deciles of the forecast distribution with the observations. The CRPS has the advantage of being reported in the units of the forecast, and would equal the MAE if only one probability category were used instead of the 10 chosen here (Hersbach 2000; Weber et al. 2006). A skill score based on the CRPS, the CRPSS (for CRP skill score), was also calculated using the formulation for skill scores given in Wilks (1995),

$\mathrm{CRPSS} = (\mathrm{CRPS}_c - \mathrm{CRPS}_f) / \mathrm{CRPS}_c,$

where $\mathrm{CRPS}_f$ is the CRPS of the forecast distribution and $\mathrm{CRPS}_c$ is calculated by interpreting the climatological distribution of the predictand (here, April–July streamflow) as a forecast. The skill score has positive values for forecasts having greater skill than the climatology, with a maximum of 1 for a perfect forecast, and is unbounded below zero for forecasts less skilled than the climatology.
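A sketch of the decile-based score (assuming numpy): the CRPS of a forecast summarized by its nine interior deciles can be approximated as twice the mean pinball (quantile) loss. This is an approximation to the CRPS integral, not Hersbach's (2000) exact ensemble decomposition, and the function names are illustrative:

```python
import numpy as np

def crps_from_deciles(deciles, obs):
    """Approximate CRPS from a forecast's nine interior deciles
    (10th ... 90th percentiles) as twice the mean pinball loss."""
    taus = np.linspace(0.1, 0.9, 9)
    q = np.asarray(deciles, dtype=float)
    indicator = (obs < q).astype(float)
    return 2.0 * np.mean((taus - indicator) * (obs - q))

def crpss(crps_f, crps_c):
    """Skill score CRPSS = (CRPS_c - CRPS_f) / CRPS_c (Wilks 1995)."""
    return (crps_c - crps_f) / crps_c
```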

In this demonstration, the entire retrospective forecast dataset is used for evaluation of the various methods for forecast calibration, even though the results would arguably reflect added statistical rigor if the training and verification periods were separated. Techniques such as cross-validation, in which one to a few years at a time are omitted from the training data and then used for verification, are certainly applicable. It is not likely that such efforts would alter the results greatly, however, since the statistics used in the calibration would not vary much from training sample to training sample if only one or a few years were removed from each. In operational practice, many retrospective forecast years prior to the current forecast year would likely be used for the training, and the difference in training statistics with and without the current year (were it possible to include it) would be negligible unless only a very short retrospective forecast period were available.

3. Results

In general, forecast calibration increases the spread of the probabilistic forecast, which decreases its resolution and sharpness while increasing its reliability. Figure 6 illustrates the effects of calibration and bias correction using a representative sample of 6 (out of 56) retrospective forecasts for April–July flow made on the first of each month from January to June. The box–whisker symbols show the 10th, 30th, 50th, 70th, and 90th forecast percentiles in each month, while the observations (thick line) are plotted starting in the first month of the period they represent. The climatological forecast, that is, the historical distribution of the observations, is a "naive" forecast that does not change with time, and is shown for comparison. The bias-corrected forecasts have a spread similar to the raw forecasts, while the calibrated forecasts ("Calib_56" denotes the calibration using the entire 56-yr retrospective forecast dataset), which reflect the total forecast uncertainty, have wider error bounds. Because of the skewed nature of the predictand, the increased spread effect of calibration is more prominent in high flow years than in low flow years.

FIG. 6. Time series for six years of forecasts, using symbols as in Fig. 1. All forecasts were made on the first day of the month in which they are plotted.

The spread differences between the raw and calibrated forecasts that occur later in the forecast season, that is, for forecasts made in April–June, are a consequence of the forecast signal being increasingly derived from the initial conditions rather than from future climate as the season progresses. This phenomenon is readily apparent from a comparison of the forecast uncertainty against the frequency at which the observations occur in each part of the forecast distribution. Figure 7, which contains reliability diagrams akin to those used by Hamill (1997), shows this relationship. For forecasts made early in the season, the predicted forecast uncertainty from all methods is relatively reliable. By April, however, the raw ESP forecast line begins to flatten relative to the 1:1 line as the raw forecast spread narrows. By June, the observation falls into the forecast's lowest and highest deciles approximately 30% and 40% of the time, respectively, rather than the 10% and 10% of the time that would result from a reliable forecast. This overconfidence is also reflected in the bias-corrected May and June forecasts, although the bias correction reduces the underprediction bias of the raw forecasts. In contrast, the calibrated forecast that uses the entire retrospective forecast period (Calib_56) corrects the reliability deficiencies of the late-season forecasts, giving the forecast and observed frequencies a 1:1 relationship. A forecast adjusted with a state-dependent calibration (e.g., Calib_35, which uses a sliding-window subset of size N = 35) also preserves the reliability in the May and June forecasts, but leads to reliability errors for earlier forecasts. These errors arise when the forecast erroneously selects the subset from which to use statistics for calibration. For instance, a February forecast that is high when the eventual observation is low would calibrate toward an inappropriately high climatological mean.
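The underlying tally for such a reliability diagram is simple to compute. A sketch (assuming numpy; names illustrative) that counts how often the observation falls in each of the 10 bins bounded by a forecast's interior deciles:

```python
import numpy as np

def decile_occupancy(decile_fcsts, obs):
    """Fraction of observations falling in each of the 10 bins
    bounded by a forecast's nine interior deciles; a reliable
    forecast puts ~10% of observations in every bin.
    decile_fcsts: (n_years, 9) array; obs: (n_years,) array."""
    bins = np.array([np.searchsorted(d, o)
                     for d, o in zip(decile_fcsts, obs)])
    return np.bincount(bins, minlength=10) / float(len(obs))
```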

As one would expect, the spread of the forecasts narrows and the bias falls as the lead time of the forecast decreases. Figure 8, in which all raw ESP forecast distributions are plotted against the observations, shows the lack of discrimination in the early-season forecasts (e.g., the ESP forecast is little different than climatology) and the improvement in skill as the season progresses. Despite the overplotting of symbols in the figure, it also illustrates the bias toward underprediction and overly narrow predictions of forecast spread for the late-season, high flow forecasts: the observations fall with inordinate frequency into the top tercile of the forecast distributions, in concordance with the reliability errors shown in Fig. 7.

FIG. 8. Scatterplot of raw ESP forecasts vs observations, using symbols as in earlier figures.

In contrast to the raw forecasts, the calibrated forecasts based on the statistics from the entire retrospective forecast period (Fig. 9a) in each month exhibit a spread that contains the observations with the appropriate frequencies. That the forecasts still show a large spread even on 1 June, when half of the forecast period has been observed, reflects added hydrologic uncertainty (simulation errors that hydrologic model calibration is not able to remove) that is not present in the raw ESP forecast distributions. The decreased sharpness of the calibrated forecasts may well be perceived as a drawback by forecast users; for example, a wider spread of the 1 April calibrated forecasts decreases their apparent utility for decision making. The state-dependent, sliding-window calibration approach (designed to maximize the retention of forecast sharpness for forecast system states associated with higher skill) was found to incur a smaller forecast spread increase. For example, the forecasts from Fig. 9b, while wider than the raw ESP for 1 April, are narrower than those from Fig. 9a. In comparison, the forecasts that are only bias-corrected but not calibrated (Fig. 9c) have similar uncertainty but less median error compared to the raw ESP forecasts.

FIG. 9. Scatterplots of (a) ESP forecasts calibrated on the entire retrospective forecast period vs observations, with symbols as in earlier figures, for January–June start dates; (b) forecasts calibrated on a sliding retrospective forecast sample of size N = 35; and (c) bias-corrected forecasts without calibration. The latter two sets are shown for April–June start dates only.

Neither the bias-correction nor the ensemble calibration approach much alters the correlation of the median forecast with observations (Table 2). The variation in ensemble median between approaches is small compared with the interannual variation of the ensemble median relative to the observations, which largely determines the correlations.

The MAE of the ensemble median is generally improved by a small percentage by all approaches. Table 3 shows that the raw ESP forecast is increasingly accurate as the year progresses, relative to the climatology forecast (which does not change from month to month). The bias-correction approach alone reduces the error relative to the raw forecasts by roughly 5%, 8%, and 13% in the April, May, and June forecasts, respectively. Earlier than this, however, bias correction leads to errors equal to or greater than those of the raw forecasts, due to the large scatter of the datasets involved. Calibration with all the retrospective forecasts (Calib_56) similarly offers greater improvements in the late-season forecasts, but mixed results in the early-season forecasts. The combination of bias correction and calibration (denoted "BC+Calib_56," in which bias-corrected ESP ensembles are the input to the ensemble calibration algorithm) gives results similar to those of the techniques used separately, largely because the ensemble calibration is inherently a form of bias correction. Last, when the state-dependent ensemble calibration is used (Calib_35), the errors were lower than for raw ESP in all months, but the improvements gained in the late-season forecasts (April–June) were slightly smaller than for the other methods.

FIG. 7. Reliability of forecasts for raw, bias-corrected, and two calibrated ESPs that used all of the retrospective forecasts (Calib_56) and smaller state-dependent sampling for training (Calib_35). Note that although error bars on the reliability curves are not shown (for readability), it is certain that there would be substantial overlap of typical confidence bounds associated with the different curves.



The performance of the forecasts' probability distributions is illustrated by the CRPS in Table 4. The CRPS is influenced by both the bias and the uncertainty of the forecast distribution. Like the MAE, the forecast CRPS is reduced by the calibration and bias-correction approaches in most months. Despite the larger uncertainty, and hence lower sharpness and resolution, of the calibrated forecasts as compared to the raw forecasts (cf. Figs. 8 and 9), the reduced bias and increased forecast reliability both contribute to the improvement in skill. Bias correction alone affords nearly the same improvement in CRPS for forecasts made in February–April, but in May–June does not yield equal benefits because of its failure to correct spread deficiencies. The combination of bias correction and calibration applied in sequence approximately equals the performance of calibration with the entire retrospective forecast dataset (Calib_56). Notably, the Calib_35 results show the only increases in CRPS relative to raw ESP, in January; otherwise this state-dependent approach improves upon raw ESP, but not quite to the extent shown by the Calib_56 approach and, in some months, the bias-correction approach. This result suggests that the application of a state-dependent calibration approach may require particular care at times when a relatively greater potential for misidentification of the system state exists. The approach of calibrating with the entire retrospective forecast dataset (Calib_56) and the combination approach (BC+Calib_56) are, for this case study, slightly superior to the bias-correction and Calib_35 methods. All approaches, however, compare well overall to the performance of raw ESP.

The CRPS values can be expressed in the form of skill scores relative to the climatological forecast CRPS. The generally lower CRPS values for the forecast calibration and bias-correction approaches, compared to raw ESP, translate into higher associated skill scores, as the CRPSS values in Table 5 illustrate. The skill scores confirm that the calibration approaches offer small percentage improvements early in the forecast season but increase in value as the forecast season progresses. This is true for bias correction as well, although the gains level out in the final two forecasts.

The implications of calibrating a forecast from a water supply perspective are illustrated in Fig. 10, showing an example of a forecast initialized 1 April 1996. This "spread confidence" diagram allows a forecast user to associate an arbitrary confidence interval width (in percent, on the x axis) with upper and lower forecast bounds (on the y axis). The figure is constructed by plotting a comprehensive set of interval bounds calculated from the probability density function of the forecast, and is useful for communicating forecast uncertainty. It makes clear, for instance, that confidence in the forecast median (a single value) occurring is low, and shows how wide the forecast bounds must be to achieve a desired level of confidence (the 50% and 90% intervals are shown, but other intervals can be read from the figure). The raw ESP forecast on the left indicates a negligible likelihood of flows exceeding the observation (dark dashed line) that occurred. When the total hydrologic uncertainty of the forecast method is accounted for using the Calib_35 approach (right), the chance of recording flows at that level is still estimated to be low, but would not be as readily discounted.

FIG. 10. Forecast "spread confidence" plots relating the forecast distribution to confidence interval width for an April–July flow forecast made in April 1996. The median (light dashed line) and 50% and 90% confidence intervals are highlighted for the (left) raw ESP forecast and (right) a conditioned Calib_35 forecast. The observation is shown by the dark dashed line.
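Under the lognormal forecast model used here, the curves of such a spread-confidence diagram follow directly from the calibrated log-space mean and spread. A sketch (assuming scipy; names are illustrative):

```python
import numpy as np
from scipy.stats import norm

def spread_confidence(c, sd_c, n_widths=99):
    """Central confidence-interval bounds, as plotted in a
    spread-confidence diagram, for a calibrated forecast that is
    lognormal with log-space mean `c` and std dev `sd_c`."""
    widths = np.linspace(0.01, 0.99, n_widths)   # interval width, 1%-99%
    lower = np.exp(c + sd_c * norm.ppf(0.5 - widths / 2.0))
    upper = np.exp(c + sd_c * norm.ppf(0.5 + widths / 2.0))
    return widths, lower, upper

# e.g., the 90% interval spans the 5th to 95th percentile bounds:
# widths, lo, hi = spread_confidence(c, sd_c)
```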

4. Discussion and conclusions

In response to a growing recognition that dynamical, physically based models cannot reduce simulation errors to the point that their raw outputs are generally suitable for direct input to end use applications, statistical postprocessing techniques for bias correction and transformation of raw model outputs are increasingly being implemented hand in hand with dynamical approaches. This practice appears to be more advanced in atmospheric and climate science, but it is no less applicable for the dynamical models used in land surface hydrology, where the expectation that physically based model predictions alone would outperform existing statistical (regression based) prediction has waned in the face of inconsistent model performance. Physically based hydrology model ensemble predictions made in the ESP framework are inherently overconfident due in part to the deterministic representation of initial conditions, as well as of model parameters and structure. The result of this overconfidence is poor forecast reliability and an overestimation of forecast ensemble sharpness, which is particularly damaging to model and forecast credibility in risk management enterprises, for example, those related to water, energy, and environmental hazards.

The statistical postprocessing calibration approach outlined and demonstrated in this paper improved the reliability (the representation of forecast uncertainty) of probabilistic streamflow forecasts, albeit at the cost of slightly decreased forecast sharpness, relative to the performance of both raw and bias-corrected ESP forecasts. The net result of these two effects was nonetheless that probabilistic forecast skill (as represented by the CRPSS) was generally improved by calibration. The larger calibrated forecast spread, relative to raw ESP, is a consequence of more accurately representing the uncertainty of the forecasts.

TABLE 3. Mean absolute error (in MCM) of distribution medians from forecasts of climatology (Clim), raw ESP, bias-corrected ESP, and three ESP calibration variations. The "BC+Calib_56" heading indicates that both bias correction and ensemble calibration were applied.

Month   Clim   Raw ESP   Bias correction   Calib_56   BC+Calib_56   Calib_35
Jan      993       873               878        884           888        862
Feb      993       718               720        689           686        685
Mar      993       678               710        715           715        677
Apr      993       508               484        498           491        503
May      993       366               336        337           348        355
Jun      993       332               289        298           306        297

TABLE 4. CRPS (in MCM) of forecasts from climatology (Clim), raw ESP, bias-corrected ESP, and three ESP calibration variations. The "BC+Calib_56" heading indicates that both bias correction and ensemble calibration were applied.

Month   Clim   Raw ESP   Bias correction   Calib_56   BC+Calib_56   Calib_35
Jan      680       638               618        602           605        650
Feb      680       526               503        492           489        508
Mar      680       517               496        482           481        497
Apr      680       398               349        344           337        359
May      680       321               270        243           246        253
Jun      680       323               274        218           221        225

TABLE 5. CRPSS (relative to a climatological forecast) of forecasts from raw ESP, bias-corrected ESP, and three ESP calibration variations. The "BC+Calib_56" heading indicates that both bias correction and ensemble calibration were applied.

Month   Raw ESP   Bias correction   Calib_56   BC+Calib_56   Calib_35
Jan        0.06              0.09       0.12          0.11       0.05
Feb        0.23              0.26       0.28          0.28       0.25
Mar        0.24              0.27       0.29          0.29       0.27
Apr        0.41              0.49       0.49          0.50       0.47
May        0.53              0.60       0.64          0.64       0.63
Jun        0.53              0.60       0.68          0.68       0.67

TABLE 2. Correlation of observations and distribution medians from forecasts of raw ESP, bias-corrected ESP, and three ESP calibration variations. The "BC+Calib_56" heading indicates that both bias correction and ensemble calibration were applied.

Month   Raw ESP   Bias correction   Calib_56   BC+Calib_56   Calib_35
Jan        0.47              0.46       0.46          0.45       0.48
Feb        0.68              0.68       0.68          0.68       0.69
Mar        0.68              0.68       0.68          0.68       0.70
Apr        0.85              0.85       0.85          0.86       0.85
May        0.92              0.92       0.92          0.92       0.92
Jun        0.94              0.94       0.94          0.94       0.94



A state-dependent variation of the calibration approach mitigated the loss of sharpness for the late-season forecasts, but was less effective in correcting the reliability of early-season forecasts. Krzysztofowicz and Herr (2001), for comparison, showed that calibration could be more effective when system state (in their case, precipitation input) was considered. The mixed performance of the state-dependent method here, however, suggests that it may be appropriate only when system states can be identified accurately by the forecast system (to avoid misclassification errors). In comparison to the calibration approaches, a quantile–quantile bias-correction approach led to similar gains in forecast accuracy and probabilistic skill, but did not also correct forecast reliability problems late in the forecast season.

One potential drawback of the two calibration methods described here (in contrast to the bias-correction method) is that they discard the raw ensemble estimates of uncertainty and substitute instead a constructed constant (or cyclic, e.g., monthly varying) uncertainty. The resultant forecasts therefore do not distinguish between times when the raw model forecast spread is narrower or wider, indicating greater or lesser forecast system certainty. In many forecast systems, this variation is deemed an important property of a prediction (Toth et al. 2001). The behavior of time-varying model uncertainty estimates is difficult to calibrate because the sample sizes available for verifying their accuracy from a retrospective forecast dataset are limited. A sampling window approach similar to the one used here for mean states may offer one solution, but the typical heteroskedasticity of streamflow simulation errors (i.e., errors that correlate with mean states) could make a calibration that is jointly dependent on both mean state and uncertainty state problematic. Nonetheless, a calibration approach using both the mean and spread information from the raw model forecasts would be desirable, provided the greater complexity of the method would not compromise its accessibility for practical use.

Another potential drawback is that the forecast calibration approach does not preserve distinct ensemble traces (time series) that in some applications are needed as input to a subsequent analysis (in particular, to a reservoir system model). The mean and variance of the calibrated forecasts can be used to estimate various percentiles (e.g., the 10th and 90th) that describe the forecast uncertainty, but an additional procedure, not described here, would be needed to regenerate a corresponding ensemble of streamflow traces.
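For illustration, if a distributional form is assumed for the calibrated forecast (a normal distribution is used in this hypothetical sketch; in practice a transformation of the flow data may be preferable), such percentiles follow directly from the calibrated mean and variance:

import numpy as np
from scipy.stats import norm

def calibrated_percentiles(mean, variance, probs=(0.10, 0.50, 0.90)):
    """Percentiles of a calibrated forecast under an assumed normal
    distribution with the given mean and variance."""
    return norm.ppf(probs, loc=mean, scale=np.sqrt(variance))

# Hypothetical calibrated seasonal volume forecast (arbitrary volume units):
print(calibrated_percentiles(1500.0, 250.0**2))
# approximately [1179.6, 1500.0, 1820.4]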

Despite the noted drawbacks, the forecast calibration approach illustrated here appears to be suitable for use as a statistical counterpart to hydrologic model-based prediction. One key strength of the approach is that it ensures that the model ensemble forecasts can perform no worse than a naive (climatological) forecast, which is not necessarily the case for uncalibrated ensemble predictions. If a retrospective forecast dataset exists, the technique is also straightforward to implement, requiring only the use of basic statistics. With careful application, it reduces forecast reliability errors that may otherwise limit the acceptability of ensemble predictions generated by hydrology models.
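As a rough indication of the basic statistics involved, the sketch below implements one correlation-based calibration of the general kind evaluated here: a linear regression of observations on hindcast ensemble means, with the conditional variance shrinking as hindcast skill increases. It is a simplified, hypothetical rendering that omits operational details (e.g., data transformations and sampling windows) and should not be read as the paper's exact procedure.

import numpy as np

def calibrate_forecast(fcst_mean, hindcast_means, hindcast_obs):
    """Correlation-based calibration sketch. With hindcast correlation r,
    the conditional mean is the regression prediction of the observation
    given the raw ensemble-mean forecast, and the conditional variance is
    (1 - r**2) * var(obs). When r = 0 this collapses to the climatological
    mean and variance; as r approaches 1 the spread approaches zero."""
    hindcast_means = np.asarray(hindcast_means, dtype=float)
    hindcast_obs = np.asarray(hindcast_obs, dtype=float)
    r = np.corrcoef(hindcast_means, hindcast_obs)[0, 1]
    mu_obs, sd_obs = hindcast_obs.mean(), hindcast_obs.std(ddof=1)
    mu_f, sd_f = hindcast_means.mean(), hindcast_means.std(ddof=1)
    cond_mean = mu_obs + r * (sd_obs / sd_f) * (fcst_mean - mu_f)
    cond_var = (1.0 - r**2) * sd_obs**2
    return cond_mean, cond_var

Because the conditional distribution never has less skill than climatology by construction, a forecast with no demonstrated hindcast skill simply reverts to the climatological mean and spread.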

FIG. 10. Forecast "spread confidence" plots relating the forecast distribution to confidence interval width for an April–July flow forecast made in April 1996. The median (light dashed line) and 50% and 90% confidence intervals are highlighted for the (left) raw ESP forecast and (right) a conditioned Calib_35 forecast. The observation is shown by the dark dashed line.




Acknowledgments. The lead author acknowledges the helpful feedback of three anonymous reviewers and the informal comments of David Garen, Tom Pagano (both of the U.S. Department of Agriculture Natural Resources Conservation Service), and Randal Wortman (U.S. Army Corps of Engineers). The research reported herein was supported in part by the National Aeronautics and Space Administration under Cooperative Agreement NNSO6AA78G and Grant NNX06AE34G to the University of Washington, and in part by the Joint Institute for the Study of the Atmosphere and Ocean at the University of Washington under NOAA Cooperative Agreement NA17RJ1232.

REFERENCES

Ajami, N. K., Q. Duan, X. Gao, and S. Sorooshian, 2006: Multimodel combination techniques for analysis of hydrological simulations: Application to Distributed Model Intercomparison Project results. J. Hydrometeor., 7, 755–768.

Andreadis, K. M., and D. P. Lettenmaier, 2006: Assimilating remotely sensed snow observations into a macroscale hydrology model. Adv. Water Resour., 29, 872–886.

Atger, F., 2003: Spatial and interannual variability of the reliability of ensemble-based probabilistic forecasts: Consequences for calibration. Mon. Wea. Rev., 131, 1509–1523.

Beven, K. J., 1993: Prophecy, reality and uncertainty in distributed hydrological modelling. Adv. Water Resour., 16, 41–51.

Boyle, D. P., H. V. Gupta, S. Sorooshian, V. Koren, Z. Zhang, and M. Smith, 2001: Toward improved streamflow forecasts: Value of semidistributed modeling. Water Resour. Res., 37, 2749–2760.

Cayan, D. R., S. A. Kammerdiener, M. D. Dettinger, J. M. Caprio, and D. H. Peterson, 2001: Changes in the onset of spring in the western United States. Bull. Amer. Meteor. Soc., 82, 399–415.

Clark, M. P., and A. G. Slater, 2006: Probabilistic quantitative precipitation estimation in complex terrain. J. Hydrometeor., 7, 3–22.

Garen, D. C., 1992: Improved techniques in regression-based streamflow volume forecasting. J. Water Resour. Plann. Manage., 118, 654–670.

Glahn, H. R., and D. A. Lowry, 1972: The use of model output statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203–1211.

Hamill, T. M., 1997: Reliability diagrams for multicategory probabilistic forecasts. Wea. Forecasting, 12, 736–741.

——, and S. J. Colucci, 1997: Verification of Eta–RSM short-range ensemble forecasts. Mon. Wea. Rev., 125, 1312–1327.

——, J. S. Whitaker, and X. Wei, 2004: Ensemble reforecasting: Improving medium-range forecast skill using retrospective forecasts. Mon. Wea. Rev., 132, 1434–1447.

Hamlet, A. F., P. W. Mote, M. P. Clark, and D. P. Lettenmaier, 2005: Effects of temperature and precipitation variability on snowpack trends in the western United States. J. Climate, 18, 4545–4561.

Hashino, T., A. A. Bradley, and S. S. Schwartz, 2007: Evaluation of bias-correction methods for ensemble streamflow volume forecasts. Hydrol. Earth Syst. Sci., 11, 939–950.

Hersbach, H., 2000: Decomposition of the continuous ranked probability score for ensemble prediction systems. Wea. Forecasting, 15, 559–570.

Hirsch, R. M., 1982: A comparison of four streamflow record extension techniques. Water Resour. Res., 18, 1081–1088.

Koczot, K., and M. D. Dettinger, 2003: Climate effects of Pacific decadal oscillation on streamflow of the Feather River, California. Proc. 71st Western Snow Conf., Scottsdale, AZ, Western Snow Conference, 139–142.

Krzysztofowicz, R., 1999: Bayesian theory of probabilistic forecasting via deterministic hydrologic model. Water Resour. Res., 35, 2739–2750.

——, and H. D. Herr, 2001: Hydrologic uncertainty processor for probabilistic river stage forecasting: Precipitation-dependent model. J. Hydrol., 249, 46–68.

Kumar, A., 2007: On the interpretation and utility of skill information for seasonal climate predictions. Mon. Wea. Rev., 135, 1974–1984.

Leavesley, G. H., S. L. Markstrom, P. J. Restrepo, and R. J. Viger, 2002: A modular approach to addressing model design, scale, and parameter estimation issues in distributed hydrological modelling. Hydrol. Processes, 16, 173–187.

Liang, X., D. P. Lettenmaier, E. F. Wood, and S. J. Burges, 1994: A simple hydrologically based model of land surface water and energy fluxes for general circulation models. J. Geophys. Res., 99, 14 415–14 428.

Liston, G. E., and K. Elder, 2006: A meteorological distribution system for high-resolution terrestrial modeling (MicroMet). J. Hydrometeor., 7, 217–234.

Lohmann, D., E. Raschke, B. Nijssen, and D. P. Lettenmaier, 1998: Regional scale hydrology: I. Formulation of the VIC-2L model coupled to a routing model. Hydrol. Sci. J., 43, 131–142.

Mote, P. W., A. F. Hamlet, M. P. Clark, and D. P. Lettenmaier, 2005: Declining mountain snowpack in western North America. Bull. Amer. Meteor. Soc., 86, 39–49.

Pagano, T., and D. Garen, 2005: A recent increase in western U.S. streamflow variability and persistence. J. Hydrometeor., 6, 173–179.

Reichle, R. H., D. B. McLaughlin, and D. Entekhabi, 2002: Hydrologic data assimilation with the ensemble Kalman filter. Mon. Wea. Rev., 130, 103–114.

Schaake, J. C., and Coauthors, 2007: Precipitation and temperature ensemble forecasts from single-value forecasts. Hydrol. Earth Syst. Sci. Discuss., 4, 655–717.

Seo, D.-J., V. Koren, and N. Cajina, 2003: Real-time variational assimilation of hydrologic and hydrometeorological data into operational hydrologic forecasting. J. Hydrometeor., 4, 627–641.

——, H. Herr, and J. C. Schaake, 2006: A statistical post-processor for accounting of hydrologic uncertainty in short-range ensemble streamflow prediction. Hydrol. Earth Syst. Sci. Discuss., 3, 1987–2035.

Sorooshian, S., Q. Duan, and V. K. Gupta, 1993: Calibration of rainfall-runoff models: Application of global optimization to the Sacramento soil moisture accounting model. Water Resour. Res., 29, 1185–1194.

Toth, Z., Y. Zhu, and T. Marchok, 2001: The use of ensembles to identify forecasts with small and large uncertainty. Wea. Forecasting, 16, 463–477.

——, O. Talagrand, and Y. Zhu, 2005: The attributes of forecast systems: A framework for the evaluation and calibration of weather forecasts. Predictability of Weather and Climate, T. N. Palmer and R. Hagedorn, Eds., Cambridge University Press, 584–595.

Twedt, T. M., J. C. Schaake, and E. L. Peck, 1977: National Weather Service extended streamflow prediction. Proc. 45th Western Snow Conf., Albuquerque, NM, Western Snow Conference, 52–57.

Vrugt, J. A., and B. A. Robinson, 2007: Treatment of uncertainty using ensemble methods: Comparison of sequential data assimilation and Bayesian model averaging. Water Resour. Res., 43, W01411, doi:10.1029/2005WR004838.

——, C. G. H. Diks, H. V. Gupta, W. Bouten, and J. M. Verstraten, 2005: Improved treatment of uncertainty in hydrologic modeling: Combining the strengths of global optimization and data assimilation. Water Resour. Res., 41, W01017, doi:10.1029/2004WR003059.

——, M. P. Clark, C. G. H. Diks, Q. Duan, and B. A. Robinson, 2006: Multi-objective calibration of forecast ensembles using Bayesian model averaging. Geophys. Res. Lett., 33, L19817, doi:10.1029/2006GL027126.

Wagener, T., and H. V. Gupta, 2005: Model identification for hydrological forecasting under uncertainty. Stochastic Environ. Res. Risk Assess., 19, 378–387.

——, N. McIntyre, M. J. Lees, H. S. Wheater, and H. V. Gupta, 2003: Towards reduced uncertainty in conceptual rainfall-runoff modelling: Dynamic identifiability analysis. Hydrol. Processes, 17, 455–476.

Weber, F., L. Perreault, and V. Fortin, 2006: Measuring the performance of hydrological forecasts for hydropower production at BC Hydro and Hydro-Quebec. Preprints, 18th Conf. on Climate Variability and Change, Atlanta, GA, Amer. Meteor. Soc., P8.5.

Wigmosta, M. S., L. Vail, and D. P. Lettenmaier, 1994: A distributed hydrology-vegetation model for complex terrain. Water Resour. Res., 30, 1665–1680.

Wilks, D. S., 1995: Statistical Methods in the Atmospheric Sciences: An Introduction. International Geophysics Series, Vol. 59, Academic Press, 464 pp.

Wood, A. W., and D. P. Lettenmaier, 2006: A test bed for new seasonal hydrologic forecasting approaches in the western United States. Bull. Amer. Meteor. Soc., 87, 1699–1712.

——, E. P. Maurer, A. Kumar, and D. P. Lettenmaier, 2002: Long-range experimental hydrologic forecasting for the eastern United States. J. Geophys. Res., 107, 4429, doi:10.1029/2001JD000659.
