This article was downloaded by: [Columbia University]
On: 14 February 2011
Access details: [subscription number 932906123]
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Hydrological Sciences Journal
Publication details, including instructions for authors and subscription information: http://www.informaworld.com/smpp/title~content=t911751996

Predictive downscaling based on non-homogeneous hidden Markov models
Abedalrazq F. Khalil a; Hyun-Han Kwon b; Upmanu Lall a; Yasir H. Kaheil a
a International Research Institute for Climate and Society, Earth and Environmental Engineering, Columbia University, New York, NY, USA
b Department of Civil Engineering, Chonbuk National University, Jeonbuk, South Korea

Online publication date: 23 April 2010

To cite this Article: Khalil, Abedalrazq F., Kwon, Hyun-Han, Lall, Upmanu and Kaheil, Yasir H. (2010) 'Predictive downscaling based on non-homogeneous hidden Markov models', Hydrological Sciences Journal, 55: 3, 333–350
To link to this Article: DOI: 10.1080/02626661003780342
URL: http://dx.doi.org/10.1080/02626661003780342

Full terms and conditions of use: http://www.informaworld.com/terms-and-conditions-of-access.pdf

This article may be used for research, teaching and private study purposes. Any substantial or systematic reproduction, re-distribution, re-selling, loan or sub-licensing, systematic supply or distribution in any form to anyone is expressly forbidden.

The publisher does not give any warranty express or implied or make any representation that the contents will be complete or accurate or up to date. The accuracy of any instructions, formulae and drug doses should be independently verified with primary sources. The publisher shall not be liable for any loss, actions, claims, proceedings, demand or costs or damages whatsoever or howsoever caused arising directly or indirectly in connection with or arising out of the use of this material.


Predictive downscaling based on non-homogeneous hidden Markov models

Abedalrazq F. Khalil1, Hyun-Han Kwon2, Upmanu Lall1 & Yasir H. Kaheil1

1 International Research Institute for Climate and Society, Earth and Environmental Engineering, Columbia University, 918 S. W. Mudd Building, Mail Code 4711, 5000 W. 120th St., New York, NY 10027, USA
2 Department of Civil Engineering, Chonbuk National University, 664-14 1Ga Deokjin-Dong Jeonju-City, Jeonbuk 561-756, South Korea
[email protected]

Received 13 June 2007; accepted 29 September 2008; open for discussion until 1 October 2010

Citation Khalil, A. F., Kwon, H.-H., Lall, U. & Kaheil, Y. H. (2010) Predictive downscaling based on non-homogeneous hidden Markov models. Hydrol. Sci. J. 55(3), 333–350.

Abstract Weather-state models have been shown to be effective in downscaling synoptic atmospheric information to local daily precipitation patterns. We explore the ability of non-homogeneous hidden Markov models (NHMM) to downscale regional seasonal climate data to daily rainfall at a collection of gauging sites. The predictors used are: ensemble means of seasonal rainfall as forecast by the DEMETER and ECHAM models, and the preceding seasonal outgoing long-wave radiation (OLR). As the downscaling of seasonal GCM-based predictions lacks the ability to capture the intra-seasonal variability, we augment the seasonal GCM-driven inputs with statistically-driven predictions of the monthly rainfall amounts. The pooling effect of combining seasonal and monthly estimates of the regional rainfall enhances the capacity of the NHMM to simulate the stochastic characteristics of rainfall fields. The monthly rainfall prediction is derived from a wide range of climate precursors such as the El Niño-Southern Oscillation, local sea-level pressure, and sea-surface temperature. Application of the methodology to data from the Everglades National Park region in South Florida, USA is presented for the seasons May–July and August–September using a 22-year sequence of seasonal data from eight rainfall stations. The model skill in capturing the seasonal and intra-seasonal rainfall attributes at each station is demonstrated graphically and using simple statistical measures of efficiency. The hidden states derived from the NHMM are qualitatively analysed and shown to correspond to the dominant synoptic-scale features of rainfall-generating mechanisms, which reinforces the argument that physical processes are appropriately captured.

Key words downscaling; weather generator; precipitation; Markov models


Hydrological Sciences Journal – Journal des Sciences Hydrologiques, 55(3) 2010 333

ISSN 0262-6667 print/ISSN 2150-3435 online
© 2010 IAHS Press
doi: 10.1080/02626661003780342
http://www.informaworld.com


INTRODUCTION

Daily rainfall probabilities, the persistence of wet and dry regimes, and other rainfall statistics can vary substantially over time and space in a systematic way. Climatic fluctuations are the major driver of such persistent year-to-year changes in rainfall probabilities, and are believed to affect rainfall statistics both spatially and temporally. General circulation models (GCMs) perform reasonably well with respect to annual and seasonal weather states on a large spatial scale, and they stand as one of the primary sources for obtaining projections of future climate. Outputs from GCMs underpin most climate impact assessments; yet, it is widely acknowledged that regional climate indices as represented by GCMs contain significant sources of uncertainty. Reliance on a single GCM could lead to inappropriate planning or simulation of the intra-seasonal rainfall. In addition, precipitation forecasts demonstrate limited skill at sub-seasonal time scales, and they have limited capabilities in reproducing daily precipitation statistics at local and regional scales (Robock et al., 1993; Bates et al., 1998). The skill level obtained is found to be inadequate for aiding water managers’ decision-making processes. The generation of stochastic realizations of daily rainfall at a set of stations in a manner that preserves spatiotemporal variability is very important for assessing the impact of potential climate change on agricultural activities, flood-risk management, rainfall–runoff processes, and the management of both surface water and groundwater. For example, agro-economic models are grossly unfit when purely built and calibrated using the observed daily sequence of rainfall at a given station. A stochastic realization of daily rainfall that assimilates and merges both local and regional rainfall will enable the estimation of a meaningful crop risk profile.

In this paper, we seek to build a stochastic framework to generate daily statistics of precipitation at multiple sites simultaneously. These realizations are conditioned on seasonal climate projections. Furthermore, in order to capture the intra-seasonal rainfall variability, learning machine-based monthly predictions of the regional rainfall are utilized.

Data-based statistical approaches have achieved considerable success, and they usually seek to extract information from the existing data and provide robust inferences. Here, monthly prediction of the regional rainfall serves to complement the seasonal GCM forecasts in reproducing the intra-seasonal variation in the generated rainfall sequences. The GCM-based analysis is constrained by the understanding of atmospheric dynamics at scales greater than two degrees of longitude and latitude, and, thus, using stochastic models to reproduce the underlying physics could elicit better analyses at finer spatial and temporal scales. Specifically, daily precipitation data have significant variability, are distinctly non-Gaussian, exhibit relatively complicated spatio-temporal dependence (Mendes et al., 2006), and cannot be easily derived from atmospheric dynamic equations. This inherent complexity has led to the development of statistical weather-state downscaling models to relate daily precipitation to synoptic atmospheric patterns (see e.g. Robertson et al., 2004; Mendes et al., 2006). An example of weather-state models is the non-homogeneous hidden Markov model (NHMM), which is among the stochastic downscaling methods that capture the most distinct patterns influencing multi-site precipitation occurrences and amounts. The NHMM has been used for downscaling of multi-site precipitation occurrences and amounts and verified to provide plausible results (Charles et al., 1996; Bellone et al., 2000; Mehrotra et al., 2004; Robertson et al., 2004). Also, the NHMM is claimed to have the capability to provide some useful insights into the circulation modes and the associated precipitation mechanisms (Hughes & Guttorp, 1994a,b, 1999; Mehrotra & Sharma, 2005).

Here, the NHMM is designed to be based on seasonal inputs as predicted from GCMs, and monthly inputs as predicted by data-driven learning machines. The NHMM is thus designed to perform simulations conditional on preserving the seasonal and monthly attributes of precipitation as governed by the exogenous forcing factors. In this context, the NHMM is said to perform downscaling, which is defined as the process of reconstructing the daily variability of the precipitation patterns (i.e. amounts, wet-to-wet probabilities, etc.) at spatially-distant stations based on information available at coarser scales in both time and space (i.e. synoptic patterns) (Kidson & Watterson, 1995).

The inclusion of GCM seasonal indices is found to be insufficient to capture intra-seasonal variability. For the rainfall simulation to succeed in the predictive context, it is necessary to incorporate representations of sub-seasonal atmospheric processes that are responsible for precipitation generation. The shortcomings of stochastic downscaling algorithms for preserving variance motivated Wilks (1989) to condition the downscaling on monthly amounts to produce better agreement between the variance of the simulated and observed daily rainfall.

Specifically, the purpose of this paper is to develop and demonstrate the capability to provide reliable sub-seasonal predictions of precipitation. The goal is to build models that capture the temporal and spatial properties of the precipitation data. These models are designed to simulate realistic rainfall patterns over monthly periods (i.e. 30-day sequences), as a basis for monthly and seasonal forecasts.

We applied the devised methodology to Everglades National Park (ENP), located in South Florida, USA. Precipitation in the ENP is generated by varying nonlinear processes that interact with each other over a wide range of scales. A usable method to disaggregate seasonal and monthly predictions to simulate daily values at many locations is important for the delicate ecosystem management at the ENP. In this context, the designed framework is of significant importance to resource managers in that it enables interpretation and transfer of the results of improved climate predictions for optimal management of water resources. In particular, simulation of the spatial distribution of precipitation amounts is of great importance for water resources management operations, as well as the management of flood control storage and release.

STUDY AREA AND DATA

The Everglades National Park (ENP) presents one of the most widely recognized wetland ecosystems in the world. The seasonal to intra-seasonal patterns of rainfall exert a prevailing abiotic influence in the ENP. The ecological restoration of the ENP has brought to the fore the challenge of how to deliver controlled quantities of water at the right times to the right locations. Optimal operation strategies in the ENP are rainfall-driven; thus rainfall prediction at the intra-seasonal scale is of paramount importance to strategic management scenarios.

For this study, a 35-year record (1965–2000) of daily rainfall amounts at eight stations was used; their locations are shown in Fig. 1(a). The ENP experiences a subtropical climate with a distinct wet season in the summer and dry season in the winter. Almost 75% of the annual precipitation falls during May–October (Fig. 1(b)), and the monthly precipitation amount varies between 0 and 508 mm (20.0 inches), realized at different times of the year.

The ENP rainfall is characterized by a low-frequency behaviour with apparent non-stationarity in the long term (Kwon et al., 2006). This feature in the rainfall time series suggests that the regional climate is likely to be marked by recurring, persistent multi-year droughts. Methods that can generate such scenarios are needed for effective planning. Therefore, a useful framework to disaggregate seasonal simulations or predictions to daily values at many locations is of interest, and could be used effectively to prioritize best management practices.

Everglades National Park rainfall

Rainfall data at eight local raingauges located in and around the southern Everglades region were used. The Everglades extends from the south of Lake Okeechobee to Florida Bay. However, we selected raingauges located only in the southern part of the Everglades area, because our study focuses on water delivery to the Everglades National Park. There are over 60 raingauges in the region. Eight stations out of the 60, for which long-term reliable data are readily available, were selected. Each site has a different period of record. There were a few missing values that were filled in using nearby station values. The period of record of rainfall data extends from 1965 to 2000. However, this study used data only from 1979 to 2000, because some of the relevant climate time series data are only available from the 1970s.

The processes responsible for rainfall over the ENP vary over the rainy season. We developed two distinct seasonal daily rainfall series: May-June-July (MJJ); and August-September-October (ASO), corresponding to time periods where distinctly different rainfall mechanisms are thought to operate. We selected the 92-day period beginning 1 May (MJJ) and the 92-day period beginning 1 August (ASO), corresponding to the first and second modes of the peak rainy season over the Everglades, respectively, for the period 1979–2000, yielding 22 complete 92-day years (2024 days).
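The seasonal windows described above can be checked with a few lines of code. The following is a minimal sketch, not the authors' processing code; the names `SEASON_START`, `season_days` and `all_season_days` are hypothetical and introduced only for illustration.

```python
from datetime import date, timedelta

# First day (month, day) of each season and the window length stated in the text.
SEASON_START = {"MJJ": (5, 1), "ASO": (8, 1)}
SEASON_LENGTH = 92  # days

def season_days(year, season):
    """Return the 92 calendar dates of one season in one year."""
    month, day = SEASON_START[season]
    first = date(year, month, day)
    return [first + timedelta(days=k) for k in range(SEASON_LENGTH)]

def all_season_days(season, first_year=1979, last_year=2000):
    """Concatenate the seasonal windows over the study period."""
    days = []
    for year in range(first_year, last_year + 1):
        days.extend(season_days(year, season))
    return days

mjj = all_season_days("MJJ")
aso = all_season_days("ASO")
print(len(mjj), mjj[0], mjj[91])  # 2024 1979-05-01 1979-07-31
```

Because May, June and July (and likewise August, September and October) contain 31 + 30 + 31 = 92 days, each window ends exactly on the last day of the third month, and 22 years give 22 × 92 = 2024 days per season.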

Historical climate data and global circulation model outputs

The NHMM uses predictor variables that are relevant to the state of the atmosphere in predicting the daily rainfall sequence. Two seasonal types of predictor for generating such forecasts are considered: (a) leading climate indicators, such as prior season sea-surface temperature (SST) and outgoing long-wave radiation (OLR), and (b) ensemble seasonal rainfall forecasts from multiple GCMs, to generate spatially-distributed daily time-step rainfall fields. All climate data fields used in this study are summarized in Table 1. Both the potential predictors and the predictors that were finally selected are listed. Climate data were accessed from the International Research Institute for Climate and Society (IRI) Data Library (http://iridl.ldeo.columbia.edu/).

Data sets for SST and OLR were obtained from the anomaly grid product of Kaplan et al. (1998) and NOAA-CIRES CDC interpolated OLR, respectively. The data for SST and OLR are available from 1856 and 1979, respectively. The three-month averages of these variables for February-March-April (FMA) were used for the NHMM applied to May-June-July.

[Fig. 1 image not reproduced. Panel (a): map of South Florida (80–82°W, 25–27°N) showing Lake Okeechobee, West Palm Beach, Miami, Key West, Big Cypress Preserve, the Water Conservation Area and Everglades National Park, with station labels EVC, FLA, S13, G54, MIAMIFS, FMB, IFS and RPL. Panel (b): regional rainfall (axis label: cm/month) against month.]

Fig. 1 (a) Everglades map with the spatial distribution of the eight precipitation stations under study. (b) Regional monthly precipitation amount as aggregated over the eight stations, 75% of rain in MJJ and ASO seasons.

General circulation models have been used to reproduce historical climate, understand climate mechanisms, and generate forecasts and projections of teleconnections and the inter-annual variability of sea-surface temperature, among other things. The ECHAM 4.5 model (European Centre HAMburg, developed at the Max-Planck Institute for Meteorology; Roeckner et al., 1996) and DEMETER model (Development of a European Multi-model Ensemble System for seasonal to inter-annual prediction; Palmer et al., 2004) rainfall forecasts, issued in May and August, were used in this study for the MJJ and ASO seasons, respectively. The selection of ECHAM predictions in the downscaling of MJJ, and DEMETER predictions in the ASO downscaling, is based solely on the skill of the individual models in reproducing the regional seasonal rainfall over the ENP. These models run from sets of initial conditions, each of them slightly different from the other, but consistent with the available observations to produce different ensembles. Both GCMs were averaged over multiple ensembles.

The seasonal OLR for the FMA season exhibits good correlation with MJJ rainfall. This seasonal lag correlation is not as strong in the case of ASO rainfall. Correlation maps of the regional MJJ seasonal rainfall (i.e. as aggregated over the selected eight stations) with the ECHAM predicted precipitation and with the previous-season OLR are shown in Fig. 2(a) and (b), respectively. Similarly, for ASO the correlation fields with OLR and DEMETER are shown in Fig. 2(c) and (d). The GCM-based precipitation and antecedent OLR, each averaged over the zones of influence shown in Fig. 2 and specified in Table 1, are utilized in the downscaling.
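The zone averaging and lag correlation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the gridded field is random synthetic data, the box limits are borrowed from one OLR column of Table 1 purely as an example, the average is unweighted (no cos-latitude area weighting), and the function names `box_mean` and `lag_correlation` are hypothetical.

```python
import numpy as np

def box_mean(field, lats, lons, lat_range, lon_range):
    """Average a (time, lat, lon) field over a latitude/longitude box.

    Unweighted mean over grid cells; area weighting is omitted for brevity
    and is an assumption of this sketch.
    """
    lat_mask = (lats >= lat_range[0]) & (lats <= lat_range[1])
    lon_mask = (lons >= lon_range[0]) & (lons <= lon_range[1])
    box = field[:, lat_mask][:, :, lon_mask]
    return box.mean(axis=(1, 2))

def lag_correlation(predictor, rainfall):
    """Pearson correlation between a prior-season index and seasonal rainfall."""
    return float(np.corrcoef(predictor, rainfall)[0, 1])

# Synthetic stand-in data: 22 seasons on a 2.5-degree grid (random values).
rng = np.random.default_rng(0)
lats = np.arange(10.0, 32.5, 2.5)    # degrees N
lons = np.arange(260.0, 302.5, 2.5)  # degrees E
olr_fma = rng.normal(size=(22, lats.size, lons.size))

olr_index = box_mean(olr_fma, lats, lons, (20.0, 25.0), (290.0, 300.0))
# Artificial dependence, only so that the correlation is non-trivial.
rain_mjj = 2.0 * olr_index + rng.normal(scale=0.1, size=22)
r = lag_correlation(olr_index, rain_mjj)
print(olr_index.shape, round(r, 2))
```

In practice the predictor series would be the FMA (or MJJ) seasonal mean of the real field and the rainfall series the regional seasonal total over the eight stations, one value per year.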

Generation of monthly inputs

In order to reinforce the ability of the NHMM to capture the intra-seasonal variability of rainfall mechanisms, we propose to use the regional estimate of monthly rainfall. In the case of the MJJ season, for example, we use the monthly prediction for each of the three months. There are many candidate inputs that could directly or indirectly influence monthly rainfall over the ENP.

The candidate raw climatic precursors are the NINO3.4 index for the El Niño-Southern Oscillation, the North Atlantic Oscillation (NAO) and sea-surface temperature anomalies (SSTA) in different zones. For the monthly prediction in the MJJ season, the North Pacific Ocean SST (i.e. 5–20°N and 150–180°W), and for ASO the Atlantic SST off the southeast coast of the USA (i.e. 20–35°N and 60–80°W), were strongly correlated with the regional rainfall time series at a lag of 3 months. The first two principal components for the selected zone are estimated and found to explain more than 85% of the SSTA variance in each zone. These inputs are used to provide predictions for each month within MJJ and for each month within ASO. The schematic shown in Fig. 3 summarizes the inputs used for both MJJ and ASO.
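The principal-component step and the 3-month lag can be illustrated as below. This is a hedged sketch on synthetic anomalies, not the SSTA data used in the paper; the helper names `leading_pcs` and `lagged_pairs`, the grid size and the two artificial modes are all assumptions made for the example.

```python
import numpy as np

def leading_pcs(anomalies, n_components=2):
    """Leading principal components of a (n_months, n_gridpoints) anomaly matrix.

    Returns the PC time series and the fraction of variance each one explains.
    """
    centered = anomalies - anomalies.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    pcs = u[:, :n_components] * s[:n_components]
    return pcs, explained[:n_components]

def lagged_pairs(predictors, target, lag=3):
    """Pair predictors at month t - lag with the target at month t (lag > 0)."""
    return predictors[:-lag], target[lag:]

# Synthetic SSTA-like field: two dominant modes plus weak noise.
rng = np.random.default_rng(1)
months = np.arange(120)
mode1 = np.sin(2 * np.pi * months / 12.0)
mode2 = np.cos(2 * np.pi * months / 40.0)
pattern1, pattern2 = rng.normal(size=(2, 50))
ssta = (3.0 * np.outer(mode1, pattern1) + 2.0 * np.outer(mode2, pattern2)
        + 0.3 * rng.normal(size=(120, 50)))

pcs, explained = leading_pcs(ssta)
rain = rng.gamma(2.0, 50.0, size=120)  # placeholder monthly rainfall series
x_lagged, y_lagged = lagged_pairs(pcs, rain, lag=3)
print(pcs.shape, x_lagged.shape, y_lagged.shape)
```

The lagged pairs `(x_lagged, y_lagged)` are the kind of input/output sample that a monthly regression model would be trained on.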

METHOD

The primary application considered in this paper is the simulation of the MJJ and ASO daily rainfall of the Everglades area at key points in time from 1979 to 2000 using GCMs, climate variables, and statistically-driven predictions. Here we use the NHMM as a simulation engine for the ENP rainfall field. To make the NHMM simulations reliable at the intra-seasonal scale, we incorporate the monthly predicted rainfall as an exogenous input. The monthly prediction model is based on the Bayesian inference technique referred to as the Relevance Vector Machine (RVM). The use of the RVM in this paper is similar to the study by Tripathi et al. (2006), in which a Support Vector Machine (SVM) is used to perform statistical downscaling of precipitation at a monthly time scale. The techniques used are briefly described in the following sections.

Table 1 Different efficiency measures used to evaluate the model performance.

Season        Experimental design (MJJ)    Forecasting (MJJ)           Forecasting (ASO)
Predictor     ERA-40        OLR            ECHAM4.5      OLR           SST           DEMETER
Latitude      23°N–28°N     20°N–25°N      17°N–24°N     12°N–18°N     10°N–20°N     17°N–28°N
Longitude     276°E–282°E   290°E–300°E    262°E–270°E   240°E–250°E   290°E–300°E   263°E–275°E
Corr. coeff.  0.82          0.54           0.60          0.64          0.49          0.69
Source        ECMWF         NOAA           IRI           NOAA          IRI           ECMWF

MJJ: May-June-July; ASO: August-September-October.
IRI: International Research Institute for Climate and Society; ECMWF: European Centre for Medium-Range Weather Forecasts; NOAA: National Oceanic & Atmospheric Administration.


Bayesian inference and the relevance vector machine (RVM) to derive predictors

The RVM method was introduced by Tipping (2000). It adopts a Bayesian approach to find the most probable estimate of the parameters, given the available data. Let $X = (x_1, \ldots, x_n)$, where $x_i \in \mathbb{R}^d$, denote the list of all input sequences (exogenous, in the global model parlance) and $y = (y_1, \ldots, y_n)$ the corresponding output.

In the application of this study, the multivariate input set $X$ may contain preceding NINO3.4, NAO, SSTA, and the average monthly rainfall in the Everglades for the available history of records, while the output vector $y$ represents the monthly rainfall predictions.

[Fig. 2 image not reproduced: four correlation-map panels, (a) to (d).]

Fig. 2 The correlation map between Everglades National Park MJJ–ASO precipitation and atmospheric fields: (a) MJJ ECHAM precipitation forecast with ENP MJJ precipitation; (b) FMA OLR with ENP MJJ precipitation; (c) MJJ OLR with ENP ASO precipitation; and (d) ASO DEMETER precipitation forecast with ENP ASO precipitation.

[Fig. 3 schematic not reproduced. For MJJ: the MJJ ECHAM precipitation forecast, the FMA observed OLR, and the May, June and July forecasted precipitation feed the NHMM, which produces the MJJ simulated rainfall. For ASO: the ASO DEMETER precipitation forecast, the MJJ observed OLR, and the August, September and October forecasted precipitation feed the NHMM, which produces the ASO simulated rainfall.]

Fig. 3 Architecture of input components used in the NHMM.

The RVM, as an example of kernel regression, assumes the existence of a kernel function $K(\cdot, x)$ (e.g. Gaussian radial basis function kernel) such that each response random variable $y_i$ can be expressed as a weighted sum of the form:

$$y_i = w_0 + \sum_{j=1}^{n} w_j K(x_i, x_j) + \varepsilon_i \qquad (1)$$

For notational convenience, equation (1) is often rewritten as $y = \Phi w + \varepsilon$, where the vector of weights is $w = (w_0, \tilde{w})^{\mathrm{TR}}$ with $\tilde{w} = (w_1, w_2, \ldots, w_n)$, the associated noise is $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)^{\mathrm{TR}}$, and $\Phi$ corresponds to fixed nonlinear basis functions (Tipping, 2001) over the inputs $x = (x_1, x_2, \ldots, x_n)^{\mathrm{TR}}$. Here, the superscript TR indicates the transpose. The nonlinear basis functions are constructed using kernel functions as follows:

$$\Phi = \begin{bmatrix} 1 & K(x_1, x_1) & \cdots & K(x_1, x_n) \\ \vdots & \vdots & \ddots & \vdots \\ 1 & K(x_n, x_1) & \cdots & K(x_n, x_n) \end{bmatrix} \qquad (2)$$

where, in this paper, the kernel function is the radial basis function (RBF) with scale parameter $\sigma_k$, that is $K(x_i, x_j) = \exp\left(-\|x_i - x_j\|^2 / 2\sigma_k^2\right)$. Figure 4 shows the complete linkage and structure between inputs ($X$), outputs ($y$), parameters ($w$) and hyper-parameters ($\alpha$, $\sigma$), where $\alpha_i$ is the precision (inverse variance) associated with $w_i$.
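The basis matrix of equation (2) is straightforward to build numerically. The sketch below assumes the Gaussian RBF kernel with a single scale parameter, as stated in the text; the function names `rbf_kernel` and `design_matrix`, the toy inputs and the value of the scale parameter are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(xa, xb, sigma_k):
    """Gaussian RBF kernel matrix between the rows of xa (n, d) and xb (m, d)."""
    sq_dist = (np.sum(xa**2, axis=1)[:, None]
               + np.sum(xb**2, axis=1)[None, :]
               - 2.0 * xa @ xb.T)
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dist / (2.0 * sigma_k**2))

def design_matrix(x, sigma_k):
    """Basis matrix of equation (2): a column of ones followed by K(x_i, x_j)."""
    n = x.shape[0]
    return np.hstack([np.ones((n, 1)), rbf_kernel(x, x, sigma_k)])

# Toy inputs: 6 samples of 3 predictors (e.g. lagged climate indices).
rng = np.random.default_rng(3)
x = rng.normal(size=(6, 3))
phi = design_matrix(x, sigma_k=1.5)
print(phi.shape)  # (6, 7)
```

Each row of `phi` corresponds to one training sample and each column after the first to one candidate basis function centred on a training input, which is why the weight vector has n + 1 entries.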

To find the parameter vector $w$ we maximize the posterior $P(w \mid X, y)$, which can be rewritten using the Bayes formula as $P(w \mid X, y) \propto P(y \mid X, w)\, P(w)$. The prior distribution $P(w)$ is specified to be a normal distribution for each weight $w_i$ with mean 0 and variance $\alpha_i^{-1}$. That is:

$$P(w \mid \alpha) = \prod_{i=1}^{n} N(w_i \mid 0, \alpha_i^{-1}) \qquad (3)$$

To complete the hierarchical specification of the model, we specify priors for the hyper-parameters $\alpha_i$, referred to as the hyper-prior, so that we can estimate $P(w) = \int P(w \mid \alpha) P(\alpha)\, d\alpha$. The hyper-prior is usually chosen to be wide and uninformative, and a broad variance is often used (Tipping, 2000, 2001). Combining the likelihood, the prior and the hyper-prior, one can estimate the weights by maximizing $P(w \mid X, y)$. The interested reader is referred to Tipping (2001) and references therein for in-depth discussion about the estimation processes. The important implicit property of the RVM is that, during the expectation maximization process, many of the parameters $\alpha_m$ tend to infinity and the corresponding weights, $w_m$, concentrate at zero. Therefore, one could consider the corresponding inputs irrelevant (Tipping, 2001). Thus, the working set of relevant basis functions in $\Phi$ shrinks until a sparse solution is found. The outcome of this optimization is that many elements of $\alpha$ go to infinity, such that $w$ will have only a few non-zero weights; the corresponding input vectors are considered the relevance vectors (see Khalil et al., 2005a,b; 2006, for details on RVM).
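The pruning mechanism can be made concrete for fixed hyper-parameters. With a Gaussian likelihood of noise variance $\sigma^2$ and the prior of equation (3), the weight posterior is Gaussian with covariance $(\Phi^{\mathrm{TR}}\Phi/\sigma^2 + \mathrm{diag}(\alpha))^{-1}$, as given by Tipping (2001). The sketch below is not the full iterative RVM and does not re-estimate $\alpha$ or $\sigma$; it only shows that a very large $\alpha_i$ forces the corresponding weight towards zero. The function name `rvm_posterior`, the toy data and the chosen values of $\alpha$ and the noise variance are assumptions.

```python
import numpy as np

def rvm_posterior(phi, y, alpha, noise_var):
    """Posterior mean and covariance of the weights for fixed hyper-parameters.

    Likelihood y ~ N(phi w, noise_var * I) and prior w_i ~ N(0, 1 / alpha_i) give
      Sigma = (phi^T phi / noise_var + diag(alpha))^-1
      mu    = Sigma phi^T y / noise_var
    """
    precision = phi.T @ phi / noise_var + np.diag(alpha)
    mu = np.linalg.solve(precision, phi.T @ y / noise_var)
    sigma_post = np.linalg.inv(precision)
    return mu, sigma_post

# Toy 1-D regression problem with an RBF basis (kernel scale 1.0).
rng = np.random.default_rng(2)
x = np.linspace(-3.0, 3.0, 20)
y = np.sinc(x) + rng.normal(scale=0.05, size=x.size)
kernel = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * 1.0**2))
phi = np.hstack([np.ones((x.size, 1)), kernel])

alpha = np.ones(phi.shape[1])
mu_before, _ = rvm_posterior(phi, y, alpha, noise_var=0.01)

alpha[2] = 1e8  # mimic a hyper-parameter that has grown very large
mu_after, sigma_after = rvm_posterior(phi, y, alpha, noise_var=0.01)
print(abs(mu_before[2]), abs(mu_after[2]))
```

In the full algorithm the $\alpha_i$ and the noise variance are re-estimated repeatedly, and basis functions whose $\alpha_i$ exceeds a large threshold are dropped from $\Phi$; that loop is omitted here.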

Multivariate non-homogeneous hidden Markov models

Non-homogeneous Markov models relate broad-scaleatmospheric circulation patterns to local rainfall bypostulating weather states to act as a link between thetwo disparate scales (Hughes & Guttorp, 1999). Forinstance, let R ¼ {Rt

1, . . ., Rtn} be a multivariate

random vector giving precipitation amounts at a net-work of n sites. Let St be the weather state at time t andXt ¼ (X1, . . ., Xt) is a sequence of exogenous inputvectors (atmospheric measures) at time t such asGCMs and OLR, one for each data vector. The optimalnumber of states (M) is not known a priori. For theapplication of this study, we conclude that, given the

ow

αo α1

1w

11

1

),( 12 xxK

),( 1xxK n

),( 1 nxxK

),( 2 nxxK

),( nn xxK

w2

),( 21 xxK

),( 22 xxK

),( 2xxK n

),( 11 xxK

nw

ny2y1y

σ

α2 αn

Fig. 4 The RVM structure and the associated parametersand hyper-parameters. A high proportion of hyper-parameters are driven to large values in the posteriordistribution, and corresponding weights are driven to zero,giving a sparse model. Assuming that the precision a2 overthe w2 peaks to infinity, the corresponding parameter andvector that are encircled by the dotted line will be pruned.


Bayesian Information Criterion (BIC) (Robertson et al., 2003), it ranges from 4 to 5.
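As a hedged illustration of choosing M by the BIC, the sketch below scores a few candidate state counts and keeps the one with the smallest value. It assumes the common form BIC = −2 log L + k log n; the function names, the parameter counts and the log-likelihood values are hypothetical and are not results from this study.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion; smaller values are preferred."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

def select_n_states(fits, n_obs):
    """fits maps a candidate number of states M to (log-likelihood, free parameters)."""
    scores = {m: bic(ll, k, n_obs) for m, (ll, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Made-up fit summaries, for illustration only (not values from the paper).
example_fits = {3: (-5200.0, 40), 4: (-5100.0, 60), 5: (-5080.0, 84), 6: (-5075.0, 112)}
best_m, bic_scores = select_n_states(example_fits, n_obs=2000)
print(best_m, {m: round(v, 1) for m, v in bic_scores.items()})
```

The penalty term k log n is what stops the likelihood gain from additional states from always winning.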

Following Hughes & Guttorp (1999), the two main assumptions on the NHMM are:

(a) $P(R_t \mid S_1^T, R_1^{t-1}, X_1^T) = P(R_t \mid S_t)$, where $X_1^T$ indicates the sequence of atmospheric data from time 1 to T (i.e. the length of sequence) and similarly for $S_1^T$ and $R_1^{t-1}$; and

(b) $P(S_t \mid S_1^{t-1}, X_1^T) = P(S_t \mid S_{t-1}, X_t)$.

The inputs $X_t$ allow this Markov process to vary over time; hence the name non-homogeneous. The assumptions of conditional independence are easily visualized as edges in a directed graph of the NHMM, as shown in Fig. 5. These hidden state transitions are modelled by multinomial logistic regression depending on $X_t$:

$$P(S_t = i \mid S_{t-1} = j, X_t = x) = \frac{\exp(\sigma_{ji} + \rho_i^{t} x)}{\sum_{m=1}^{M} \exp(\sigma_{jm} + \rho_m^{t} x)} \qquad \text{(4a)}$$

and

$$P(S_1 = i \mid X_1 = x) = \frac{\exp(\sigma_i + \rho_i^{t} x)}{\sum_{m=1}^{M} \exp(\sigma_m + \rho_m^{t} x)} \qquad \text{(4b)}$$

where $\sigma_{jm}$ or $\sigma_m$ is a real-valued parameter vector for states j and m, and $\rho_i^{t}$ is a D-dimensional real-valued parameter vector for state i and time t. The log-likelihood of the data can be written as:

$$L(\theta) = \log P(R_1^T \mid X_1^T, \theta) = \log \sum_{S_1^T} \left[ P(S_1 \mid X_1) \prod_{t=2}^{T} P(S_t \mid S_{t-1}, X_t) \right] \prod_{t=1}^{T} P(R_t \mid S_t) \qquad \text{(5)}$$

The maximum likelihood estimate of the set of parameters θ for the NHMM-based application can be calculated utilizing the expectation maximization (EM) algorithm (Baum et al., 1970). Full details of the specific EM procedure used in this NHMM parameter estimation can be found in Robertson et al. (2003).
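The transition model of equations (4a) and (4b) and the log-likelihood of equation (5) can be sketched as follows. This is a minimal illustration under stated assumptions, not the estimation code used in the study: the emission probabilities P(R_t | S_t) are supplied as a precomputed table, the product of the ρ vector with x is treated as a dot product with one fixed vector per state, and the sum over all state sequences is evaluated with the standard scaled forward recursion. All names and numbers are hypothetical.

```python
import math

def softmax(logits):
    """Normalized exponentials, shifted by the maximum for numerical stability."""
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def transition_row(sigma_j, rho, x):
    """Eq. (4a) for a fixed previous state j: P(S_t = i | S_{t-1} = j, X_t = x) for every i.
    Passing the initial-state parameters as sigma_j gives Eq. (4b)."""
    return softmax([sigma_j[i] + dot(rho[i], x) for i in range(len(rho))])

def log_likelihood(sigma0, sigma, rho, X, emission):
    """Eq. (5) by the scaled forward recursion; emission[t][i] = P(R_t | S_t = i)."""
    n_states = len(rho)
    alpha = [p * e for p, e in zip(transition_row(sigma0, rho, X[0]), emission[0])]
    loglik = 0.0
    for t in range(len(X)):
        if t > 0:
            rows = [transition_row(sigma[j], rho, X[t]) for j in range(n_states)]
            alpha = [sum(alpha[j] * rows[j][i] for j in range(n_states)) * emission[t][i]
                     for i in range(n_states)]
        scale = sum(alpha)               # rescale to avoid underflow on long sequences
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik

# Toy inputs: two states, two predictors, three days (illustrative values only).
sigma0 = [0.2, -0.1]
sigma = [[1.0, -0.5], [0.0, 0.3]]
rho = [[0.5, -0.2], [-0.3, 0.4]]
X = [[1.0, 0.0], [0.5, 1.5], [-1.0, 0.2]]
emission = [[0.6, 0.1], [0.2, 0.7], [0.5, 0.5]]
print(log_likelihood(sigma0, sigma, rho, X, emission))
```

The forward recursion costs on the order of T·M² operations, whereas the explicit sum over state sequences in equation (5) grows as M^T.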

MODEL APPLICATION AND DISCUSSION OF RESULTS

This section explores the application of the proposed framework. The aim here is to devise a technique to provide realistic daily rainfall patterns over monthly sequences. Monthly precipitation forecasting seeks to derive regional-scale information at a sub-seasonal scale using large-scale atmospheric variables. The RVM is used to model the connection between these teleconnections and the monthly rainfall. Consequently, the NHMM is used to simulate daily precipitation based on monthly prediction from the RVM model, seasonal predictions from the GCMs, and preceding OLR observations as a prior season precursor.

Monthly prediction model

The coupling of RVM predictions can provide a basis for increasing simulation robustness, and the pooling effect of merging seasonal and monthly precursors will enable us to capture the sub-seasonal attributes of the rainfall. There are many physically-plausible variables available to a data-driven model. Specifically, in climate-based prediction models there is a high potential for building models that utilize a large number of inputs, only some of which are relevant. In such cases, poorly built regression models tend to assign non-zero coefficients to inputs that merely show some random correlation. Using too many predictors aggravates the susceptibility to overfitting, particularly in the case of high dimensions with few data (Carr, 1988; Neal, 1994; MacKay, 2003).

Fig. 5 The MVNHMM model structure (directed graph linking the inputs X_t, the hidden states S_t and the rainfall R_t for t = 1, 2, ..., T).


Prediction performance

Rainfall generating mechanisms vary across the season, thus separate RVM models have been developed for each month in the seasons of MJJ and ASO. In other words, we desire a regression model to be built for each individual month, where each is intended to condition the rainfall model on the atmospheric information relevant to that month. We used the same set of predictors to build separate models for the three months in MJJ. Similarly, the months of ASO have the same predictors. The predictors NAO and El Niño 3.4 index for the months of February-March-April (FMA) were used in predicting the months of the MJJ season, and the average over MJJ was used to predict the months of ASO. The first principal component was evaluated for the zone of influence and used in the prediction at one month lag. Both MJJ and ASO seasons have a record of 35 years of data. The first 17 years were selected for training the model and the rest of the data for testing.

We have too few data samples to build a robust regression model. However, the resulting RVM model is sparse in its parameters and only uses five relevant vectors. A radial basis function was used to introduce nonlinearity. The basis function parameter was selected by trial and error. The seasonal performance of the built model, for both training and testing phases, is shown in Fig. 6.
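To make the role of the radial basis function concrete, the sketch below builds the kernel design matrix on which an RVM places its weights: one bias column plus one column per training sample. The Gaussian form exp(−‖x − z‖²/r²) and the width value are assumptions for illustration; the text does not state the exact kernel form or the width found by trial and error.

```python
import math

def rbf_kernel(x, z, width):
    """Gaussian radial basis function between two predictor vectors (assumed form)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / width ** 2)

def design_matrix(samples, width):
    """Row n holds [1, K(x_n, x_1), ..., K(x_n, x_N)]; the leading 1 is the bias term."""
    return [[1.0] + [rbf_kernel(xn, xm, width) for xm in samples] for xn in samples]

# Three hypothetical predictor vectors (e.g. two climate indices each).
samples = [[0.1, -0.4], [0.9, 0.3], [-0.5, 0.2]]
phi = design_matrix(samples, width=1.5)
for row in phi:
    print([round(v, 3) for v in row])
```

Sparsity then means that most columns of this matrix receive a zero weight, and the samples attached to the remaining columns are the relevance vectors.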

Table 2 presents average statistics of efficiency for MJJ and ASO models for both training and testing. The resulting predictions were used as exogenous inputs in the NHMM.

SELECTION OF THE NHMM MODEL AND HIDDEN STATES

The number of hidden states in the NHMM has a considerable influence on the performance of the model. A typical approach to the identification of the appropriate number of states is to minimize the Bayesian Information Criterion (BIC). The BIC is used here to select both the number of hidden states and the associated climate predictors. Corresponding to the predictors identified earlier, five and four hidden states were selected for the MJJ and ASO seasons, respectively, in the NHMM. For the observed sequence of daily precipitation, we wish to find the most likely sequence and the underlying hidden states that might have generated it. We can find the most probable sequence of hidden states by using the Viterbi algorithm. This is a dynamic programming algorithm that provides a tractable way of analysing observations of NHMM to recapture the most likely

Fig. 6 RVM prediction results for the MJJ and ASO seasons: predicted versus observed precipitation (cm/month) for the training and testing periods.

Table 2 Prediction model performance for MJJ and ASO.

Measures                      MJJ                   ASO
                              Training   Testing    Training   Testing
Average AIC                   -1.54      -11.61     10.63      6.18
Average BIC                   4.74       -3.56      27.38      27.63
# Relevance vectors           5.0        5.00       8.00       8.00
Normalized bias (nBias)       0.00       -0.04      0.00       0.03
Normalized RMSE (nRMSE)       0.52       0.80       0.56       0.78
Index of agreement (IoA)      0.91       0.80       0.90       0.80
Coefficient of efficiency     0.72       0.36       0.70       0.39
Correlation coefficient (r)   0.85       0.65       0.84       0.65


underlying state sequence (Viterbi, 1967; Rabiner, 1989). The spatial, temporal and synoptic patterns associated with each state in both MJJ and ASO are presented in the following sections.
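The dynamic programming idea behind the Viterbi algorithm can be sketched as below for a non-homogeneous chain, where the transition matrix may differ from day to day. This is a generic textbook version working in log space, offered as an illustration and not as the implementation used in the study; the input layout and the toy probabilities are assumptions.

```python
import math

def viterbi(log_init, log_trans, log_emit):
    """Most probable state path and its joint log-probability.
    log_init[i]       : log P(S_1 = i | X_1)
    log_trans[t][j][i]: log P(S_t = i | S_{t-1} = j, X_t), used for t >= 1
    log_emit[t][i]    : log P(R_t | S_t = i)
    """
    n_states = len(log_init)
    delta = [log_init[i] + log_emit[0][i] for i in range(n_states)]
    backpointers = []
    for t in range(1, len(log_emit)):
        pointers, new_delta = [], []
        for i in range(n_states):
            best_j = max(range(n_states), key=lambda j: delta[j] + log_trans[t][j][i])
            pointers.append(best_j)
            new_delta.append(delta[best_j] + log_trans[t][best_j][i] + log_emit[t][i])
        backpointers.append(pointers)
        delta = new_delta
    state = max(range(n_states), key=lambda i: delta[i])
    path = [state]
    for pointers in reversed(backpointers):   # trace the best path backwards
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, max(delta)

def logs(matrix):
    return [[math.log(v) for v in row] for row in matrix]

# Toy example: two states, three days, day-specific transition matrices.
log_init = [math.log(0.6), math.log(0.4)]
log_trans = [None, logs([[0.7, 0.3], [0.4, 0.6]]), logs([[0.5, 0.5], [0.1, 0.9]])]
log_emit = logs([[0.9, 0.2], [0.1, 0.8], [0.3, 0.6]])
best_path, best_logp = viterbi(log_init, log_trans, log_emit)
print(best_path, best_logp)
```

Each day stores only the best predecessor of every state, so the search over all M^T sequences reduces to on the order of T·M² comparisons.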

Simulation performance

The NHMM–RVM model described earlier was applied to the daily data separately for MJJ and ASO periods. Here we present the results of 100 simulations of the NHMM for each season. In the EM phase, the algorithm was restarted 20 times from different random starting positions in the parameter space. After convergence of each restart, we selected the parameter set that produced the maximum likelihood over the 20 restarts, to safeguard EM from converging to poor local maxima.

The performance measures that were selected are as follows: (1) monthly amounts; (2) wet-to-wet probability; (3) dry-to-dry probability; (4) number of wet days; (5) number of dry days; (6) 7-day dry spell; and (7) maximum wet-day precipitation. These features of the rainfall process could be evaluated per month or per season. The NHMM challenge is to be able to capture the behaviour of these statistics across the continuum of the intra-seasonal scale (i.e. 15, 30, 45 and 90 days).
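The seven measures listed above can be computed from a single daily sequence roughly as follows. The sketch is illustrative only: the wet-day threshold (any amount above zero) and the reading of "7-day dry spell" as the number of dry runs lasting at least seven days are assumptions, since the text does not give those definitions, and the function name is hypothetical.

```python
def rainfall_statistics(daily, wet_threshold=0.0, spell_length=7):
    """Summarize one daily rainfall sequence (e.g. a 30-day window at one station)."""
    wet = [amount > wet_threshold for amount in daily]
    pairs = list(zip(wet, wet[1:]))                  # consecutive-day pairs
    wet_wet = sum(1 for a, b in pairs if a and b)
    dry_dry = sum(1 for a, b in pairs if not a and not b)
    from_wet = sum(1 for a, _ in pairs if a)
    from_dry = len(pairs) - from_wet
    long_dry_spells, run = 0, 0
    for is_wet in wet + [True]:                      # sentinel closes a trailing dry run
        if is_wet:
            if run >= spell_length:
                long_dry_spells += 1
            run = 0
        else:
            run += 1
    return {
        "amount": sum(daily),
        "p_wet_wet": wet_wet / from_wet if from_wet else float("nan"),
        "p_dry_dry": dry_dry / from_dry if from_dry else float("nan"),
        "wet_days": sum(wet),
        "dry_days": len(wet) - sum(wet),
        "dry_spells": long_dry_spells,
        "max_wet_day": max(daily),
    }

# Hypothetical 11-day record in mm/day.
sample = [0, 0, 0, 0, 0, 0, 0, 5.0, 2.0, 0, 3.0]
stats = rainfall_statistics(sample)
print(stats)
```

Applying the same function to observed and simulated sequences over 15-, 30-, 45- and 90-day windows gives directly comparable statistics.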

These statistics were evaluated for the 100 simulations at each station and then aggregated to reflect the overall behaviour of precipitation over the ENP. The features of the generated rainfall sequences were then compared with those of the historical record. Figures 7 and 8 show the NHMM performance for MJJ and ASO, as evaluated by the aforementioned features aggregated for all the stations over 30-day sequences. The range of variation from 100 simulations in each feature (i.e. rainfall statistic) is shown by the shaded area, which represents the inter-quartile range.

Model predictive ability was assessed quantitatively to judge the degree to which the model simulation matches the actual observations. Frequently, one could utilize different statistics of efficiency to measure the goodness of fit. Table 3 shows the correlation coefficient (r), index of agreement (IoA), the normalized root mean square error (nRMSE), and bias (nBias) for different selected statistics. Readers are referred to Legates & McCabe (1999) for further details on efficiency measures.
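For readers who want to reproduce such statistics of efficiency, a minimal sketch follows. The correlation coefficient, the index of agreement and the coefficient of efficiency use their usual textbook definitions. The normalizations are assumptions, because the text does not state them: nRMSE is taken here as RMSE divided by the standard deviation of the observations, and nBias as the mean error divided by the observed mean.

```python
import math

def efficiency_measures(obs, pred):
    """Goodness-of-fit statistics for paired observed and predicted values."""
    n = len(obs)
    mean_obs, mean_pred = sum(obs) / n, sum(pred) / n
    sq_err = sum((p - o) ** 2 for o, p in zip(obs, pred))
    ss_obs = sum((o - mean_obs) ** 2 for o in obs)
    ss_pred = sum((p - mean_pred) ** 2 for p in pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    potential = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in zip(obs, pred))
    return {
        "r": cov / math.sqrt(ss_obs * ss_pred),
        "IoA": 1.0 - sq_err / potential,                     # index of agreement
        "CoE": 1.0 - sq_err / ss_obs,                        # coefficient of efficiency
        "nRMSE": math.sqrt(sq_err / n) / math.sqrt(ss_obs / n),  # assumed normalization
        "nBias": (mean_pred - mean_obs) / mean_obs,          # assumed normalization
    }

# Hypothetical values: predictions that track the observations with a constant offset.
measures = efficiency_measures([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
print({name: round(value, 3) for name, value in measures.items()})
```

In this toy case the correlation is perfect while the other measures are not, which illustrates why r alone can overstate the quality of a fit.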

In the case of MJJ simulation, even though we reproduced the amounts fairly well, the other statistics show decreased performance. The statistics of wet days, wet-to-wet probability, and dry-to-dry probability have correlations of about 0.5 with the observed monthly sequences. Statistics of dry spell length of 7 days are reproduced with decreased variance for both MJJ and ASO for monthly and seasonal sequences, as judged from the plots and nRMSE. Consequently, maximum wet-day precipitation is not reproduced very well.

The NHMM is shown to be capable of reproducing the station-averaged observed rainfall amounts. The performance of reproducing the amounts as 30-day sequences for MJJ and ASO is plotted for each station individually in Figs 9 and 10. In the case of ASO, Miami FS, PRL and IFS show correlation with the observed data that equals or exceeds 0.6, while G54 and S13 show correlation close to 0.2. In the case of MJJ, most of the stations show correlation with the observed values in the range 0.4–0.5.

Viterbi analysis

This section deals with the determination of the most probable sequence of underlying hidden weather states that has generated the observed sequence of daily precipitation. Linking these states to atmospheric patterns provides more intuitive reasoning as to what they signify. The most likely weather-state sequences for MJJ and ASO, determined using the Viterbi algorithm, are shown in Fig. 11. For a certain day of the month, one can see the inter-annual sequence of states (horizontally) that has most probably generated the observed precipitation. In general, these graphs suggest that the rainfall sequences exhibit considerable variability on intra-seasonal, as well as inter-annual time scales.

According to the BIC test, the optimal number of states for MJJ precipitation is five. Figure 12 presents the amount of precipitation associated with each state and the frequency at which it has occurred; precipitation amount is expressed as a percentage of the total precipitation occurring during the time of analysis. Thus, State 1 corresponds to a very high probability of precipitation at all the stations; State 2, in the case of MJJ, seems to be connected to local convection: it is responsible for almost 25% of the precipitation amount, even though it occurs less than 5% of the time; and State 5 corresponds to a very low probability of rain at all stations, occurring 35% of the time. States 3 and 5 together constitute the most common MJJ pattern, occurring 65% of the time.
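The two quantities summarized per state, frequency of occurrence and share of total precipitation, can be obtained from a decoded state sequence as in the sketch below. The function name and the toy data are hypothetical; the daily totals stand for station-aggregated rainfall on each day.

```python
def state_summary(states, daily_totals):
    """Frequency of each state and its share of the total precipitation."""
    total_precip = sum(daily_totals)
    n_days = len(states)
    summary = {}
    for state in sorted(set(states)):
        amounts = [r for s, r in zip(states, daily_totals) if s == state]
        summary[state] = {
            "frequency": len(amounts) / n_days,
            "precip_share": sum(amounts) / total_precip,
        }
    return summary

# Hypothetical decoded states and daily rainfall totals (mm) for eight days.
decoded_states = [5, 5, 3, 2, 3, 5, 1, 3]
daily_totals = [0.0, 1.0, 4.0, 30.0, 6.0, 0.0, 15.0, 4.0]
summary = state_summary(decoded_states, daily_totals)
print(summary)
```

A state with a small frequency but a large precipitation share, like State 2 in the toy data, is the signature the text attributes to local convection.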

Four states characterize the rainfall mechanisms in ASO, suggesting simpler weather dynamics


Fig. 7 Monthly statistics aggregated for all stations in the ENP for the MJJ season. Panels: wet-to-wet probability, dry-to-dry probability, wet days, 7-day dry spell, maximum spell and precipitation amount (mm/month), each plotted against time (1980–2000) as observed and predicted values with 50% and 95% uncertainty bands.

Fig. 8 Monthly statistics aggregated for all stations in the ENP for the ASO season. Panels: wet-to-wet probability, wet days, dry days, 7-day dry spell, precipitation amount (mm/month) and maximum spell, each plotted against time (1980–2000) as observed and predicted values with 50% and 95% uncertainty bands.


compared to MJJ. State 1 seems to account for almost 50% of the rainfall occurrence, while it is only witnessed 15% of the time.

There are different spatial precipitation patterns associated with each state. Figure 13 depicts the percentage of precipitation amount corresponding to each station. There is a different local excitation per state, except in the case of MJJ State 2, where it seems that the level of excitation triggered by local convection is uniformly distributed.

In both MJJ and ASO, states 1 and 5 seem to have the same spatial distribution. Whenever there is low pressure at stations IFS and MIAMFS (Fig. 1), the probability of precipitation is high at these stations, as well as all over the ENP, and vice versa. These plots suggest that the precipitation spatial structure is highly dependent on the prevalent state. The weather-state classification scheme produced by the NHMM and the Viterbi algorithm is successful in identifying the linkages between large-scale variables and precipitation distribution patterns on the ground.

The mean seasonality of weather-state occurrence is shown in Fig. 14. In MJJ, State 5, which represents the dry state, is witnessed with increased frequency at the beginning of the season. The associated OLR pattern indicates the absence of cloudiness over the Florida-Caribbean region, and an associated wind anomaly pattern that suggests atmospheric transport out of the region, consistent with the lack of rainfall. This state is most prominent in early May, with incidence decreasing in June and July as the rainy season picks up. The spatial pattern of this dry state occurrence is regional.

State 2 is attributed to local convection that peaks around the end of May and the beginning of June. For ASO, the temporal variation of the states' frequency is not that distinctive, except that by the end of the season the dry state (i.e. S4) is prevalent.

Atmospheric patterns associated with weather states

Our ability to predict regional-scale seasonal climate anomalies depends strongly on our understanding of the dynamics of large-scale circulation anomalies. The NHMM is characterized by its ability to explain and demonstrate the linkage between the major circulation modes and the precipitation patterns. The state variables of the NHMM define the weather on each particular day. Each day can be classified into one of the weather states. Also, an average of atmospheric circulation variables over common-state days provides a means of assessing the physical linkage of the weather state with atmospheric forcing factors. The predominant pattern associated with each state is evaluated by averaging the wind vectors and OLR fields over all the days classified into a particular state. The OLR and wind fields are extracted at the geo-potential height of 850 hPa. Principal moisture paths can be drawn roughly from the atmospheric pattern corresponding to each state for both MJJ and ASO. The principal moisture path was found to be largely consistent with the observed precipitation pattern.
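The compositing step described above, averaging a field over all days assigned to the same state, can be sketched as follows. Each daily field is represented as a flat list of grid-point values (for example OLR, or one wind component); the function name and the toy numbers are hypothetical.

```python
def composite_by_state(states, fields):
    """Average the daily fields over all days classified into each weather state."""
    sums, counts = {}, {}
    for state, field in zip(states, fields):
        if state not in sums:
            sums[state] = [0.0] * len(field)
            counts[state] = 0
        sums[state] = [a + b for a, b in zip(sums[state], field)]
        counts[state] += 1
    return {state: [v / counts[state] for v in sums[state]] for state in sums}

# Three hypothetical days on a two-point grid; days 1 and 3 share state 1.
day_states = [1, 2, 1]
day_fields = [[1.0, 2.0], [10.0, 20.0], [3.0, 4.0]]
composites = composite_by_state(day_states, day_fields)
print(composites)
```

Wind vectors can be composited the same way by averaging the zonal and meridional components separately.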

Figure 15 presents the contour plot of the OLR and the spatial variability of the wind components associated with each weather state. There are unique baroclinic instability patterns associated with each state, and this could be further investigated to derive insights into moisture transfer mechanisms affecting

Table 3 Simulation model performance for both MJJ and ASO.

Type              Statistic        MJJ                   ASO
                  of efficiency    Seasonal   Monthly    Seasonal   Monthly
Amounts           r                0.77       0.72       0.76       0.71
                  IoA              0.74       0.79       0.82       0.77
                  nRMSE            1.5        1.19       0.98       1.26
                  nBias            0.03       0.03       -0.01      -0.01
Wet–wet P         r                0.69       0.55       0.45       0.55
                  IoA              0.66       0.64       0.55       0.64
                  nRMSE            1.53       1.66       1.44       1.48
                  nBias            -0.10      -0.09      -0.14      -0.14
Dry–dry P         r                0.55       0.54       0.47       0.63
                  IoA              0.69       0.60       0.60       0.69
                  nRMSE            1.35       1.88       1.59       1.51
                  nBias            -0.05      -0.07      -0.06      -0.05
Wet days          r                0.61       0.55       0.52       0.60
                  IoA              0.70       0.65       0.71       0.70
                  nRMSE            1.44       1.69       1.21       1.38
                  nBias            0.01       0.01       -0.01      -0.01
7-day dry spell   r                0.43       0.51       0.65       0.60
                  IoA              0.42       0.51       0.61       0.54
                  nRMSE            2.58       2.52       1.84       2.39
                  nBias            -0.21      -0.24      -0.25      -0.29
Max. spell        r                0.54       -0.05      0.65       0.62
                  IoA              0.13       0.09       0.02       0.48
                  nRMSE            12.85      11.1       6.64       2.80
                  nBias            0.05       -0.03      -0.44      -0.17


Fig. 10 Aggregated precipitation amount per station in the ENP for the ASO season. Panels show precipitation amount (mm/month) against time (1980–2000) at stations IFS, PRL, FLA, G54, M.IFS, S13, EVC and FMB, as observed and predicted values with 50% and 95% uncertainty bands.

Fig. 9 Aggregated precipitation amount per station in the ENP for the MJJ season. Panels show precipitation amount (mm/month) against time (1980–2000) at stations EVC, IFS, PRL, FMB, FLA, G54, M.IFS and S13, as observed and predicted values with 50% and 95% uncertainty bands.


the ENP. The second state in MJJ seems to be associated with local convection that delivers large bursts of precipitation over the ENP, but with less frequency and low coherence (see Fig. 12 for the precipitation amount associated with State 2 and the number of days that experience this state).

Many interesting aspects of the mechanisms that drive rainfall could be deduced from these composite

Fig. 12 Weather states for MJJ (top) and ASO (bottom): precipitation amount (%) and frequency of occurrence (%) for each state (S1–S5 for MJJ, S1–S4 for ASO).

Fig. 11 Viterbi paths of the most probable sequence of weather states for MJJ and ASO (day of the month against time, 1980–2000; states range from dry to wet).


plots. Also, the weather states derived from the NHMM reveal the distribution of intrinsic local convections versus the large-scale stimulating processes that affect precipitation over the ENP.

SUMMARY AND CONCLUSIONS

The objectives of this study were to develop and apply a multivariate rainfall simulation and forecasting tool that can generate daily rainfall sequences for each

Fig. 14 Mean seasonal variation of daily weather-state occurrence for MJJ (states S1–S5) and ASO (states S1–S4), smoothed with a 10-day running mean (state frequency against day of the season).

Fig. 13 Spatial distribution of water amounts at each station, per state. The radius of each circle represents the percentage of water attributed to each station. The weather-state number is shown on the upper-left corner of each map and the season identifier on the lower-right corner.


season, conditional on low-frequency climate forecasting and information for the Everglades National Park restoration, for the two major rainy seasons: May–July (MJJ) and August–October (ASO). This simulation is intended to capture sub-seasonal variability. Therefore, we used the monthly regional rainfall as predicted by RVM in the simulation. The non-homogeneous hidden Markov model (NHMM) was used to connect seasonal and monthly forecasts from GCMs and observed prior-season climate conditions to daily rainfall at several sites.

The NHMM, coupled with prediction models, shows promise as a technique for generating downscaled daily rainfall-sequence scenarios for input into hydrological operation models that require such inputs, and may be competitive with sophisticated weather-generator models designed for this purpose (Robertson et al., 2004).

We used the NHMMs to analyse precipitation amounts in the Everglades National Park (ENP), south Florida, at eight gauging stations during the two major rainy seasons of MJJ and ASO for 1979–2000. A five-state model for MJJ and a four-state model for ASO were chosen from the inspection of the Bayesian information criterion (BIC) and the log-likelihood (LL) of the rainfall data given the model. The Viterbi algorithm was used to select the hidden state sequence underlying the observed data.

Fig. 15 (a) Composite plots of OLR (background) and wind vectors at 850 hPa for the four wettest MJJ weather states (1–4, from top to bottom). (b) Composite plots of OLR and wind vectors at 850 hPa for the ASO weather states (1–4, from top to bottom).


The NHMMs were utilized successfully to investigate the ability to downscale and to simulate the average seasonality based on monthly and seasonally varying predictors. There is growing evidence that the seasonality of precipitation advances and retreats in response to low-frequency quasi-oscillatory phenomena (Rajagopalan & Lall, 1995). These variations reflect the wet and dry distribution. The NHMM approach is better suited to capture such variations, owing to the fact that we condition precipitation on atmospheric information. To some extent, we were able to capture the regional-scale spatial variability of precipitation.

In conclusion, a significant source of uncertainty in modelling precipitation patterns under climate-change conditions is the uncertainty in future climate predictions. Currently, with future projections provided by GCMs, the stochastic climate models can be adopted to generate reliable monthly to seasonal sequences of precipitation for the new conditions.

The main drawback with NHMMs is the large number of parameters, in addition to the consistent underestimation of variances of the simulated monthly and seasonal totals. Thus, the results presented here speak more to the potential of a NHMM–climate predictor based forecast than to the actual forecast skill that could be achieved. We feel that the main contribution is the insight provided into the wet season rainfall process and its potential predictability, paving the way for more exhaustive applications. We expect that in future work we will address some of the shortcomings with respect to seasonality, predictor selection, and the lack of capturing spatio-temporal variability, and we will explore cross-validation in performance with a different set of predictors over a longer period of record.

REFERENCES

Bates, B. C., Charles, S. P. & Hughes, J. P. (1998) Stochastic down-scaling of numerical climate model simulations. Environ.Model. Software 13, 325–331.

Baum, L. E., Petrie, T., Soules, G. &Weiss, N. (1970) A maximizationtechnique occurring in the statistical analysis of probabilisticfunctions of Markov chains. Ann. Materials Statist. 41(1),164–171.

Bellone, E. J., Hughes, P. & Guttorp, P. (2000) Hidden Markov modelfor downscaling synoptic atmospheric patterns to precipitationamounts. Climate Res. 15, 1–12.

Carr, M. B. (1988) Determining the optimum number of predictors fora linear prediction equation. Monthly Weather Rev. 116(8),1623–1640.

Charles, S. P., Hughes, J. P., Bates, B. C. & Lyons, T. J. (1996)Assessing downscaling models for atmospheric circulation –local precipitation linkage. In: Proc. Int. Conf. on WaterResources and Environmental Research: Towards the 21stCentury, 269–276. (Water Resources Research Center, KyotoUniversity, Kyoto, Japan).

Hughes, J. P. & Guttorp, P. (1994a) A class of stochastic models forrelating synoptic atmospheric patterns to regional hydrologicphenomena. Water Resour. Res. 30(5), 1535–1546.

Hughes, J. P. & Guttorp, P. (1994b) Incorporating spatial dependenceand atmospheric data in a model of precipitation. J. Appl. Met.33(12), 1503–1515.

Hughes, J. P. & Guttorp, P. (1999) A non-homogeneous hiddenMarkov model for precipitation occurrence. J. Appl. Statistics48(1), 15–30.

Kaplan, A., Cane, M., Kushnir, Y., Clement, A., Blumenthal, M. &Rajagopalan, B. (1998) Analyses of global sea surface tempera-ture 1856–1991. J. Geophys. Res. 103, 18 567–18 589.

Khalil, A. F., Almasri, M. N., McKee, M. & Kaluarachchi, J. J. (2005a) Applicability of statistical learning algorithms in ground water quality modeling. Water Resour. Res. 41(5), W05010, doi:10.1029/2004WR003608.

Khalil, A. F., McKee, M., Kemblowski, M. & Tirusew, A. (2005b) Sparse Bayesian learning machine for real-time management of reservoir releases. Water Resour. Res. 41(11), W11401, doi:10.1029/2004WR003891.

Khalil, A. F., McKee, M., Kemblowski, M., Asefa, T. & Bastidas, L. (2006) Multiobjective analysis of chaotic systems using sparse learning machines. Adv. Water Resour. 29(1), 72–88.

Kidson, J. W. & Watterson, I. G. (1995) A synoptic climatological evaluation of the changes in the CSIRO nine-level model with doubled CO2 in the New Zealand region. Int. J. Climatol. 15, 1179–1194.

Kwon, H.-H., Lall, U., Moon, Y.-I., Khalil, A. F. & Ahn, H. (2006) Episodic interannual climate oscillations and their influence on seasonal rainfall in the Everglades National Park. Water Resour. Res. 42, W11404, doi:10.1029/2006WR005017.

Legates, D. R. & McCabe, G. J. (1999) Evaluating the use of goodness-of-fit measures in hydrologic and hydroclimatic model validation. Water Resour. Res. 35(1), 233–241.

MacKay, D. J. (2003) Information Theory, Inference, and Learning Algorithms. Cambridge University Press, Cambridge, UK.

Mehrotra, R. & Sharma, A. (2005) A nonparametric nonhomogeneous hidden Markov model for downscaling of multisite daily rainfall occurrences. J. Geophys. Res. 110, D16108, doi:10.1029/2004JD005677.

Mehrotra, R., Sharma, A. & Cordery, I. (2004) Comparison of two approaches for downscaling synoptic atmospheric patterns to multisite precipitation occurrence. J. Geophys. Res. 109, D14107, doi:10.1029/2004JD004823.

Mendes, J. M., Turkman, K. F. & Corte-Real, J. (2006) A Bayesian hierarchical model for local precipitation by downscaling large-scale atmospheric circulation patterns. Environmetrics 7(17), doi:10.1002/env.790.

Neal, R. (1994) Bayesian learning for neural networks. PhD Thesis, University of Toronto, Canada.

Palmer, T. N., Alessandri, A., Andersen, U., Cantelaube, P., Davey, M., Délécluse, P., Déqué, M., Díez, E., Doblas-Reyes, F. J., Feddersen, H., Graham, R., Gualdi, S., Guérémy, J.-F., Hagedorn, R., Hoshen, M., Keenlyside, N., Latif, M., Lazar, A., Maisonnave, E., Marletto, V., Morse, A. P., Orfila, B., Rogel, P., Terres, J.-M. & Thomson, M. C. (2004) Development of a European Multi-Model Ensemble System for Seasonal to Inter-Annual Prediction (DEMETER). Bull. Am. Met. Soc. 85(6), 853–872.

Rabiner, L. (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77(2), 257–285.

Rajagopalan, B. & Lall, U. (1995) Seasonality of precipitation along a meridian in the western US. Geophys. Res. Lett. 22(9), 1081–1084.

Robertson, A. W., Kirshner, S. & Smyth, P. J. (2003) Hidden Markov models for modeling daily rainfall occurrence over Brazil. Technical Report ICS-TR 03–27, Information and Computer Science, University of California, Irvine, California, USA.

Robertson, A. W., Kirshner, S. & Smyth, P. (2004) Downscaling of daily rainfall occurrence over northeast Brazil using a hidden Markov model. J. Climate 17(22), 4407–4424.

Robock, A., Turco, R. P., Harwell, M. A., Ackerman, T. P., Andressen, R., Chang, H.-S. & Sivakumar, M. V. K. (1993) Use of general circulation model output in the creation of climate change scenarios for impact analysis. Climatic Change 23, 293–335.

Roeckner, E., Arpe, K., Bengtsson, L., Christoph, M., Claussen, M., Dümenil, L., Esch, M., Giorgetta, M., Schlese, U. & Schulzweida, U. (1996) The atmospheric general circulation model ECHAM4: model description and simulation of present-day climate. Report 23, Max-Planck-Institut für Meteorologie, Hamburg, Germany.

Stamus, P. A., Carr, F. H. & Baumhefner, D. P. (1992) Application of a scale-separation verification technique to regional forecast models. Monthly Weather Rev. 120, 149–163.

Tipping, M. E. (2000) The relevance vector machine. In: Advances in Neural Information Processing Systems (ed. by S. Solla, T. Leen & K. R. Muller), vol. 12, 652–658. MIT Press, Cambridge, Massachusetts, USA.

Tipping, M. E. (2001) Sparse Bayesian learning and the relevance vector machine. J. Machine Learning Res. 1, 211–244.

Tripathi, S., Srinivas, V. V. & Nanjundiah, R. S. (2006) Downscaling of precipitation for climate change scenarios: a support vector machine approach. J. Hydrol. 330(3–4), 621–640.

Viterbi, A. J. (1967) Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Information Theory 13(2), 260–267.

Wilks, D. S. (1998) Multisite generalization of a daily stochastic precipitation generation model. J. Hydrol. 210, 178–191.

Wilks, D. S. (1989) Conditioning stochastic daily precipitation models on total monthly precipitation. Water Resour. Res. 23, 1429–1439.
