This article was downloaded by: [Northeastern University] On: 24 November 2014, At: 06:42 Publisher: Routledge Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK Water International Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/rwin20 Framework for Assessment of Relative Pollutant Loads in Streams with Limited Data Amin Elshorbagy a , Ramesh S.V. Teegavarapu b & Lindell Ormsbee b a IWRA University of Saskatchewan , Saskatoon, Canada b University of Kentucky , Lexington, Kentucky, USA Published online: 22 Jan 2009. To cite this article: Amin Elshorbagy , Ramesh S.V. Teegavarapu & Lindell Ormsbee (2005) Framework for Assessment of Relative Pollutant Loads in Streams with Limited Data, Water International, 30:4, 477-486, DOI: 10.1080/02508060508691892 To link to this article: http://dx.doi.org/10.1080/02508060508691892 PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. 
Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http:// www.tandfonline.com/page/terms-and-conditions


International Water Resources Association
Water International, Volume 30, Number 4, Pages 477–486, December 2005

© 2005 International Water Resources Association


Framework for Assessment of Relative Pollutant Loads in Streams with Limited Data

Amin Elshorbagy, Member IWRA, University of Saskatchewan, Saskatoon, Canada; Ramesh S. V. Teegavarapu and Lindell Ormsbee, University of Kentucky, Lexington, Kentucky, USA

Abstract: A framework that integrates two data-driven techniques is proposed and developed to assess fecal coliform loadings in natural streams. A relationship between the transport medium (streamflow) and the load of a non-conservative pollutant (fecal coliform) is first developed using conventional regression. The spatial distribution of the fecal load over the watersheds is then captured using artificial neural networks through a disaggregation scheme in which streamflow serves as a surrogate for the non-conservative fecal load. The framework is applied to an area that encompasses four USGS 8-digit Hydrologic Unit Code (HUC) watersheds in southeastern Kentucky, USA. The study addresses two major issues: (i) assessment of relative pollutant loads from watersheds and (ii) evaluation of a possible reduction in the number of monitoring stations to meet budgetary constraints. Preliminary results indicate the potential of this approach for assessing the relative fecal loading contributions of different watersheds with the help of conservative hydrological parameters, especially in data-poor conditions.

Keywords: fecal coliform bacteria, spatial disaggregation, artificial neural networks (ANN), regression analysis, Kentucky

Introduction

The pollution of surface water and groundwater by a variety of pollutants, including pathogens, continues to be a major concern for state and federal agencies in the United States and around the world. Section 303(d) of the U.S. federal Clean Water Act requires states to periodically prepare a list of all surface waters whose beneficial uses (such as drinking, recreation, aquatic habitat, and industrial use) are impaired by pollutants. The 303(d) list identifies water bodies that do not meet specific designated uses or the ambient water quality standards set forth in the Clean Water Act of 1972 and its amendments. According to a recent report by the National Research Council (NRC, 2001), states have identified approximately 21,000 river segments, lakes, and estuaries as violating one or more water quality standards. A large number of streams and stream segments throughout the United States have been placed on the Section 303(d) list because of elevated concentrations of fecal coliform bacteria.

Protection of natural streams and water bodies from pathogen contamination is essential when the designated water uses include recreation (primary and secondary contact) and water supply. The presence of fecal coliform bacteria in streams indicates the possible existence of pathogens that are harmful to humans, so the suitability of contaminated streams for purposes such as fishing and swimming is questionable. Detection of fecal coliform bacteria, together with assessment of their fate and transport in natural streams and groundwater, has received increased attention in the recent past (Entry and Farmer, 2001). Fecal coliform or total coliform bacteria have been used in many studies (e.g., Greenberg et al., 1992) as indicators of pathogen contamination in water bodies.

Fecal contamination of streams and water bodies is often caused by point and non-point sources that include animal waste, failing septic systems, and sanitary sewer overflows. Identifying the sources of contamination and tracking the movement of fecal bacteria in streams are among the most difficult tasks in water quality management. Many state and federal agencies focus their efforts on detecting and assessing the sources of fecal contamination (Young and Thackston, 1999) and on developing TMDLs (Total Maximum Daily Loads) for water bodies as required by Section 303(d) of the Clean Water Act. The creation of a standard protocol for pathogen TMDL development (USEPA, 2001) can be seen as a current effort by the United States Environmental Protection Agency (USEPA) in this direction. To develop a TMDL for any impaired stream, the pollutant loads from the sources must be assessed. Therefore, assessing the occurrence, sources, and fate and transport of fecal coliform in streams is important from a TMDL or any other water quality management perspective.

Difficulties associated with modeling the occurrence of fecal coliform bacteria include: (i) identification of the sources of contamination; (ii) quantification of the loads from each source; (iii) assessment of the fate and transport of the pathogens in natural streams or water bodies; and (iv) estimation of the hydrologic and watershed parameters that influence fecal loading in natural streams or water bodies. Simple inductive models can provide valuable insights into these processes without being highly parameterized (Hodges, 1987; Craig et al., 2000). Recently, researchers have advocated the development and use of simple models (e.g., Bowie et al., 1985; Levin, 1985; Jian and Yu, 1998).

The aim of this paper is to develop a framework for assessing the relative fecal coliform bacteria load contributions of different watersheds for management purposes in data-poor conditions. The methodology relies mainly on data-driven techniques that use available hydrologic parameters to generate a record of fecal loading in the streams. Besides addressing relative load contributions, the proposed framework can help with some of the design and management aspects of water quality monitoring networks. Although the framework should not be regarded as a predictive modeling approach, it can be useful in watershed and regional water quality management because of its ability to assess the relative fecal coliform bacteria load contributions of different watersheds.

Modeling for Management

Inductive and deductive approaches have both been used in water quality modeling studies (Chapra, 1994). In many situations where water quality data are limited, an inductive (data-driven or empirical) approach becomes essential for characterizing pollutant loadings, and many examples of such applications are available in the literature (e.g., Reckhow and Chapra, 1983; Jian and Yu, 1998). Process-based models (e.g., QUAL2E, HSPF), on the other hand, are data intensive and require additional effort from the modeler for calibration and validation. Comprehensive modeling software programs that integrate a variety of process-based models, such as USEPA's BASINS (USEPA, 2001), are now available. Although these modeling environments attempt to overcome the limitations of other modeling approaches, their data requirements can be overwhelming. Such modeling often requires values for a number of physical and process-based parameters, and the reliability of the results is questionable in the absence of large data sets. Chapra (2003) recommends an adaptive approach: start with simpler models in the initial phases and progress to more complex frameworks as additional data are collected.

Research on modeling fecal coliform bacteria concentrations in water bodies is limited, primarily because spatial and temporal data limitations hinder the adoption of process-based models. The need for a modeling framework that can represent the spatial and temporal variability of pathogens in water bodies using limited data cannot be overemphasized. The importance of such a framework is further highlighted by one of the recommendations of the NRC (2001), which emphasizes the need for models with data-filling abilities. The phrase “data-driven models” in this paper refers to conventional empirical models that rely completely on, and are inferred from, raw data; it also implies that no functional relationships or rules are assumed before these models are developed.

Fecal Coliform Loading Assessment Framework

A framework is proposed in this paper to assess, using data-driven techniques, the relative contributions of different watersheds to the load of fecal coliform bacteria, a non-conservative pollutant (i.e., one that decays over time). A schematic diagram of the framework is shown in Figure 1. Historical observations of streamflow (a measurable hydrologic parameter) and fecal coliform concentration measurements are analyzed by regression to develop functional relationships between them. These relationships need to be adaptively modified as new data become available to address the temporal variability of pollutant loadings. Once a relationship is identified, a number of realizations of the hydrologic parameter (i.e., streamflows) can be generated or synthesized using any data generation technique. In this paper, this component of the framework is illustrated by spatial disaggregation (Kumar et al., 2000) of the streamflows at a downstream gauging station into multiple upstream flow series. The disaggregated (upstream) flow series are then used, with the regression-based relationships developed earlier, to produce continuous values of pollutant loads. Streamflow disaggregation techniques can be extremely simple, such as the area-based method used in this study as a baseline. However, artificial neural networks (ANNs), known for their function approximation capabilities, are also used for disaggregating the streamflows (Burian et al., 2001) and are compared with the area-based method.

The idea of spatial disaggregation, as a part of the proposed framework, is also used to address the utility of different sampling stations. Successfully disaggregating flow measurements into individual upstream locations, and hence obtaining pollutant loadings at these locations, indicates that measurements at the upstream locations could be eliminated or at least minimized. The data-driven approach to fecal coliform load assessment proposed in this paper does not rely on parameters such as decay coefficients, sunlight, and temperature that might be deemed necessary for more sophisticated process-based models. Although these parameters are important, linking fecal loading to streamflow is considered sufficient for the purpose of this paper, which is to assess the relative load contributions of different upstream watersheds to a downstream point of concern in data-poor conditions.
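Once loads have been generated for each upstream sub-basin, the relative contribution the framework targets is simply each sub-basin's share of the summed load. The following is a minimal sketch, assuming daily disaggregated flows are already available and that each sub-basin has a power-law flow-load relationship, load = a·Q^b. The function name `relative_contributions` and all example numbers are hypothetical illustrations, not values from the study.

```python
from typing import Dict, List, Tuple


def relative_contributions(
    flows: Dict[str, List[float]],
    coeffs: Dict[str, Tuple[float, float]],
) -> Dict[str, float]:
    """Return each sub-basin's share (0 to 1) of the summed fecal load.

    flows: daily streamflow series (ft^3/s) for each sub-basin.
    coeffs: hypothetical power-law coefficients (a, b) for each sub-basin,
            so that daily load (count/day) = a * Q**b.
    """
    totals: Dict[str, float] = {}
    for name, series in flows.items():
        a, b = coeffs[name]
        # Convert every daily flow to a daily load and accumulate the total.
        totals[name] = sum(a * q ** b for q in series)
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("total load must be positive")
    return {name: load / grand_total for name, load in totals.items()}


# Hypothetical example: identical coefficients, so shares follow flow volume.
shares = relative_contributions(
    {"north": [1.0, 1.0], "middle": [2.0, 2.0], "south": [4.0, 4.0]},
    {"north": (1.0, 1.0), "middle": (1.0, 1.0), "south": (1.0, 1.0)},
)
```

Because only ratios are reported, a bias that multiplies every sub-basin's coefficient `a` by the same factor cancels out, which is one reason relative contributions may be more robust than absolute loads when data are scarce.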

The underlying concept of disaggregation is to break down an integral quantity into its individual components. The disaggregated quantity needs to be conservative so that the additivity criterion holds and can be verified mathematically. Because fecal load is non-conservative, streamflow is taken as its surrogate: the streamflows at a downstream station are spatially disaggregated into individual flows at upstream stations, and the fecal loadings of the individual contributing streams at those locations are then estimated using the pre-determined inductive (regression) models. The proposed framework is applied to a study area covering four watersheds in southeastern Kentucky.
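The area-based baseline and the additivity criterion can be illustrated together. The sketch below assumes that the gauged sub-basins drain only part of the key watershed: each sub-basin receives the key flow times its area fraction, and the flow from the ungauged remainder is booked as a residual series, a role comparable to the sink node used in the ANN. The function name, the `"residual"` key, and the areas are hypothetical.

```python
from typing import Dict, List


def area_disaggregate(
    key_flows: List[float],
    sub_areas: Dict[str, float],
    total_area: float,
) -> Dict[str, List[float]]:
    """Split a downstream (key) flow series in proportion to drainage area.

    Flow from the part of the watershed not covered by the listed sub-basins
    is returned under the hypothetical key "residual", so on every day the
    sub-series plus the residual add up to the key flow (additivity), up to
    floating-point rounding.
    """
    gauged = sum(sub_areas.values())
    if gauged > total_area:
        raise ValueError("sub-basin areas exceed the total drainage area")
    parts = {name: [q * area / total_area for q in key_flows]
             for name, area in sub_areas.items()}
    parts["residual"] = [q * (total_area - gauged) / total_area for q in key_flows]
    return parts


# Hypothetical areas (not the study's values): two gauged sub-basins
# draining 1,000 and 600 units of a 2,600-unit total area.
key = [2600.0, 1300.0]
split = area_disaggregate(key, {"a": 1000.0, "b": 600.0}, 2600.0)
```

Since every sub-series is the key series times a constant, this baseline reproduces the key series' day-to-day pattern exactly, which is why it may fail to preserve the statistics of a sub-basin that responds differently from the watershed as a whole.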

Study Area: Southeastern Kentucky

The Eastern Kentucky PRIDE (Personal Responsibility in a Desirable Environment) project is the first comprehensive, region-wide, local/state/federal cooperative effort designed to address the challenge of cleaning up the region’s rivers and streams. The project focuses on 40 counties in the southeastern part of the Commonwealth of Kentucky that form the headwaters of the Big Sandy, Licking, Kentucky, Green, and Cumberland River basins. A significant number of streams in the PRIDE region do not meet their designated uses because of impairment caused by pathogens, nutrients, and pH. Impairment due to fecal coliform contamination is most likely caused by ineffective wastewater systems, such as bypasses from wastewater treatment plants, improperly operated privately owned package plants, straight pipes (point discharges of raw sewage directly into a receiving stream), failing septic systems, illegal dumps, and mining operations (KWRRI, 2000). In the PRIDE study region alone, there are estimated to be over 32,000 straight pipes and failing septic systems.

One of the major problems hindering water quality modeling and management, in general and in the PRIDE region specifically, is the lack of data on effluents from the different point and non-point pollution sources. Linking an observed environmental violation in a stream at a particular time to a specific source is therefore extremely difficult. Water quality assessment of the streams and water bodies in the PRIDE region is based on data from the Kentucky Division of Water (DOW) ambient network, the Division of Water TMDL sampling stations, the Kentucky Watershed Watch Citizen Volunteer Monitoring network, and additional targeted sampling.

Water quality data in the PRIDE region are available at several administrative levels. For example, information and statistics on straight pipes and failing septic systems were collected on a county basis. Because hydrologic studies, including water quality studies, are usually conducted at a hydrologic level, such as 11-digit and 8-digit HUCs, a tool is needed to transform data and information from one level to another (e.g., from county to HUC) and between two scales at the same level (e.g., from 8-digit HUC to 11-digit HUC). In the current study this was accomplished using Geographic Information Systems (GIS). ArcGIS 3.2 is used to estimate the number of straight pipes and failing septic systems in a specific watershed of interest: GIS layers of straight pipes and failing septic systems at the county level are overlaid on GIS layers of 11-digit HUCs, producing a layer of 11-digit HUCs with their straight pipes and failing septic systems. The number of straight pipes and failing septic systems in each 11-digit HUC can then be readily identified from the supporting GIS database.
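The text does not say how a county total is split when a county straddles several HUCs. One common GIS approach, assumed here and not confirmed as the authors' method, is areal weighting: features are taken to be spread uniformly over each county, so each county-HUC intersection receives the county count times its share of the county area. The function name and all counties, HUCs, and areas below are hypothetical; if the inventories were point locations, a point-in-polygon count would be used instead.

```python
from collections import defaultdict
from typing import Dict, Tuple


def county_to_huc(
    county_counts: Dict[str, float],
    county_area: Dict[str, float],
    overlap_area: Dict[Tuple[str, str], float],
) -> Dict[str, float]:
    """Apportion county totals to HUCs in proportion to overlapping area.

    county_counts: straight pipes plus failing septic systems per county.
    county_area: total area of each county.
    overlap_area: area of each (county, huc) intersection from a GIS overlay.
    Features in parts of a county outside every listed HUC are not assigned.
    """
    huc_counts: Dict[str, float] = defaultdict(float)
    for (county, huc), area in overlap_area.items():
        # Uniform-density assumption: share of features = share of area.
        huc_counts[huc] += county_counts[county] * area / county_area[county]
    return dict(huc_counts)


# Hypothetical overlay: county A is split 6:4 between two HUCs,
# county B lies entirely within h2.
huc = county_to_huc(
    {"A": 100.0, "B": 50.0},
    {"A": 10.0, "B": 5.0},
    {("A", "h1"): 6.0, ("A", "h2"): 4.0, ("B", "h2"): 5.0},
)
```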

[Figure 1 schematic. Boxes: historical data on hydrologic parameters (historical flows); non-conservative pollutant measurements (fecal coliform measurements); data-driven model (flow-load regression model); hydrologic data synthesis (flow disaggregation, NN model); non-conservative pollutant data synthesis (generation of fecal loading); water quality management (TMDL).]

Figure 1. Framework for non-conservative pollutant assessment using a data-driven approach
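The stages in Figure 1 can be read as a composition of functions. The sketch below rests on these assumptions: the disaggregation step is passed in as a function (so either the area-based method or an ANN could be plugged in), each sub-basin's flow-load regression is a function from flow to load, and outputs without a regression model (such as a sink or residual series) are skipped. All names here, including `generate_loads`, `half_split`, and the toy coefficients, are hypothetical.

```python
from typing import Callable, Dict, List

FlowLoadModel = Callable[[float], float]  # Q (ft^3/s) -> load (count/day)
Disaggregator = Callable[[List[float]], Dict[str, List[float]]]


def generate_loads(
    key_flows: List[float],
    disaggregate: Disaggregator,
    models: Dict[str, FlowLoadModel],
) -> Dict[str, List[float]]:
    """Figure 1 as function composition (a sketch, not the authors' code).

    1. Synthesize upstream flow series from the downstream key series.
    2. Convert each synthesized flow into a daily load with that
       sub-basin's flow-load regression model.
    """
    sub_flows = disaggregate(key_flows)
    return {
        name: [models[name](q) for q in series]
        for name, series in sub_flows.items()
        if name in models  # skip sink/residual outputs
    }


# Hypothetical usage: a toy disaggregator sending half the key flow to one
# sub-basin and half to a sink, with a made-up power-law model.
def half_split(flows: List[float]) -> Dict[str, List[float]]:
    return {"north": [0.5 * q for q in flows], "sink": [0.5 * q for q in flows]}


loads = generate_loads([100.0, 200.0], half_split, {"north": lambda q: 1e9 * q ** 1.5})
```

Keeping the disaggregator as a parameter mirrors the framework's claim that any data generation technique can feed the load-generation step.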


Data-driven Techniques

Data-driven techniques (e.g., regression analysis, artificial neural networks, and time series models) have been used to model hydrologic processes (Govindaraju, 2000) and to estimate water quality parameters (Zhang and Stanley, 1997; Maier and Dandy, 2000). Data-driven models can overcome many of the limitations of highly parameterized process-based models. Some of these models (e.g., ANNs) are referred to as black-box tools because they use observed data to approximate functional relationships among parameters without explicitly representing the underlying processes.

Regression Analysis

Regression analysis can be regarded as one of the simplest data-driven techniques. It can be used to identify readily quantifiable relationships among different physical parameters of a system; for example, the flow in a stream can be linked to the fecal coliform load at that location. As long as the relationship is quantifiable and obtained with a judicious choice of functional form, regression analysis can replace more complex data-driven techniques. When the functional form cannot be defined in advance, ANNs can be an effective alternative.
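The text does not state how its flow-load regressions were fitted. One common approach for a power-law form, load = a·Q^b, assumed here, is to take logarithms, which turns the relationship into a straight line, log10(load) = log10(a) + b·log10(Q), and to fit that line by ordinary least squares. The sketch assumes strictly positive flows and loads; the function name `fit_power_law` is hypothetical.

```python
import math
from typing import List, Tuple


def fit_power_law(flows: List[float], loads: List[float]) -> Tuple[float, float]:
    """Fit load = a * Q**b by least squares on log10-transformed data.

    Returns (a, b). Flows and loads must be positive.
    """
    if len(flows) != len(loads) or len(flows) < 2:
        raise ValueError("need at least two paired observations")
    xs = [math.log10(q) for q in flows]
    ys = [math.log10(fl) for fl in loads]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("flows must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                  # slope in log space is the exponent
    a = 10 ** (y_bar - b * x_bar)  # back-transformed intercept
    return a, b


# Synthetic check: loads generated exactly from a = 2e9, b = 1.5.
flows = [10.0, 100.0, 1000.0]
loads = [2e9 * q ** 1.5 for q in flows]
a, b = fit_power_law(flows, loads)
```

Fitting in log space weights relative rather than absolute errors, which suits loads spanning several orders of magnitude; under lognormal errors, however, the back-transformed curve estimates the median load rather than the mean.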

Artificial Neural Networks (ANNs)

Artificial neural networks are mathematical models that attempt to imitate the learning process of the human brain and learn through training with data sets. ANNs are non-parametric, nonlinear function approximators whose parameters (internal connection weights) can be obtained by trial and error or through numerical optimization. Network architectures suitable for various classes of problems, with supervised and unsupervised learning algorithms, are discussed in the literature (e.g., Lippmann, 1987; Wasserman, 1990), and Govindaraju (2000) recently compiled a comprehensive list of ANN applications to a variety of water resources problems. In the present context, ANNs are used to disaggregate hydrologic data in space, which is referred to as multi-site disaggregation in the hydrology literature, and the results are compared with those of the simple area-based disaggregation method. Disaggregation in this study is primarily intended to estimate streamflow time series at three upstream gauging stations from the data at the immediate downstream station.

In this paper, a feed-forward ANN with a back-propagation (FF/BP) learning algorithm is adopted. The study employs a three-layer (input, hidden, and output) network made up of an interconnected set of simple information-processing elements (nodes). The nodes are arranged in layers, with no connections between nodes of the same layer. Configuring the network involves determining the number of nodes in the hidden layer and the connection weights. The number of nodes in the input layer is based on the number of input arrays, while the number of nodes in the output layer equals the number of model outputs. The optimal number of hidden nodes can be found by trial and error. The hidden nodes add flexibility to the network and enhance its ability to deal robustly and efficiently with inherently complex nonlinear relations (Shamseldin, 1997). Details on ANNs and the BP training algorithm can be found in Freeman and Skapura (1991).

The number of output nodes in the current application equals the number of upstream sub-basin flow stations plus one. The additional output, referred to as the sink node, accounts for any difference between the streamflow of the key (downstream) flow series and the sum of the streamflows of the sub-basin (upstream) flow series. Input-output data pairs are fed to the network during training: the input data pass through the input and hidden layer nodes, where they are multiplied by connection weights and transformed by sigmoidal (S-shaped) functions to generate outputs. The training algorithm compares these outputs with the target values and modifies the connection weights accordingly. Once training is complete, the network architecture (hidden nodes and weights) is preserved and used for testing.
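A minimal pure-Python sketch of such a network: one input for the key flow, four sigmoid hidden nodes, and one output per upstream sub-basin plus the sink node, trained by back-propagation. Several details are assumptions not stated in the text: a linear output layer, flows scaled to roughly [0, 1] before training, per-sample (online) gradient descent on squared error, and the learning rate, epoch count, and weight initialization shown. The class name `DisaggregationNet` and the training data are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class DisaggregationNet:
    """1 input (key flow) -> sigmoid hidden layer -> linear outputs.

    Outputs: one per upstream sub-basin plus one sink node.
    """

    def __init__(self, n_hidden: int = 4, n_out: int = 4, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # input -> hidden
        self.b1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_out)]                             # hidden -> output
        self.b2 = [0.0] * n_out

    def forward(self, x: float) -> Tuple[List[float], List[float]]:
        h = [sigmoid(w * x + b) for w, b in zip(self.w1, self.b1)]
        y = [sum(w * hj for w, hj in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        return h, y

    def predict(self, x: float) -> List[float]:
        return self.forward(x)[1]

    def train(self, xs: Sequence[float], targets: Sequence[Sequence[float]],
              epochs: int = 3000, lr: float = 0.1) -> None:
        """Online back-propagation on squared error."""
        for _ in range(epochs):
            for x, t in zip(xs, targets):
                h, y = self.forward(x)
                d_out = [yk - tk for yk, tk in zip(y, t)]  # linear-output deltas
                # Hidden deltas use the weights before this step's update.
                d_hid = [sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                         * h[j] * (1.0 - h[j]) for j in range(len(h))]
                for k, dk in enumerate(d_out):
                    for j, hj in enumerate(h):
                        self.w2[k][j] -= lr * dk * hj
                    self.b2[k] -= lr * dk
                for j, dj in enumerate(d_hid):
                    self.w1[j] -= lr * dj * x
                    self.b1[j] -= lr * dj


# Hypothetical scaled data: three sub-basin shares plus a sink share
# that together add up to the key flow.
xs = [i / 10 for i in range(11)]
targets = [[0.4 * x, 0.3 * x, 0.2 * x, 0.1 * x] for x in xs]
net = DisaggregationNet()
net.train(xs, targets)
```

Nothing in this architecture forces the four outputs to add up to the input; additivity is learned from the targets, where the sink target on each training day would be the key flow minus the sum of the gauged sub-basin flows.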

Management of Water Quality Monitoring Network

Design of the water quality monitoring network is one of the most important components of water quality modeling and management. Insufficient sampling frequency and locations may constrain data analysis and render the results inconclusive. On the other hand, redundant spatial and temporal sampling indicates poor engineering design and incurs unnecessary costs that could be better invested in other aspects of water quality management. Because the Eastern Kentucky PRIDE project extends over five years, efforts to examine the existing network and modify it toward a technically and economically optimal configuration can be invaluable. Arriving at such a configuration requires a reasonable assessment of the spatial variability, or relative contribution, of the different watersheds in the region where baseline sampling is presently carried out.

Case Study Application

Daily streamflow data at four gauging stations in the Kentucky River basin are used in this study. The United States Geological Survey (USGS) streamflow gauging stations (USGS # 03282000, USGS # 03280000, USGS # 03281000, and USGS # 03281500), shown in Figure 2, represent four different 8-digit HUCs in the Kentucky River basin. The flow series at USGS # 03282000 at Heidelberg (designated as 4 in Figure 2) is used as the key time series to be disaggregated into the three upstream sub-basin flow series at the North Fork, Middle Fork, and South Fork (indicated as 1, 2, and 3 in Figure 2). Historical fecal coliform concentrations (count/100 mL) are obtained from grab samples and used to estimate the instantaneous fecal loads (count/day) on the days the measurements were taken. Grab samples are collected mid-stream and at mid-depth of the stream channel, stored at 10 °C during transport to the laboratory, and delivered to the laboratory within six hours of collection. Fecal coliform is measured using Method 9222 D of the Standard Methods for the Examination of Water and Wastewater (Greenberg et al., 1992). The sampling frequency (once a month) is pre-determined, and samples at all stations are collected at the same frequency.
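Converting a grab-sample concentration (count/100 mL) and the concurrent streamflow (ft³/s) into an instantaneous load (count/day) is a unit calculation. Since 1 ft = 0.3048 m exactly, 1 ft³ = 28,316.846592 mL, so each ft³/s carries 283.17 hundred-millilitre units per second, and a day has 86,400 s; the combined factor is about 2.447 × 10⁷. The function name `fecal_load` and the sample values are hypothetical.

```python
# 1 ft = 0.3048 m exactly, so 1 ft^3 = 0.028316846592 m^3 = 28,316.846592 mL.
ML_PER_CUBIC_FOOT = 28316.846592
SECONDS_PER_DAY = 86400.0


def fecal_load(conc_per_100ml: float, flow_cfs: float) -> float:
    """Instantaneous load (count/day) from concentration (count/100 mL) and flow (ft^3/s)."""
    hundred_ml_per_day = flow_cfs * (ML_PER_CUBIC_FOOT / 100.0) * SECONDS_PER_DAY
    return conc_per_100ml * hundred_ml_per_day


# Conversion factor for 1 count/100 mL at 1 ft^3/s.
FACTOR = fecal_load(1.0, 1.0)
# Hypothetical sample: 200 count/100 mL measured at 100 ft^3/s.
example = fecal_load(200.0, 100.0)
```

For the hypothetical sample, the load is about 4.9 × 10¹¹ count/day.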

Ten years of monthly fecal coliform data are used for the analysis in this paper. Regression models are developed to link the instantaneous fecal load to the streamflow at each of the three sub-basin stations. Ten years (1991 to 2000) of daily streamflows at the four stations are also used for the disaggregation process. Two methods are adopted for disaggregating the streamflows of the key series. First, the flows are disaggregated in proportion to the areas of the upstream watersheds (as designated by the USGS Hydrologic Unit Code). Second, ANNs are used in an attempt to better preserve the statistical properties of the disaggregated flows at spatial and temporal levels. The first eight years of data were used to train (calibrate) the ANN model, and the last two years were used to test (validate) its performance in the disaggregation exercise. The architecture of the ANN model is shown in Figure 3. The available flow data are split into input-output data pairs for training and testing.
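The evaluation described above can be sketched as a chronological split (years up to a cutoff for training, later years for testing) followed by a comparison of the means and standard deviations of disaggregated and observed flows, the statistics used to judge how well each method preserves the sub-basin series. Splitting by calendar year, expressing the comparison as percent errors, and the helper names are assumptions for illustration.

```python
import statistics
from datetime import date
from typing import List, Tuple


def split_by_year(dates: List[date], values: List[float],
                  last_train_year: int) -> Tuple[List[float], List[float]]:
    """Chronological split: years <= last_train_year train, later years test."""
    train = [v for d, v in zip(dates, values) if d.year <= last_train_year]
    test = [v for d, v in zip(dates, values) if d.year > last_train_year]
    return train, test


def moment_errors(observed: List[float], simulated: List[float]) -> Tuple[float, float]:
    """Percent error in the mean and in the standard deviation."""
    mean_o, mean_s = statistics.mean(observed), statistics.mean(simulated)
    sd_o, sd_s = statistics.stdev(observed), statistics.stdev(simulated)
    return 100.0 * (mean_s - mean_o) / mean_o, 100.0 * (sd_s - sd_o) / sd_o


# For a 1991-2000 record, training on 1991-1998 means last_train_year=1998.
dates = [date(1997, 1, 1), date(1998, 12, 31), date(1999, 1, 1), date(2000, 6, 1)]
train_vals, test_vals = split_by_year(dates, [1.0, 2.0, 3.0, 4.0], last_train_year=1998)
```

A method that multiplies the key series by a constant, as the area ratio does, yields equal percentage errors in the mean and the standard deviation, because scaling leaves the coefficient of variation unchanged; a sub-basin that is flashier or smoother than the whole watershed therefore cannot be matched on both statistics at once.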

Analysis and Results

The first step in the modeling effort is directed to-wards developing functional relationships between dailyflow rates and the observed fecal coliform loads. Signifi-cant correlations are found between the streamflows andthe fecal coliform loads at the North Fork, Middle Fork,and South Fork sub-basins (Figure 4). Regression modelsare calibrated using 75 percent of the available recordsand validated using the remaining 25 percent of the data.Relative errors for calibration and validation data sets aregiven in Table 1. It is evident that the regression models’performance is consistent in both calibration and valida-tion periods, which implies a satisfactory level of reliability

Figure 2. Map of Kentucky and the case study area in southeastern Kentucky

Input node (Key series)

Four hidden nodes Three output nodes (Disaggregated sub-series)

Sink node

Figure 3. Neural network architecture for disaggregation ofstreamflows

N

Dow

nloa

ded

by [

Nor

thea

ster

n U

nive

rsity

] at

06:

42 2

4 N

ovem

ber

2014

Page 7: Framework for Assessment of Relative Pollutant Loads in Streams with Limited Data

482 A. Elshorbagy, R. Teegavarapu, and L. Ormsbee

IWRA, Water International, Volume 30, Number 4, December 2005

coliform loads at the three sub-basins, North Fork, MiddleFork, and South Fork, respectively can be modeled withthe help of regression analysis. The functional forms ob-tained are similar and are as follows:

(1a)

(1b)

(1c)

where FL is the fecal load (count/day) at the station underconsideration, and Q is the streamflow (ft3/s).

Once relationships are determined, the streamflowsat Heidelberg are disaggregated into the three sub-seriesflows based on the relative area of each sub-basin to thetotal watershed area contributing to Heidelberg and alsousing ANN model. The configuration of the ANN modelwith one input node, four hidden nodes, and four outputnodes was found to provide the best possible results fordisaggregating the flows at the key station. The ANN modelconfiguration (hidden nodes) was selected using a trial anderror method. As an indication of success of the ANNmodel in disaggregating the flows, the mean and standarddeviation of the disaggregated and measured flows at thethree stations are given in Table 2. It is evident from the tableand also from Figure 5 (only North Fork sub-basin, for brev-ity, is provided) that the ANN model performed better in pre-serving the statistical properties of the sub-basin streamflowseries than a simple area-based ratio method.

Since the overriding objective of the framework is tobe able to estimate average annual and monthly fecal loadsin the three sub-basins, the regression models developedearlier (Equations 1a, 1b, and 1c) are used to estimate thedaily fecal loads based on the disaggregated flows. The

Figure 4. Flow-fecal load relationship: (a) North Fork, (b) Middle Fork, and (c) South Fork. [Scatter plots: fecal load (count/day) versus flow (ft³/s).]

in these models within the bounds of limitations of the calibration. The flow-load relationships suggest that an increase in flow results in an increase in fecal coliform load, and vice versa. Presumably, an increase in flow rate decreases the travel time, and thus the decay that occurs in transit, and therefore increases the fecal load in the stream (Thomann and Mueller, 1987). However, exceptions (high fecal loads at low flows) are sometimes observed. These can be attributed to sudden releases of effluent from point sources (e.g., wastewater treatment plants) in the study region, and they are easily identified as outliers in the scatter plots of Figure 4a, b, and c.

Table 1. R² (%) of regression models as an indication of model performance

| | North Fork | Middle Fork | South Fork |
|---|---|---|---|
| Calibration | 68 | 75 | 74 |
| Validation | 68 | 73 | 73 |


Figure 5. Mean monthly fecal loading based on observed and disaggregated flows using ANN model and area-based method (North Fork). [Plot: fecal load (count/day) versus month, for loads based on observed, NN-disaggregated, and area-disaggregated flows.]


The mean annual and monthly fecal loads estimated from the disaggregated flows, together with those estimated from the observed flows, are shown in Figures 6 and 7, respectively. In principle, the average loads obtained from the model could be compared with historical fecal loads; this comparison is not carried out, however, because the historical loads are instantaneous grab samples collected sparsely throughout the year. The proposed analysis is intended to assess the relative load contributions of the three sub-basins for water quality management purposes, rather than to predict individual values.
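The load-estimation step described above, applying a fitted flow-load relation to each daily disaggregated flow and then averaging by month and by year, can be sketched as follows. The coefficients and flows are hypothetical placeholders rather than values from this study, and pooling months across years is an assumption suggested by the 1-12 month axis of Figure 7; the function names are illustrative.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def daily_loads(flows, a, b):
    """Daily fecal load (count/day) from daily flow (ft3/s) via FL = a * Q**b."""
    return [a * q ** b for q in flows]


def mean_by(dates, values, key):
    """Mean of values grouped by key(date), e.g. calendar month or year."""
    groups = defaultdict(list)
    for d, v in zip(dates, values):
        groups[key(d)].append(v)
    return {k: mean(vs) for k, vs in sorted(groups.items())}


# Hypothetical coefficients and disaggregated flows, for illustration only
dates = [date(1999, 1, 15), date(1999, 2, 15), date(2000, 1, 15)]
flows = [400.0, 650.0, 900.0]
loads = daily_loads(flows, a=1e9, b=1.4)
monthly = mean_by(dates, loads, key=lambda d: d.month)  # months pooled across years
annual = mean_by(dates, loads, key=lambda d: d.year)    # mean daily load within each year
print(sorted(monthly), sorted(annual))
```

Keeping the grouping key as a parameter lets the same helper produce both the monthly and the annual summaries without duplicating the averaging logic.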

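Table 2. Statistical properties of observed and disaggregated streamflows

| Sub-series | Mean, observed (ft³/s) | Mean, area-based (ft³/s) | Mean, ANN (ft³/s) | Mean % error, area-based | Mean % error, ANN | Std. dev., observed (ft³/s) | Std. dev., area-based (ft³/s) | Std. dev., ANN (ft³/s) | Std. dev. % error, area-based | Std. dev. % error, ANN |
|---|---|---|---|---|---|---|---|---|---|---|
| N-Fork | 1116.4 | 1373 | 1098 | 23.1 | 1.6 | 1520.1 | 1984 | 1474 | 31 | 3 |
| M-Fork | 694.3 | 670 | 675 | 3.5 | 2.8 | 991.4 | 967 | 958 | 2.5 | 3 |
| S-Fork | 802.4 | 901 | 832 | 12.3 | 3.7 | 1445 | 1301 | 1242 | 10 | 14 |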
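The area-based method and the percent errors in Table 2 can be expressed in a few lines. This is a sketch, assuming that the tabulated errors are absolute percent differences relative to the observed statistic (which the signs and magnitudes in Table 2 are consistent with) and using the population standard deviation, since the paper does not say which form was used; function names and the small series are hypothetical.

```python
from statistics import mean, pstdev


def area_ratio_disaggregate(key_flows, sub_area, total_area):
    """Area-based method: scale the key-station series by the sub-basin's share of drainage area."""
    ratio = sub_area / total_area
    return [ratio * q for q in key_flows]


def percent_error(observed, estimated):
    """Absolute percent error of an estimated statistic relative to the observed one."""
    return 100.0 * abs(estimated - observed) / abs(observed)


def compare_statistics(observed_series, disaggregated_series):
    """Percent errors in the mean and the (population) standard deviation."""
    return (percent_error(mean(observed_series), mean(disaggregated_series)),
            percent_error(pstdev(observed_series), pstdev(disaggregated_series)))


# North Fork ANN mean from Table 2: observed 1116.4 ft3/s, disaggregated 1098 ft3/s
print(round(percent_error(1116.4, 1098.0), 1))  # 1.6, matching the tabulated value

# Hypothetical key-station record and gauged sub-basin flows
key = [100.0, 200.0, 600.0]
sub = area_ratio_disaggregate(key, sub_area=250.0, total_area=1000.0)
observed_sub = [20.0, 45.0, 170.0]
mean_err, sd_err = compare_statistics(observed_sub, sub)
```

Because the area-based series is a constant multiple of the key-station series, its coefficient of variation always equals that of the key station, so this method cannot represent a sub-basin whose flows are relatively more or less variable than the total.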

Figure 6. Mean annual fecal loading based on observed and NN-based disaggregated flows: (a) North Fork, (b) Middle Fork, and (c) South Fork. [Plots: annual fecal load (count/day) for 1999 and 2000, based on observed and disaggregated flows.]

Figure 7. Mean monthly fecal loading based on observed and NN-based disaggregated flows: (a) North Fork, (b) Middle Fork, and (c) South Fork. [Plots: fecal load (count/day) versus month, based on observed and disaggregated flows.]


The annual and monthly mean fecal loads are satisfactorily reproduced using the disaggregated flows. The difference between loadings based on observed and disaggregated flows is large for the month of January in the North Fork and South Fork watersheds, which suggests that the disaggregation model failed to capture the temporal variability of the streamflows in that month for both watersheds. Because the loads are computed from the streamflows, the accuracy of the load estimates depends on the accuracy of the disaggregated flows. The daily fecal coliform loads estimated using the disaggregated flows follow the same pattern as those estimated using the observed flows (Figure 8).
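A month-by-month comparison such as the January discrepancy noted above can be made systematic by computing, for each month, the relative difference between loads based on disaggregated flows and loads based on observed flows, and reporting the month with the largest absolute difference. The helper names and the three-month values below are hypothetical and do not reproduce Figure 7.

```python
def monthly_relative_differences(observed_based, disaggregated_based):
    """Percent difference of disaggregated-flow loads from observed-flow loads, per month."""
    return {m: 100.0 * (disaggregated_based[m] - observed_based[m]) / observed_based[m]
            for m in observed_based}


def worst_month(observed_based, disaggregated_based):
    """Month with the largest absolute relative difference, and that difference (%)."""
    diffs = monthly_relative_differences(observed_based, disaggregated_based)
    month = max(diffs, key=lambda m: abs(diffs[m]))
    return month, diffs[month]


# Hypothetical mean monthly loads (count/day) for three months only
obs = {1: 6.0e13, 2: 4.0e13, 3: 3.0e13}
dis = {1: 4.5e13, 2: 3.8e13, 3: 3.1e13}
print(worst_month(obs, dis))  # (1, -25.0)
```

The sign is kept in the result so that under- and over-estimation by the disaggregated flows can be told apart.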

Discussion

In the present study, streamflow is used to estimate the average fecal coliform loads. The success of the proposed approach is contingent upon two steps: first, developing a reliable functional relationship between streamflow and pollutant load; and second, disaggregating a key streamflow series into upstream sub-basin flow series. The choice of data-driven models that can handle these two steps efficiently therefore affects the final results. Temporal changes in the watershed that influence the fecal coliform loadings in a stream, and management practices that alter the fecal coliform loading patterns, need to be considered when developing flow-load relationships. Regression analysis over an appropriate window (a set of time intervals) of the sampled observations can capture temporal changes in the watershed and in pollutant loading patterns.

The use of the ANN in the disaggregation process in this study helped to maintain two major statistical properties (mean and variance) of the disaggregated sub-basin flow series and of the resulting fecal loads. This can help in assessing the relative pollutant load contributions of the watersheds, thus enabling better management practices to be put in place. The process can also help in designing efficient monitoring networks by reducing the sampling frequency and the number of sampling locations. The case study presented in this paper provides an example in which fecal coliform sampling can be minimized at three upstream locations by disaggregating the downstream flow record and estimating the sub-basin loads from it. The approach can also help in eliminating one or more monitoring stations from the sampling network if budgetary constraints exist. For example, in the present study the disaggregation process suggests that the flows can be successfully spatially disaggregated, so the stations upstream of the gauging station USGS #03282000 (i.e., USGS #03280000, USGS #03281000, and USGS #03281500) are possible candidates for elimination, if required. However, the success and applicability of this elimination process depend on the performance of the disaggregation model. It should also be noted that the proposed approach is intended for assessing average relative loading rather than point prediction. The proposed framework can also generate continuous pollutant-load series, which may help in understanding seasonal patterns in pollutant loadings. This generation exercise is in line with one of the recommendations of NRC (2001): to develop models that fill gaps in data, thereby increasing the efficiency of monitoring and the accuracy of the preliminary listing of impaired streams for TMDL development.
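The relative load contributions mentioned above reduce to each sub-basin's share of the summed mean loads. A minimal sketch, with hypothetical loads that are not results from this study and an illustrative function name:

```python
def relative_contributions(mean_loads):
    """Percentage of the summed mean load contributed by each sub-basin."""
    total = sum(mean_loads.values())
    if total <= 0:
        raise ValueError("total load must be positive")
    return {basin: 100.0 * load / total for basin, load in mean_loads.items()}


# Hypothetical mean daily loads (count/day), not results from this study
shares = relative_contributions({"North Fork": 2.0e13, "Middle Fork": 1.0e13, "South Fork": 5.0e13})
for basin, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{basin}: {pct:.1f}%")
```

These shares describe loads at the sub-basin outlets; they do not account for any decay between those outlets and the downstream key station.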

The generality of the proposed framework makes it applicable to other pollutants, such as nutrients. Indeed, the framework could be even more successful for conservative pollutants than for non-conservative ones, because a flow-load relationship is easier to establish in the former case.
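The contrast between conservative and non-conservative pollutants, and the travel-time argument given earlier for fecal coliform (Thomann and Mueller, 1987), can be illustrated with first-order decay under plug flow: the fraction of a load surviving a reach is exp(-k t), with travel time t equal to reach length divided by velocity. This sketch assumes plug flow and that velocity rises with streamflow; the reach length, velocities, and the rate k are hypothetical.

```python
import math

SECONDS_PER_DAY = 86_400.0


def surviving_fraction(k_per_day, reach_length_m, velocity_m_s):
    """Fraction of a load remaining after plug-flow travel with first-order decay, exp(-k t).

    k_per_day = 0 represents a conservative pollutant.
    """
    travel_time_days = reach_length_m / velocity_m_s / SECONDS_PER_DAY
    return math.exp(-k_per_day * travel_time_days)


# Hypothetical 10 km reach; k = 1.0 per day is an illustrative die-off rate
for u in (0.25, 0.5, 1.0):
    bacteria = surviving_fraction(1.0, 10_000.0, u)
    conservative = surviving_fraction(0.0, 10_000.0, u)
    print(f"velocity {u} m/s: non-conservative {bacteria:.3f}, conservative {conservative:.3f}")
```

For the non-conservative case the surviving fraction rises with velocity, while for k = 0 it stays at 1 regardless of flow, which is one way to see why the flow-load relationship for a conservative pollutant depends less on in-stream processes.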

Figure 8. Daily fecal loads estimated based on observed and disaggregated flows (1/1/1999-9/30/2000): (a) North Fork, (b) Middle Fork, and (c) South Fork. [Plots: fecal load (count/day) versus day, based on observed and disaggregated flows.]


Conclusions

A framework integrating two data-driven techniques (ANNs and regression analysis) is developed to assess fecal coliform loadings in natural streams. The study reports a significant correlation between streamflow and fecal coliform loads (R² of 68-75% for the regression models; Table 1). An artificial neural network model, as part of the framework, helped to capture the spatial variability of the transport medium (streamflow) and therefore to assess fecal coliform loading at the watershed scale. Results from the study can be used to reduce the sampling frequency and the number of monitoring stations, and to assess relative fecal loadings under data-poor conditions. The proposed methodology is general and can be applied where a water quality management agenda requires the assessment of pollutant loads, and where the use of process-based models is severely constrained by a lack of data and by intensive calibration requirements.

Acknowledgements

The research work reported in this paper was supported by a grant sponsored by Eastern Kentucky PRIDE. The data used in the study were provided by the Kentucky Division of Water (DOW) and the Kentucky River Watershed Watch program. The authors would like to acknowledge Jason Booth and L.T. Yee for their assistance with the preparation of data. The first author acknowledges the financial support of NSERC-Canada through its Discovery Grant Program.

About the Authors

Amin Elshorbagy is an Assistant Professor of Water Resources Engineering at the Department of Civil & Geological Engineering, University of Saskatchewan. He established the Centre for Advanced Numerical Simulation (CANSIM), where his graduate students are hosted. Through CANSIM, he conducts research and has published several articles in the areas of soft computing in water resources, statistical hydrology, decision analysis, system dynamics, and TMDLs and watershed management. He is a licensed Professional Engineer (Ontario and Saskatchewan) and a certified Professional Hydrologist by the American Institute of Hydrology.

Ramesh Teegavarapu is a Research Associate at the Kentucky Water Resources Research Institute, University of Kentucky. He conducts research in the areas of soft computing in water resources and surface water quality management. He is a licensed Professional Engineer.

Lindell Ormsbee is a Professor of water resources at the Civil Engineering Department, University of Kentucky. He is also the director of the Kentucky Water Resources Research Institute. He conducts research and publishes in the areas of TMDLs and watershed management, surface water quality, water resources systems analysis, and soft computing techniques. He is a licensed Professional Engineer and a certified Professional Hydrologist by the American Institute of Hydrology.

Discussions open until May 1, 2006.

References

Bowie, G.L., W.B. Mills, D.B. Porcella, C.L. Campbell, J.R. Pagenkopf, G.L. Rupp, K.M. Johnson, P.W.H. Chan, S.A. Gherini, and C.E. Chamberlin. 1985. Rates, constants, and kinetic formulations in surface water quality modeling. EPA/600/3-85/040. Washington, D.C.: U.S. Environmental Protection Agency.

Burian, S.J., S.R. Durrans, S.J. Nix, and R.E. Pitt. 2001. “Training artificial neural networks to perform rainfall disaggregation.” Journal of Hydrologic Engineering 6, No. 1: 43-51.

Chapra, S.C. 1994. Surface Water Quality Modeling. Boston: WCB/McGraw-Hill.

Chapra, S.C. 2003. “Engineering Water Quality Models and TMDLs.” Journal of Water Resources Planning and Management: 247-256.

Craig, A.S., M.E. Borsuk, and K.H. Reckhow. 2000. “Nitrogen TMDL Development in the Neuse River Watershed: An Imperative for Adaptive Management.” Project report, Environmental Sciences and Policy Division, Nicholas School of the Environment and Earth Sciences. Durham, NC: Duke University.

Entry, J.A. and N. Farmer. 2001. “Movement of Coliform Bacteria and Nutrients in Ground Water Flowing through Basalt and Sand Aquifers.” Journal of Environmental Quality 30: 1533-39.

Freeman, J.A. and D.M. Skapura. 1991. Neural Networks: Algorithms, Applications, and Programming Techniques. Reading, MA: Addison-Wesley.

Govindaraju, R.S. 2000. “Artificial neural networks in hydrology. II: Hydrologic applications.” Journal of Hydrologic Engineering 5, No. 2: 124-37.

Greenberg, A.F., L.S. Clesceri, and A.D. Eaton. 1992. Standard Methods for the Examination of Water and Wastewater. 18th ed. Washington, D.C.: American Public Health Association.

Hodges, J.S. 1987. “Uncertainty, policy analysis, and statistics (with discussion).” Journal of American Statistical Association 2: 259-91.

Jian, X. and Y.S. Yu. 1998. “A comparative study of linear and nonlinear time series models for water quality.” Journal of the American Water Resources Association 34, No. 3: 651-59.

Kentucky Water Resources Research Institute (KWRRI). 2000. PRIDE water quality assessment report: I. Problems and programs. Lexington, KY, USA: University of Kentucky.

Kumar, D.N., U. Lall, and M.R. Petersen. 2000. “Multisite disaggregation of monthly to daily streamflow.” Water Resources Research 36, No. 7: 1823-33.

Levin, S.A. 1985. “Scale and predictability in ecological modeling.” In T.L. Vincent, Y. Cohen, W.J. Grantham, G.P. Kirkwood, and J.M. Skowronski, eds. Modeling and management of resources under uncertainty. Lecture notes in Biomathematics No. 72. Berlin: Springer-Verlag. 1-8.

Lippmann, R.P. 1987. “An introduction to computing with neural nets.” IEEE ASSP Magazine: 4-22.

Maier, H.R. and G.C. Dandy. 2000. “Application of artificial neural networks to forecasting of surface water quality variables, applications and challenges.” In R.S. Govindaraju and A. Ramachandra Rao, eds. Artificial Neural Networks in Hydrology. Dordrecht: Kluwer Academic Publishers.

NRC. 2001. Assessing the TMDL Approach to Water Quality Management. Washington, D.C.: National Academy Press.

Reckhow, K.H. and S.C. Chapra. 1983. Engineering Approaches for Lake Management, Vol. 1: Data Analysis and Empirical Modeling. Woburn, MA: Butterworth.

Shamseldin, A.Y. 1997. “Application of a Neural Network Technique to Rainfall-Runoff Modeling.” Journal of Hydrology 199: 272-94.

Thomann, R.V. and J.A. Mueller. 1987. Principles of Surface Water Quality Modeling and Control. New York: HarperCollins.

USEPA. 2001. Better Assessment Science Integrating Point and Non-point Sources, BASINS, Version 3.0 User’s Manual. Washington, D.C.: U.S. Environmental Protection Agency, Office of Water.

Wasserman, P.D. 1990. Neural computing. New York: Van Nostrand Reinhold.

Young, K.D. and E.L. Thackston. 1999. “Housing Density and Bacterial Loading in Urban Stream.” Journal of Environmental Engineering 125, No. 12: 1177-80.

Zhang, Q. and S.J. Stanley. 1997. “Forecasting raw-water quality parameters for the North Saskatchewan River by neural network modeling.” Water Research 31, No. 9: 2340-50.
