detectability of step trends in the rate of atmospheric deposition of sulfate



VOL. 21, NO. 5 WATER RESOURCES BULLETIN

AMERICAN WATER RESOURCES ASSOCIATION OCTOBER 1985

DETECTABILITY OF STEP TRENDS IN THE RATE OF ATMOSPHERIC DEPOSITION OF SULFATE¹

Robert M. Hirsch and Edward J. Gilroy²

ABSTRACT: A method is presented to assist policy makers in determining the combination of number of sampling stations and number of years of sampling necessary to state with a given probability that a step reduction in atmospheric deposition rates of a given magnitude has occurred at a pre-specified time. This pre-specified time would typically be the time at which a sulfate emission control program took effect, and the given magnitude of reduction is some percentage change in deposition rate one might expect to occur as a result of the emission control. In order to determine this probability of detection, a stochastic model of sulfate deposition rates is developed, based on New York State bulk collection network data. The model considers the effect of variation in precipitation, seasonal variations, serial correlation, and site-to-site (cross) correlation. A nonparametric statistical test which is well suited to detection of step changes in such multi-site data sets is developed. It is related to the Mann-Whitney Rank-Sum test. The test is used in Monte Carlo simulations along with the stochastic model to derive statistical power functions. These power functions describe the probability of detecting ($\alpha = 0.05$) a step trend in deposition rate as a function of the size of the step trend, record length before and after the step trend, and the number of stations sampled. The results show that, for an area the size of New York State, very little power is gained by increasing the number of stations beyond about eight. The results allow policy makers to determine the tradeoff between the cost of monitoring and time required to detect a step trend of a given magnitude with a given probability.
(KEY TERMS: precipitation chemistry; statistics; trends; network design.)

INTRODUCTION

As actions are contemplated or taken to reduce the emissions of sulfates and other compounds to the atmosphere, the question of detectability of the resulting changes in deposition rates becomes important. Policy makers need to have an understanding of the amount of time they can expect to wait after emissions are reduced before statistically convincing evidence is accumulated which will demonstrate that a change in deposition has occurred. Without such knowledge they may be over-anxious and modify an effective plan because the effects were not apparent soon enough or, conversely, they may wait too long to modify an ineffective plan because they continue to hope to observe a substantial effect long after any such effect would be detectable. Policy makers also need an understanding of the tradeoffs between the amount of data collected and the size of trend that can be detected with a given probability so that they can make appropriate decisions about expenditures for monitoring networks.

The amount of data refers both to the number of collection sites and the length of time they are sampled both before and after the change occurs. The detection of the step trend is defined as the rejection of the null hypothesis of no step trend over the monitoring area, at some prespecified significance level ($\alpha$), using some appropriate statistical test. Statistical power is defined as the probability that the test will reject the null hypothesis (detect a trend), and is a function of $\alpha$, the number of stations, number of years, size of the step trend, probability distribution of the data, and their variances and covariances.

The results of this investigation into the detectability of step trends are potentially useful in two ways. They provide an indication of the length of time it will take, given a particular network design and magnitude of step trend, to be nearly certain of being able to detect the trend (that is, to achieve a statistical power approaching 1.0). Conversely, they provide guidance for investing additional resources in data collection in order to hasten the time at which detection of a step trend of a specified magnitude becomes a near certainty. One of the questions the analysis can answer is, to what extent can one substitute space for time in this kind of data collection effort? That is, how does the addition of more data collection sites influence the amount of time required to detect the change?

The steps to be followed in this study are the following: 1) analysis of some sulfate deposition data resulting in a set of summary statistics describing the relationship between deposition rates and rainfall rates, and the distributional properties of the residuals from this relationship including means, variances and covariances (over space and time); 2) adoption of a simulation procedure for producing artificial data sets for use in step 4 below which preserve certain key properties of the observed data sets; 3) selection and modification of a statistical testing procedure for detecting step changes in such data sets; 4) derivation of statistical power functions through Monte Carlo simulation relating power to the magnitude of the change, number of data collection sites, and length of data collection before and after the change; and 5) the translation of those power functions to some tables suitable for the network design questions that are likely to arise.

¹Paper No. 85100 of the Water Resources Bulletin. Discussions are open until June 1, 1986.
²Respectively, Hydrologist and Mathematical Statistician, U.S. Geological Survey, 410 National Center, Reston, Virginia 22092.

DATA ANALYSIS

The first step in the study is to analyze the data from an existing precipitation quality network in order to develop a general stochastic model of the deposition process. The data set selected for use in this study is from the New York State precipitation-chemistry monitoring network operated by the U.S. Geological Survey in cooperation with the New York State Department of Environmental Conservation. The records are for the period 1965 through 1981 at five sites (see Figure 1). The samples are from bulk collectors (Whitehead and Feth, 1964) which contain a composite of wet and dry fallout. The samples were collected at an approximately monthly frequency. The data used in this study are the sulfate concentrations and precipitation quantities from each of these collected samples.

Figure 1. Location of Precipitation Sites in New York State Used in this Report: A - Allegany, C - Canton, H - Hinckley, MP - Mays Point, U - Upton.

The data were screened for obvious errors as described by Peters (1984) and processed to produce a monthly time series of precipitation rates in centimeters per day and sulfate deposition rates in grams per square meter per day. This was done by the following rule: for every collected sample, a midpoint date (half way between the beginning and ending dates for the sample collection) was determined and the sample was assigned to the month in which that midpoint occurred. If more than one sample had its midpoint in a given month, their precipitation amounts were added and their sulfate loads (concentration times precipitation amount) were added, and the precipitation rates and loading rates were computed from these totals. The precipitation rate is the total precipitation amount divided by the length of the collection(s) in days. The sulfate loading rate is the total sulfate load divided by the length of the collection(s) in days. If no sample had its midpoint within the month, then the month was considered to be missing. The number of "non-missing" months in each record is given in Table 1; they average 170 months (about 14 years) per station. Additional descriptions of the data collection program and other studies of these data are given by Pearson and Fisher (1971) and Peters et al. (1982).
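The midpoint rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `monthly_rates`, the tuple layout of a sample, and the unit convention (concentration in mg/L, precipitation in cm, so that 1 cm of water over 1 m² is 10 L) are not taken from the paper.

```python
from collections import defaultdict
from datetime import date, timedelta

# Assumption: concentration in mg/L and precipitation in cm, so the load of one
# sample is 0.01 * concentration * precipitation, in g/m^2.
G_PER_M2_PER_MG_L_CM = 0.01

def monthly_rates(samples):
    """Assign each sample to the month of its midpoint date and return
    {(year, month): (precip_rate_cm_per_day, load_rate_g_per_m2_per_day)}.

    samples: iterable of (start_date, end_date, precip_cm, sulfate_mg_per_l).
    Months with no sample midpoint are absent from the result (missing).
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # precip, load, collection days
    for start, end, precip_cm, conc in samples:
        days = (end - start).days
        midpoint = start + timedelta(days=days // 2)
        acc = totals[(midpoint.year, midpoint.month)]
        acc[0] += precip_cm
        acc[1] += G_PER_M2_PER_MG_L_CM * conc * precip_cm
        acc[2] += days
    # Rates are totals divided by the total length of the collection(s) in days.
    return {key: (p / d, load / d) for key, (p, load, d) in totals.items()}
```

Two samples whose midpoints fall in the same month are pooled before the rates are computed, which is the behavior the rule calls for.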

Examination of the relationship between precipitation rates and deposition rates at each of the stations indicates that the two quantities are related to each other and that a good form for expressing this relationship is

$$\ln L = \beta_0 + \beta_1 \ln P + \varepsilon \qquad (1)$$

where ln is the natural log, $L$ is the sulfate loading rate ((g/m²)/d) where d is number of collection days, $P$ is precipitation rate (cm/d), $\beta_0$ and $\beta_1$ are coefficients, and $\varepsilon$ is a residual (independent of $\ln P$). Figure 2 shows one typical scatter plot. Based on previous studies (Peters et al., 1982), it was known that time trends are apparent in these data. Given the purpose of this paper - to describe the detectability of a step trend - it was desirable to build a statistical model which would "remove" the historical trend from the data as well as remove the deterministic effects of rainfall quantity variations. The model used was

$$\ln L = \beta_0 + \beta_1 \ln P + \beta_2 T + \varepsilon \qquad (2)$$

where the variables are as described above and $T$ is equal to time in months with $T = 1$ for January 1965 ($T = 204$ for December 1981). This relationship was fit by ordinary least squares (OLS) for each of the five stations. The estimates of $\beta_0$, $\beta_1$, and $\beta_2$ (denoted $b_0$, $b_1$, and $b_2$), the t statistic for $b_2$, the standard error, $S(\varepsilon)$, of the regression, and $R^2$ for the regression are all given in Table 1. The t statistics for $b_0$ and $b_1$ were all highly significant ($\alpha = 0.001$) and are therefore not listed.
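As an illustration of the fitting step, the following sketch fits the log-linear loading model by ordinary least squares using only the standard library. The helper names `solve_linear` and `fit_loading_model` are hypothetical, and the code assumes a complete list of non-missing months for one station; it is not the authors' program.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_loading_model(load, precip, month_index):
    """OLS fit of ln L = b0 + b1 ln P + b2 T + e.

    Returns ([b0, b1, b2], residuals, s) where s is the standard error of the
    regression with n - 3 degrees of freedom.
    """
    rows = [[1.0, math.log(p), float(t)] for p, t in zip(precip, month_index)]
    y = [math.log(v) for v in load]
    # Normal equations (X'X) b = X'y for the three coefficients.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    b = solve_linear(xtx, xty)
    resid = [yi - sum(bi * ri for bi, ri in zip(b, r)) for r, yi in zip(rows, y)]
    s = math.sqrt(sum(e * e for e in resid) / (len(y) - 3))
    return b, resid, s
```

The residuals returned here play the role of the $\varepsilon$ values analyzed in the rest of the section.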

It is worth noting that the resulting prediction equation may be expressed as

$$\hat{L} = e^{b_0} P^{b_1} e^{b_2 T} \qquad (3)$$

where $\hat{L}$ is a median estimator of the loading (an estimate for which the probability of overestimation and underestimation each equal 1/2) if $\varepsilon$ can be considered normally distributed.

Note that the $b_1$ values fall in the range of 0.59 to 0.72 (mean $b_1$ = 0.67). For $b_1$ = 0.67 a 10 percent higher rainfall rate would produce a 6.6 percent higher sulfate load. This model implies that concentrations of sulfate are proportional to $P^{b_1 - 1}$, so that for $b_1$ = 0.67 a 10 percent higher precipitation rate would produce a 3.1 percent lower concentration.

TABLE 1. Sample Statistics for Sulfate Deposition Data at Five New York Stations (see text for definitions of variables).

Station               n     b0     b1       b2   t(b2)     R²   S(ε)  S(ε*)  γ(ε*)  r1(ε*)
Allegany            170   -2.0   0.72  -0.0023   -2.80   0.33   0.50   0.46  -0.21    0.21
Canton              174   -1.3   0.70  -0.0036   -5.22   0.44   0.45   0.38  -0.24   -0.03
Hinckley            164   -2.0   0.72  -0.0027   -2.95   0.34   0.46   0.36  -0.48    0.26
Mays Point          183   -2.1   0.62  -0.0017   -2.54   0.37   0.42   0.31  -0.46    0.11
Upton               161   -2.6   0.59  -0.0024   -3.06   0.33   0.50   0.46  -0.54    0.14
Mean                170   -2.1   0.67  -0.0025   -3.31   0.36   0.47   0.39  -0.39    0.14
Standard Deviation    9    0.6   0.06   0.0007    1.08   0.05   0.03   0.07   0.15    0.11

Figure 2. Scatter Plot Showing Sulfate Deposition Rate in g/m²/day Versus Precipitation Rate in cm/day for a Typical New York Station Analyzed (Mays Point).

The t statistics on $b_2$ (the time trend coefficients) indicate that $\beta_2$ is different from zero at a two-sided $\alpha$-level of 0.02 at all of the stations. The interpretation of the t statistics is somewhat clouded by the existence of some serial correlation in the data, but the existence of missing values makes the kind of treatment suggested by Box and Tiao (1975) unworkable. Because the purpose of this data analysis is not the demonstration of the existence of trend in these data, but rather the removal of readily explained sources of variation, the t statistics suggest enough evidence of trend to justify the removal of the trend effect. The $b_2$ values range from -0.0017 to -0.0036 (mean of -0.0025). They indicate a downward trend of $(e^{12 b_2} - 1) \times 100$ percent per year. This suggests a downward trend over all five stations at -3.0 percent per year for the years 1965-1981.
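The percentages quoted for the precipitation exponent and the time trend follow directly from the mean coefficients. The short check below reproduces them; the variable names are illustrative only.

```python
import math

b1 = 0.67      # mean precipitation exponent (Table 1)
b2 = -0.0025   # mean monthly trend coefficient (Table 1)

# Load is proportional to P**b1, concentration to P**(b1 - 1).
load_change = (1.10 ** b1 - 1) * 100          # percent change in load for +10% precipitation
conc_change = (1.10 ** (b1 - 1) - 1) * 100    # percent change in concentration
# Twelve monthly steps of b2 in log units give the annual percentage trend.
annual_trend = (math.exp(12 * b2) - 1) * 100

print(round(load_change, 1), round(conc_change, 1), round(annual_trend, 1))
# prints: 6.6 -3.1 -3.0
```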

Examination of the residuals ($\varepsilon$ values) for the OLS fit of Equation (2) revealed seasonality in the residuals at all of the stations. Figure 3 shows boxplots of the residuals by month for the Mays Point station. The other four stations show similar seasonal patterns. Figure 3 indicates that, for a given rainfall rate, summer sulfate deposition rates are substantially higher than winter deposition rates. For example, May, June, July, and August have average detrended precipitation-adjusted sulfate deposition rates of about 43 percent above the annual mean, and October, November, and December have rates about 28 percent below the annual mean. The seasonal pattern in these data has been noted previously by Barnes et al. (1982). This type of pattern has also been noted in other data sets by Raynor and Hayes (1982), Lioy and Morandi (1982), and Tanner and Leaderer (1982). Unlike the means, the standard deviations follow no regular seasonal variation over all the stations. A modeling decision was made that the seasonal cycle in the residuals could be removed by subtracting the station-monthly means from all the $\varepsilon$ values. These standardized values are referred to as deseasonalized residuals, $\varepsilon^*$. Table 1 shows the station standard deviations before and after that standardization, $S(\varepsilon)$ and $S(\varepsilon^*)$, respectively.

Figure 3. Boxplots Showing Medians, Quartiles, and Extremes of the Residuals, by Month, at Mays Point.
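Subtracting station-monthly means is a simple grouping operation. The sketch below shows one way to do it; the dictionary layout keyed by (station, year, month) and the function name `deseasonalize` are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

def deseasonalize(residuals):
    """residuals: {(station, year, month): e}.

    Returns the same keys mapped to e minus the mean of e over all years for
    that station and calendar month.
    """
    groups = defaultdict(list)
    for (station, _year, month), e in residuals.items():
        groups[(station, month)].append(e)
    monthly_mean = {key: mean(vals) for key, vals in groups.items()}
    return {key: e - monthly_mean[(key[0], key[2])] for key, e in residuals.items()}
```

Missing months need no special handling here, since each mean is taken over whatever observations exist for that station and month.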

In order to build a stochastic model of these data, it is necessary to determine the shape of the marginal distributions, the serial dependence, and the between-station dependence. The statistics used to characterize these are the sample skewness, $\gamma(\varepsilon^*)$, the sample lag-one serial correlation coefficients, $r_1(\varepsilon^*)$, and the lag-zero cross correlation coefficients, $r_0(\varepsilon^*_k, \varepsilon^*_h)$. All five stations show significant negative skewness ($\alpha = 0.01$). The mean is -0.39. Four out of five of the lag 1 serial correlation coefficients are positive and, although only two of them are significantly different from zero ($\alpha = 0.05$), as a group they do significantly depart from zero. The mean of $r_1$ is 0.14, and the standard error of the mean is 0.05. The sample autocorrelation functions (ACF) for these data were examined out to 36 lags, and nothing in these ACFs would suggest the need for a model more complex than an AR(1) model (Box and Jenkins, 1976). Given that the lag 1 correlations are very low compared to the standard errors of serial correlation coefficients, any diagnostic pattern suggesting some other type of model would be obscured by the noise. Furthermore, the existence of considerable missing data makes it impossible to use the standard Box and Jenkins (1976) model identification and estimation techniques.

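The cross correlations, $r_0(\varepsilon^*_k, \varepsilon^*_h)$, of the $\varepsilon^*$ records for stations k and h were computed for each of the 10 possible station pairs. They had a mean of 0.31 and a standard deviation of 0.14. Figure 4 shows the relationship between $r_0(\varepsilon^*_k, \varepsilon^*_h)$ and the distance between the stations ($d_{kh}$). The correlation coefficient between $r_0(\varepsilon^*_k, \varepsilon^*_h)$ and $d_{kh}$ is -0.74; that is, cross correlation decreases with increasing interstation distance.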

SYNTHETIC RECORD GENERATION

The objective of the simulations to be carried out in this study is to produce synthetic records of deseasonalized residuals ($\varepsilon^*$) which have the statistical properties that one would expect to find in a set of observations from an arbitrary network of $n_s$ stations over the State of New York. The replication of the statistics of these particular stations is not of interest. Thus, the model to be built will be based on the averages of the sample statistics of the deseasonalized residuals ($\varepsilon^*$) over the stations and, thus, will not represent any one station but rather a set of "typical" stations.

For any given network size ($n_s$) the stations are assumed to be randomly located over the area with the restriction that no two stations in the network be closer together than some prespecified minimum distance. For purposes of the simulation the area was taken to be a square 500 km on a side. The minimum distances were specified to be 250, 200, 150, and 100 km for networks of 2, 4, 8, and 16 stations, respectively. The algorithm for locating the stations was to sequentially generate a set of x,y coordinates from two independent uniform random numbers, on the range (0, 500), and check each station to determine if it violates the minimum distance constraint. If it does not violate the constraint, it is retained in the network (if it does violate, it is dropped) and a new station is generated. This process continues until the network contains $n_s$ stations. The minimum distances were selected to be about as large as possible consistent with there being essentially a zero probability that the algorithm would fail to produce a feasible network.
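The station-location algorithm is a rejection sampler and can be sketched as follows. The function name `place_stations` and the `max_tries` safety cap are additions for the example; the square size and minimum distances are those given in the text.

```python
import math
import random

# Minimum interstation distances (km) by network size, as given in the text.
MIN_DISTANCE_KM = {2: 250.0, 4: 200.0, 8: 150.0, 16: 100.0}

def place_stations(ns, side_km=500.0, rng=None, max_tries=100_000):
    """Sequentially draw uniform (x, y) points in a square and keep each one
    that is at least the minimum distance from every station already kept."""
    rng = rng or random.Random()
    dmin = MIN_DISTANCE_KM[ns]
    stations = []
    for _ in range(max_tries):
        if len(stations) == ns:
            break
        candidate = (rng.uniform(0.0, side_km), rng.uniform(0.0, side_km))
        if all(math.dist(candidate, s) >= dmin for s in stations):
            stations.append(candidate)  # feasible: retain it
        # otherwise the candidate is dropped and a new one is drawn
    if len(stations) < ns:
        raise RuntimeError("no feasible network found within max_tries")
    return stations
```

Passing a seeded `random.Random` instance makes a generated network reproducible, which is convenient when comparing power across network sizes.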

This algorithm should result in networks not unlike actual atmospheric deposition networks. Regular optimally spaced grid patterns are generally not possible in practice because of the rather stringent location requirements (distance from point sources). Locating stations in very close proximity to each other would probably not occur out of recognition of the redundancy of the information from two nearby stations.

The cross correlations among all of the stations, $\rho_0(\varepsilon^*_k, \varepsilon^*_h)$, in a given randomly generated network are determined from the interstation distances according to the equation

$$\rho_0(\varepsilon^*_k, \varepsilon^*_h) = 0.644 - 0.00107\, d_{kh}$$

This equation was fit to the data by the line of organic correlation (Hirsch and Gilroy, 1984; Kermack and Haldane, 1950) and is intended to result in a set of sample cross correlations $r_0(\varepsilon^*_k, \varepsilon^*_h)$ which have both the same mean and same variance as occurred in the original $r_0$ values given the original set of interstation distances. This LOC fit is shown with the data in Figure 4. The verification of this technique of generating the interstation correlations is considered below.
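For readers unfamiliar with the line of organic correlation, its slope is the ratio of the standard deviations of the two variables, carrying the sign of their correlation, and the line passes through the point of means. The sketch below implements that definition and the fitted distance relation; the function names are hypothetical.

```python
from statistics import mean, stdev

def line_of_organic_correlation(x, y):
    """Return (intercept, slope) with slope = sign(r) * s_y / s_x, so the
    fitted values keep the mean and standard deviation of y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sign = 1.0 if sxy >= 0 else -1.0
    slope = sign * stdev(y) / stdev(x)
    return my - slope * mx, slope

def cross_correlation(d_km):
    """Population cross correlation from interstation distance, using the
    coefficients quoted in the text."""
    return 0.644 - 0.00107 * d_km
```

Unlike an ordinary least squares line, whose fitted values have a smaller variance than the observations, this line preserves the variance, which is the property the text relies on when generating correlations for new networks.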

The stochastic model which is used to generate synthetic traces of these data fields is the multivariate lognormal generating process described by Matalas (1967, pp. 940-944). It is parameterized by the variance of the process ($0.15 = 0.39^2$), the skewness (-0.39), the lag 1 serial correlation ($\rho(1) = 0.14$), and the lag zero cross correlations, $\rho_0(\varepsilon^*_k, \varepsilon^*_h)$, generated as described above. The generator produces synthetic sequences of monthly data for $n_s$ stations which are log normally distributed with population mean of zero, and population variance, skewness, and lag one serial and lag zero cross correlations which are equal to the above-mentioned values. The monthly means could be added back to these values in order to make them surrogates for the residuals $\varepsilon$ rather than the deseasonalized residuals, $\varepsilon^*$. As we shall see below, this is not necessary because of the nature of the trend test used. The structure of the model also determines the multiple lag serial and cross serial correlations.
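To make the structure of such a generator concrete, the sketch below produces cross-correlated, lag-one autoregressive monthly series. It is a deliberately simplified Gaussian stand-in, not the lognormal procedure cited above: it ignores the skewness, and it assumes every station shares the same lag-one coefficient so that the lag-one cross-covariance matrix is simply `rho1` times the lag-zero matrix. All function names are hypothetical.

```python
import math
import random

def cholesky(a):
    """Lower-triangular factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def generate_residuals(corr, n_months, sd=0.39, rho1=0.14, rng=None):
    """Gaussian simplification: x_t = rho1 * x_{t-1} + e_t, where e_t has
    covariance (1 - rho1**2) * sd**2 * corr, so x_t keeps covariance
    sd**2 * corr and lag-one serial correlation rho1 at every station."""
    rng = rng or random.Random()
    ns = len(corr)
    low = cholesky(corr)
    scale = sd * math.sqrt(1.0 - rho1 ** 2)

    def correlated_normals():
        z = [rng.gauss(0.0, 1.0) for _ in range(ns)]
        return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(ns)]

    x = [sd * v for v in correlated_normals()]  # start in the stationary state
    out = []
    for _ in range(n_months):
        e = correlated_normals()
        x = [rho1 * xi + scale * ei for xi, ei in zip(x, e)]
        out.append(x)
    return out
```

Because the trend test described later uses only ranks within each month and station, a stand-in of this kind mainly illustrates how the serial and cross correlations enter the simulation; reproducing the skewness would require the transformation described in the cited reference.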

The process for generating the interstation correlations was tested by generating 400 sets of records for five stations for 14 years (the same as the actual data set). The ten lag zero cross correlations were computed from each record of 14 years. The overall mean of the 4000 lag zero cross correlations was found to be 0.294 and their standard deviation was 0.137, which are approximately equal to the sample values for the actual data (0.312, 0.141).

Figure 4. Scatter Plot Showing the Cross Correlation Coefficients, $r_0(\varepsilon^*_k, \varepsilon^*_h)$, of the Deseasonalized Residuals for Stations k and h, in Relation to the Distance Between the Stations, $d_{kh}$. Also shown is the line of organic correlation (LOC) used to generate interstation correlations for the simulations.

Thus, the method of setting cross correlation coefficients based on interstation distances appears to be reasonable and is therefore adopted for computing the cross correlations for the randomly generated sets of station locations. The mean and the standard deviation of the population cross correlations, $\rho_0(\varepsilon^*_k, \varepsilon^*_h)$, increase as $n_s$ increases. The means are 0.27, 0.29, 0.31, and 0.33 and the standard deviations are 0.08, 0.10, 0.12, and 0.13 for networks of 2, 4, 8, and 16 stations, respectively. As networks increase in size, there are increasing numbers of stations quite close together and quite far apart but, overall, average distances decrease with increasing number of stations. This factor limits the growth in the information content obtained as one increases network size.

TESTING

One of the assumptions implicit in the data analysis was that, whatever statistical test is used to detect step trends, it should treat the data in such a way as to remove that portion of the variance in sulfate loading rates which is due to the variance in precipitation rates, and also to remove the variance due to seasonally varying mean values for the process. These are both sources of variation which obscure the variation of interest, that due to a shift in the mean level of the process, presumably as a result of sulfate emission controls. In fact, given a particular network, record length and desired levels of statistical power, the magnitude of the step trend (in log units) that can be detected is linearly related to the trend-free standard deviation (in log units) of the data being tested. If one did not remove the effects of either precipitation rate or seasonally varying means, then the average sample standard deviation over all five stations would be 0.57. If only the seasonal effect were removed, it would be 0.48 (a 16 percent reduction). If precipitation effects only were removed, it would be 0.47 (an 18 percent reduction from the raw value), and if both precipitation and seasonal effects were removed, it would be 0.39 (a 32 percent reduction). Thus, removal of these two sources of variation prior to testing for trend would be highly beneficial to the trend detection process.

Several methods of trend detection are possible. One approach is to develop a general linear model incorporating seasonal variability, precipitation rate effects, cross and serial correlation, and a (0,1) dummy variable indicating time periods before and after emission controls have been implemented. Such a parametric model can be formulated in numerous ways, but all involve the estimation of a substantial number of model parameters. If the data set is complete (no missing data), then the model parameters may be estimated by the method of maximum likelihood, using a search procedure. The significance of the coefficient of the dummy variable (the step trend term) can be determined by a likelihood ratio test. The optimization may be computationally very burdensome and the results are dependent on many model assumptions (such as additivity and normality) and will not be very resistant to the effects of extreme values. Furthermore, the method breaks down when any data are missing.

The alternative test which we propose here is not as dependent on model assumptions, does not require complete data sets, and is resistant to the effects of outliers. The procedure is a modified version of the Mann-Whitney Rank-Sum test on grouped data, applied to the residuals from the regression of $\ln L$ on $\ln P$. By applying it to these residuals rather than to $\ln L$ values, we reduce the standard deviation by 18 percent and hence reduce the size of trend that can be detected with a given power by 18 percent.

The usual form of the Mann-Whitney Rank-Sum test on grouped data is given in Bradley (1968, p. 105). The groups here are defined to be each of the month-station combinations. By grouping the data according to month as well as station, the power is increased further because this procedure removes the effect of between-month variations, reducing the size of trend detectable by an additional 14 percent, to 32 percent in total. For the group ik (month i, station k), the Mann-Whitney Rank-Sum statistic is

$$W_{ik} = \sum_{j=1}^{n_1} R_{ijk} \qquad (4)$$

where $R_{ijk}$ is the rank (over all $n_1 + n_2$ years) of the observed value of the regression residual in year j, ranked within the ith month and kth station, and $n_1$ is the number of years from the beginning of the record to the assumed time of the step trend. Under the null hypothesis that no trend exists in any group, $W_{ik}$ has expectation

$$\mu = n_1 (n_1 + n_2 + 1)/2 \qquad (5)$$

and variance

$$\sigma^2 = n_1 n_2 (n_1 + n_2 + 1)/12 \qquad (6)$$

where $n_2$ is the number of years of record after the step trend ($n = n_1 + n_2$). The expressions given here are based on the assumption that all groups have the same record length. These expressions and the ones following could be written in a more general way to accommodate varying record length, but the result would be added complexity of notation.
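For a single month-station group the computation is short. The sketch below ranks one group's residuals across years and returns the rank sum of the first $n_1$ years with its null mean and variance; it assumes a complete record with no tied values, and the function name `rank_sum_statistic` is illustrative.

```python
def rank_sum_statistic(values, n1):
    """values: residuals for one month at one station, in year order.

    Returns (W, mu, sigma2): the sum of the ranks of the first n1 years, and
    the mean and variance of W under the null hypothesis of no trend.
    Assumes no missing years and no ties.
    """
    n = len(values)
    n2 = n - n1
    order = sorted(range(n), key=lambda j: values[j])
    rank = [0] * n
    for r, j in enumerate(order, start=1):
        rank[j] = r  # rank 1 is the smallest residual
    w = sum(rank[:n1])
    mu = n1 * (n + 1) / 2
    sigma2 = n1 * n2 * (n + 1) / 12
    return w, mu, sigma2
```

For example, with six years of data, a step after the third year, and the three largest residuals all occurring before the step, W is 15 against a null mean of 10.5; a downward step trend therefore shows up as W above its expectation.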

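If the data were independent, then to test for significance a Z statistic

$$Z = \frac{\sum_{k=1}^{n_s} \sum_{i=1}^{12} (W_{ik} - \mu)}{(12\, n_s\, \sigma^2)^{1/2}} \qquad (7)$$

would be computed and compared to a standard normal distribution. However, as the data analysis has shown, the data are not independent and as a consequence the $W_{ik}$ are not independent. The covariance between $W_{ik}$ and $W_{gh}$ can be shown to be related to the rank correlation of the data:

$$C(W_{ik}, W_{gh}) = \sigma^2 \rho(R_{ijk}, R_{gjh}) \qquad (8)$$

where $\rho(R_{ijk}, R_{gjh})$ is the rank correlation between data in month i, station k and that in month g, station h, and $C(W_{ik}, W_{gh})$ is the covariance of these W statistics. Note that when $i = g$ and $k = h$, $C(W_{ik}, W_{ik}) = \sigma^2$. The variance of the sum of the $W_{ik}$'s would thus be

$$\operatorname{Var}\Big[\sum_{k=1}^{n_s} \sum_{i=1}^{12} W_{ik}\Big] = \sum_{k=1}^{n_s} \sum_{h=1}^{n_s} \sum_{i=1}^{12} \sum_{g=1}^{12} C(W_{ik}, W_{gh}) \qquad (9)$$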

The estimation of these covariances can be simplified by relying on the following two assumptions: that the serial correlation of the ranks is lag one autoregressive and that the same correlation coefficient applies at all of the stations. This correlation coefficient, $r_1$, is estimated as follows: all of the ranks $R_{ijk}$ (except the last one for each station, $R_{12,n,k}$, $k = 1, 2, \ldots, n_s$) are paired with the rank of the succeeding observation, generally $R_{i+1,j,k}$ (except when $i = 12$, when it is $R_{1,j+1,k}$). The product moment correlation coefficient of all of the pairs is $r_1$.

Based on the two assumptions, the covariances C(W_ik, W_gk) (different months at the same station) are estimated as

The covariances C(W_ik, W_ih), k ≠ h (different stations for the same month) are estimated as

where r0(k,h) is the product moment correlation coefficient of the concurrent ranks at the two stations (R_ijk, R_ijh), i = 1, 2, ..., 12; j = 1, ..., n.

Relying on the lag-one autoregressive assumption, the C(W_ik, W_gh), i ≠ g, k ≠ h (different stations, different months) are estimated as

Finally, the covariances C(W_ik, W_ik) (same month and station) are known exactly to be σ², but for convenience of notation we define Ĉ(W_ik, W_ik) = σ².

The estimate of the variance of the sum of the W_ik statistics is

$$\widehat{\mathrm{Var}}\Bigl(\sum_{i} \sum_{k} W_{ik}\Bigr) = \sum_{i=1}^{12} \sum_{g=1}^{12} \sum_{k=1}^{n_s} \sum_{h=1}^{n_s} \hat{C}(W_{ik}, W_{gh})$$

The Z statistic is then computed as in Equation (7), but with the square root of this estimated variance as the denominator.
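A rough sketch of how the adjusted test statistic might be assembled is given below. The product form used for the estimated covariances, σ² · r1^|i−g| · r0(k,h), is an assumption of this sketch: it is one form consistent with a lag-one autoregressive structure and a common correlation at all stations, and is not quoted from the paper. The function names are hypothetical.

```python
from math import sqrt


def estimated_variance_of_sum(sigma2, r1, r0, months=12):
    """Sum the assumed covariance estimates over all pairs of groups.

    ASSUMPTION: C-hat(W_ik, W_gh) = sigma2 * r1**|i - g| * r0[k][h],
    with r0[k][k] == 1. This product form is illustrative only.
    """
    ns = len(r0)
    total = 0.0
    for i in range(months):
        for g in range(months):
            serial = r1 ** abs(i - g)
            for k in range(ns):
                for h in range(ns):
                    total += sigma2 * serial * r0[k][h]
    return total


def z_statistic(w, mu, variance):
    """Z for the grouped test; w[i][k] is the rank sum for month i, station k."""
    total = sum(sum(row) for row in w)
    groups = sum(len(row) for row in w)
    return (total - groups * mu) / sqrt(variance)


# With r1 = 0 and uncorrelated stations the variance reduces to 12 * ns * sigma2.
independent = estimated_variance_of_sum(3.0, 0.0, [[1.0, 0.0], [0.0, 1.0]])
# Perfectly correlated stations double the variance in this two-station case.
redundant = estimated_variance_of_sum(3.0, 0.0, [[1.0, 1.0], [1.0, 1.0]])
```

The second call illustrates why positive cross correlation matters: the same departure of the summed W from its mean yields a smaller Z when the stations carry overlapping information.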

There is an additional violation of the assumptions of the rank sum test on grouped data which the above procedure does not account for: the assumption that the data are serially independent within each group. The lack of serial independence has the effect of increasing the variance of the W_ik statistics. However, the serial correlation within a group is equivalent to the lag 12 month serial correlation in the deseasonalized monthly data, and under any reasonable assumption of serial correlation structure (such as AR(1)), given that the lag 1 month correlation is on the order of 0.14, the lag 12 month correlation would be extremely small (on the order of 10^-10). The power function simulations considered in a later section of the paper indicate that the variance computed from Equation (9) is a good working approximation, which results in a test for which nominal and actual significance are approximately equal.
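The size of the lag 12 correlation can be checked with a short calculation, assuming a pure AR(1) structure in which the lag-m autocorrelation is the lag-one value raised to the power m. The function name is hypothetical.

```python
def ar1_autocorrelation(lag_one, lag):
    """Autocorrelation at a given lag for an AR(1) process."""
    return lag_one ** lag


# Lag-one monthly correlation of about 0.14, as stated in the text.
lag12 = ar1_autocorrelation(0.14, 12)
```

The result is a few parts in 10^11, which is negligible for any practical purpose and supports ignoring within-group serial dependence.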

SIMULATIONS

Using the stochastic model described above for generating trend-free traces of deseasonalized monthly log sulfate loading rate residuals (ε*), we generated a set of traces of a given record length (n years) at ns sites and imposed a step trend. The fact that these records contain no seasonality has no effect on power and significance, because the statistics computed are all based on ranks within the month. The trend is assumed to be a constant percentage decrease from the rates occurring before the change. This constant percentage rate change corresponds to the subtraction of an amount Δ from each of the trend-free ε* values after the time of change. That is, if the trend-free generated series is ε*_i(t), where i is the index of stations (i = 1, 2, ..., ns) and t is the time in months from the start of the record (t = 1, 2, ..., 12n), then the series with step trend is ε_i(t), where

$$\varepsilon_i(t) = \begin{cases} \varepsilon^*_i(t), & t \le 12 n_1 \\ \varepsilon^*_i(t) - \Delta, & t > 12 n_1 \end{cases}$$

The step trend magnitude Δ can be reexpressed as a percentage change in the sulfate loading rate. The loading rate after the step change is 100·e^(-Δ) percent of the rate that would have occurred in the absence of the step change.
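The step imposition and the conversion to a percentage can be sketched as below. The sketch assumes natural logarithms (consistent with the exponential in the text) and a zero-based month index; the function names are hypothetical.

```python
from math import exp


def impose_step(trend_free, delta, n1):
    """Subtract delta from every monthly value after the first 12*n1 months."""
    change = 12 * n1
    return [e if t < change else e - delta for t, e in enumerate(trend_free)]


def percent_of_unchanged_rate(delta):
    """Loading rate after the step, as a percentage of the no-step rate."""
    return 100.0 * exp(-delta)


# Three years of zero residuals with a step of 0.2 after the second year.
series = impose_step([0.0] * 36, 0.2, n1=2)
# A step of 0.2 in log units is roughly an 18 percent reduction.
reduction = 100.0 - percent_of_unchanged_rate(0.2)
```

By the same conversion a step of 0.1 corresponds to a reduction of about 9.5 percent, which the tables round to 10 percent.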

For n1 fixed at 5, power curves were generated relating the probability of detecting a trend (α=0.05, one-sided) to the size of Δ (Δ = 0.0, and 0.02 to 0.20 in steps of 0.02), for n2 of 1, 2, 3, 4, 5, 7, 10, and 15 and ns of 1, 2, 4, 8, and 16. One thousand Monte Carlo replicates were generated and the test was run for each value of Δ, n2, and ns. These one thousand replicates consist of 20 replicates of each of 50 different randomly generated station configurations. In those cases where Δ=0, the observed power was always close to 0.05, the chosen significance level of the test. In fact, the hypothesis that this test procedure has an actual probability of rejection of α=0.05 could not be rejected at a significance level of 0.02 in any of the 40 cases considered.
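To give a sense of what "close to 0.05" means with one thousand replicates, the following sketch computes the range of observed rejection rates that would not contradict a true rate of 0.05 at the 0.02 level. It assumes a two-sided test based on the normal approximation to the binomial; the source does not state which test was used, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def rejection_rate_band(alpha=0.05, replicates=1000, level=0.02):
    """Two-sided acceptance band for the observed rejection fraction.

    ASSUMPTION: normal approximation to the binomial distribution of the
    number of rejections among independent replicates.
    """
    se = sqrt(alpha * (1.0 - alpha) / replicates)
    z = NormalDist().inv_cdf(1.0 - level / 2.0)
    return alpha - z * se, alpha + z * se


low, high = rejection_rate_band()
```

Under these assumptions an observed rate between roughly 0.034 and 0.066 is compatible with a true significance level of 0.05.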

Power curves for n2=2, n2=4, and n2=15 are given in Figures 5, 6, and 7, respectively. These power curves show that the gains in power for a given trend magnitude or, conversely, the decreases in detectable trend magnitude for a given power, are fairly large for numbers of stations in the range 1 to 4, but above ns=4 and certainly above ns=8 the gain due to adding stations is very small. In other words, because of the cross correlation between stations, the addition of stations beyond about 8 produces largely redundant information.

These Monte Carlo results are also displayed in Tables 2 and 3 as power for a given step size as a function of ns and n2. These tables could be used in a network design context. For example, suppose one had the "luxury" of knowing that in five years a major reduction in sulfate emissions was expected to occur and that it was predicted to result in about an 18 percent reduction in deposition (18 percent corresponds to Δ=0.2). Further, suppose that one would like to have at least 90 percent power for detecting such a change. There are two objectives one would like to consider in designing a network for this situation: one is to minimize the number of years one would have to wait after the change before this amount of power is attained, the other is to minimize the cost of the network. Using total number of station-years as a surrogate for cost, Table 4 summarizes the problem.

Figure 5. Power of the Trend Test, n1=5, n2=2, as a Function of Step Trend Size (Δ) and Number of Stations. (Horizontal axis: step change magnitude, 0.000 to 0.250.)

Moving through this set of solutions, we see that from ns=1 to ns=2 waiting time is reduced by 10 years and the cost in station-years is zero. From ns=2 to ns=4, waiting time is reduced 2 more years at a cost of 12 station-years. From ns=4 to ns=8, waiting time is reduced by 1 year at a cost of 24 station-years. The change from ns=8 to ns=16 brings no gain at all but a doubling of cost. Based on these tradeoff ratios, one could decide on a maximum number of stations to use. Examining these tradeoffs, ns=4 emerges as a very appealing solution, although if a very high value were placed on knowing soon (or on obtaining subregional information), then ns=8 would surely be preferred.
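The tradeoffs quoted above follow directly from the Table 4 entries, as the following sketch shows. The (ns, n2) pairs are the table values for n1=5; the function and variable names are hypothetical.

```python
# (number of stations, years required after the change) from Table 4, n1 = 5.
DESIGNS = [(1, 15), (2, 5), (4, 3), (8, 2), (16, 2)]


def station_years(ns, n2, n1=5):
    """Total station-years of record, used as a surrogate for cost."""
    return (n1 + n2) * ns


def tradeoffs(designs, n1=5):
    """For each step to the next larger network: years saved and added cost."""
    rows = []
    for (ns_a, n2_a), (ns_b, n2_b) in zip(designs, designs[1:]):
        years_saved = n2_a - n2_b
        added_cost = station_years(ns_b, n2_b, n1) - station_years(ns_a, n2_a, n1)
        rows.append((ns_a, ns_b, years_saved, added_cost))
    return rows


steps = tradeoffs(DESIGNS)
# steps == [(1, 2, 10, 0), (2, 4, 2, 12), (4, 8, 1, 24), (8, 16, 0, 56)]
```

Dividing added cost by years saved gives the price of each year of earlier detection: nothing for the first step, 6 station-years per year for the second, and 24 for the third.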

Another question, relevant to New York State itself, is that of continuing the operation of the present network in anticipation of a possible change in a few years. Having operated the network for about 15 years (n1), the power for detecting a step trend in 5 years with a network of 8 stations (the number they have operated for most of the 15 years) is 0.76 at a 10 percent trend and 1.0 at an 18 percent trend. These results were developed by reversing n1 and n2 and using Tables 2 and 3.

The question also arises of how fast any emission controls would be implemented, and hence how quickly the total intended reduction in emissions could be expected to occur. For instance, controls could be phased in over, say, a two- to four-year period. An assumption of a constant change per month in deposition during this implementation period would seem a reasonable model of the process of change. Then, instead of a step change between two time periods, the changeover period could be represented mathematically by a ramp function. For each month during the implementation period, an average decrease of Δ/(12 ni) could be expected, where Δ is the ultimate total reduction in deposition in percent and ni is the length of the implementation period in years.
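The ramp can be sketched as a cumulative reduction that grows by the same amount each month of the implementation period and then stays at its full value. The zero-based month index, the placement of the ramp immediately after the first n1 years, and the function name are assumptions of this sketch.

```python
def ramp_reduction(month, delta, n1, ni):
    """Cumulative reduction in effect at a given zero-based month index.

    The reduction is zero for the first 12*n1 months, rises by
    delta / (12*ni) per month over the 12*ni implementation months,
    and equals delta afterwards.
    """
    start = 12 * n1
    length = 12 * ni
    if month < start:
        return 0.0
    months_elapsed = min(month - start + 1, length)
    return delta * months_elapsed / length


# A total reduction of 0.2 phased in over two years after five years of record.
first_month = ramp_reduction(60, 0.2, n1=5, ni=2)
fully_implemented = ramp_reduction(83, 0.2, n1=5, ni=2)
```

Setting ni so that the ramp lasts a single month recovers the step change as a limiting case.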

In order to use the test proposed herein for such a changeover situation, certain questions must be answered. A small simulation was performed to get a sense of how this situation could be treated. By simply ignoring the data collected during the implementation period and comparing only the data from the period before implementation began with the data from the period after total implementation, the same power functions associated with a step change were obtained. This is due to the low serial correlation value involved.

Figure 6. Power of the Trend Test, n1=5, n2=4, as a Function of the Step Trend Size (Δ) and Number of Stations. (Horizontal axis: step change magnitude, 0.000 to 0.250; vertical axis: power, 0.000 to 1.000.)

If the implementation period data are compared with the post-implementation period when applying the test, the power of the test is less than if the data from the implementation period are ignored. Similarly, splitting the implementation period equally into the before and after periods results in decreased power as compared to ignoring the implementation period.

From these simulations the tactic of comparing the data before implementation starts with the data after the emission controls have been fully implemented appears to be the most powerful test to use.

SUMMARY

A method was presented to assist policymakers in determining the combinations of number of sampling stations and number of years necessary to state with a given probability that an atmospheric deposition reduction of a given magnitude has occurred at a pre-specified time. The method was developed as follows. Monthly sulfate deposition data from five bulk collectors in New York State were analyzed to give relationships between deposition rates and rainfall rates, thus removing the effect of variation in precipitation rates on the deposition rates. The distributional properties of the residuals from this relationship were investigated as functions of the month associated with the residual. A deseasonalized, detrended multivariate time series of residuals at the five stations was modeled as a lag-one multivariate autoregressive process.

A Mann-Whitney Rank-Sum test for grouped data was modified to account for serial and cross correlation in the data. Monte Carlo simulations employing the time series model and the modified test were then used to develop statistical power functions relating power (probability of detecting a significant step change) to the magnitude of the change, Δ, the number of data collection sites, ns, and the number of years of data collection before and after the change, n1 and n2, respectively.

Figure 7. Power of the Trend Test, n1=5, n2=15, as a Function of the Step Trend Size (Δ) and Number of Stations. (Horizontal axis: step change magnitude, 0.000 to 0.250; vertical axis: power, 0.000 to 1.000.)

TABLE 2. Power for a Step Size of -0.1 (a 10 percent decrease) for n1 = 5, where n1 is the Number of Years Before the Step Change and n2 is the Number of Years After the Step Change.

n2,                    ns, Number of Stations
Number of Years      1       2       4       8       16
 1                 0.16    0.23    0.25    0.33    0.33
 2                 0.23    0.30    0.37    0.45    0.48
 3                 0.26    0.38    0.45    0.52    0.56
 4                 0.30    0.41    0.52    0.58    0.61
 5                 0.32    0.47    0.57    0.64    0.63
 7                 0.38    0.52    0.62    0.69    0.69
10                 0.43    0.52    0.66    0.71    0.77
15                 0.46    0.58    0.70    0.76    0.82

TABLE 3. Power for a Step Size of -0.2 (an 18 percent decrease) for n1 = 5, where n1 is the Number of Years Before the Step Change and n2 is the Number of Years After the Step Change.

n2,                    ns, Number of Stations
Number of Years      1       2       4       8       16
 1                 0.39    0.55    0.65    0.72    0.76
 2                 0.54    0.72    0.83    0.90    0.91
 3                 0.64    0.81    0.90    0.96    0.96
 4                 0.71    0.88    0.95    0.98    0.98
 5                 0.78    0.91    0.97    0.98    0.99
 7                 0.83    0.94    0.98    0.99    1.00
10                 0.88    0.96    0.98    1.00    1.00
15                 0.90    0.97    0.99    1.00    1.00

The power curves for the combinations of Δ, ns, n1, and n2 considered indicate that increases in station number (ns) above about 8 result in very little increase in power. The cross correlation between the sites causes much of the additional information to be redundant. Using total number of station-years as a surrogate for cost, a sample table is shown indicating the tradeoff between cost and time necessary to detect a step trend of a given magnitude.

TABLE 4. Alternative Network Designs Yielding Power of at Least 90 Percent of Detecting a Step Trend of Size Δ=0.2 Given n1 = 5.

Number of     Required Number of Years of     Total Record     Station Years
Stations      Data Collection After Change    Length
(ns)          (n2)                            (n1+n2)          ((n1+n2)·ns)
 1            15                              20                20
 2             5                              10                20
 4             3                               8                32
 8             2                               7                56
16             2                               7               112

ACKNOWLEDGMENTS

This research has been funded as a part of the National Acid Precipitation Assessment Program by the U.S. Geological Survey.

REFERENCES

Barnes, C. R., R. A. Schroeder, and N. E. Peters, 1982. Changes in Chemistry of Bulk Precipitation in New York State, 1965-1978. Northeastern Environmental Science 1:187-197.

Box, G. E. P. and G. M. Jenkins, 1976. Time Series Analysis, Forecasting and Control. Holden-Day, San Francisco, California, 575 pp.

Box, G. E. P. and G. C. Tiao, 1975. Intervention Analysis with Applications to Economic and Environmental Problems. Journal of the American Statistical Association 70(349):70-79.

Bradley, J. V., 1968. Distribution-Free Statistical Tests. Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 388 pp.

Hirsch, R. M. and E. J. Gilroy, 1984. Methods of Fitting a Straight Line to Data: Examples in Water Resources. Water Resources Bulletin 20(5):705-711.

Kermack, K. A. and J. B. S. Haldane, 1950. Organic Correlation and Allometry. Biometrika 37:30.

Lioy, P. J. and M. T. Morandi, 1982. Source-Related Winter and Summer Variations in SO2, SO4, and Vanadium in New York City for 1972-1974. Atmospheric Environment 16:1543-1550.

Matalas, N. C., 1967. Mathematical Assessment of Synthetic Hydrology. Water Resources Research, pp. 937-945.

Pearson, F. J. and D. W. Fisher, 1971. Chemical Composition of Atmospheric Precipitation in the Northeastern United States. U.S. Geological Survey Water-Supply Paper 1535-P, 23 pp.

Peters, N. E., 1984. Quality Analysis of U.S. Geological Survey Precipitation Chemistry Data for New York. Discussion in Atmospheric Environment 18:1041-1042.

Peters, N. E., R. A. Schroeder, and D. E. Troutman, 1982. Temporal Trends in the Acidity of Precipitation and Surface Waters of New York. U.S. Geological Survey Water-Supply Paper 2188, 34 pp.

Raynor, G. S. and J. V. Hayes, 1982. Variation in Chemical Wet Deposition with Meteorological Conditions. Atmospheric Environment 16:1647-1656.

Tanner, R. L. and B. P. Leaderer, 1982. Seasonal Variations in the Composition of Ambient Sulfur-Containing Aerosols in the New York Area. Atmospheric Environment 16:569-580.

Whitehead, H. C. and J. H. Feth, 1964. Chemical Composition of Rain, Dry Fallout, and Bulk Precipitation at Menlo Park, California, 1957-1959. Journal of Geophysical Research 69:3319-3333.