Page 1: SOME CONCEPTUAL AND STATISTICAL ISSUES IN ANALYSIS OF GROUNDWATER MONITORING DATA

ENVIRONMETRICS, VOL. 7, 185-199 (1996)

SOME CONCEPTUAL AND STATISTICAL ISSUES IN ANALYSIS OF GROUNDWATER MONITORING DATA

ROBERT D. GIBBONS, Biometric Laboratory, University of Illinois at Chicago, 912 S. Wood, Chicago, IL 60612, USA

SUMMARY

Statistical issues in analysis of groundwater monitoring data are detailed, particularly detection of releases from waste disposal facilities into groundwater flowing beneath a landfill. The probability that new geochemical measurements from each of k monitoring wells are consistent with background levels based on a sample of n historical measurements is discussed. Multiple future comparisons (i.e. k monitoring wells each sampled for q chemical constituents) and data censoring due to analytical limitations (i.e. samples for which the analyte cannot be quantified) result in complex statistical decisions. Current regulations require that all k × q comparisons be within background limits at each semi-annual monitoring event. Parametric and non-parametric solutions to this problem are described.

KEY WORDS environmental statistics; groundwater monitoring; prediction limits; multiple comparisons; waste disposal; environmetrics

1. INTRODUCTION

Protection of our nation’s natural resources is a major theme in public policy and social thought. New regulations rely on statistical methods to detect the earliest release of pollutants into the air and to reduce and control the impact of industrial discharges on air, surface-water and groundwater resources. Statistical decision rules are often the first consideration in examining the repercussions of environmental pollutants. However, these decision rules have not been accompanied by statistically rigorous methodology.

This paper focuses on statistical determination of environmental impact of hazardous and municipal solid waste disposal, specifically, the consequences to groundwater beneath landfills. Recently, USEPA has promulgated new regulations for disposal of hazardous waste (Subtitle C Regulation) (USEPA 1988) and municipal solid waste (Subtitle D Regulation) (USEPA 1989) and associated guidance (USEPA 1988, 1992) which often use statistical decision rules unsuited to the problems encountered in environmental monitoring. In the following sections, general conceptual and statistical features of decision rules are described and statistical approaches are compared and contrasted.

2. GROUNDWATER DETECTION MONITORING

Groundwater detection monitoring is used to determine the earliest possible release of pollutants from a waste disposal facility into groundwater underneath the facility. New waste disposal facilities are required to limit leakage by using clay and synthetic liners. However, older facilities were not required to use liners, and the likelihood that pollutants would be released into groundwater was high. As landfill waste biodegrades, a liquid (termed leachate), which may contain hazardous constituents in varying concentrations, forms at the bottom of the facility. If a liner is absent or leaky, the leachate can escape, contaminate groundwater beneath the facility, migrate off-site and negatively affect drinking water supplies. This is a serious concern because some leachate constituents are carcinogenic initiators or promoters (Gibbons et al. 1992a; Gibbons 1994a).

CCC 1180-4009/96/020185-15 © 1996 by John Wiley & Sons, Ltd.

Received 10 February 1995 Revised 21 June 1995

Groundwater detection monitoring typically involves a series of monitoring wells placed hydraulically upgradient and downgradient of the facility. Concentrations of chemical constituents are compared between the upgradient and downgradient locations (see Figure 1), on the assumption that any difference in groundwater quality is caused by leachate released from the facility. However, this assumption is often false because widespread spatial variability in groundwater chemistry exists. In the worst case (often the most typical circumstance), regulations (USEPA 1989, 1991) require only one upgradient well and a minimum of three monitoring wells located downgradient from the facility. Using a single upgradient well to characterize natural variability in background confounds spatial variability and contamination (i.e. differences between the upgradient and downgradient wells could be due to natural differences between any two locations, regardless of their relation to the waste disposal facility). Even with two upgradient wells, characterization of natural background variability may not be possible. That is, two upgradient wells may not display the same amount of variability observed in downgradient wells, which often number between ten and 100.

Additionally, regulations require each downgradient monitoring well and constituent to be tested separately. This is because releases from a waste disposal facility into groundwater are 'plume' shaped (see Figure 1) and may influence only a single downgradient well. Pooling data over downgradient wells might mask a release that affected only a single well. In addition, chemical constituents travel at different rates in groundwater, so the leading edge of the plume may contain only a small number of highly mobile chemical constituents. In many parts of the country, groundwater flows quite slowly, in some cases only a foot or two per year. Hydrogeologically independent observations from a given monitoring well may be available only quarterly, semi-annually or annually. Pooling data may be impractical, since it may mix contaminated and uncontaminated measurements and mask an early-stage release. Therefore, each new datum must be evaluated individually. The two most critical problems are that (a) numerous statistical evaluations must be performed at each monitoring event (typically 100 to 1000), and (b) environmental data are often censored (i.e. the analyte may or may not be detected when it is present at a level below the capability of the analytical instrument). These two problems complicate analysis of groundwater monitoring data, as detailed in the following sections.

Figure 1. General groundwater monitoring of a facility [figure not reproduced; recoverable labels: ground-water elevation contours, background well, 'true' up-gradient]

3. STATISTICAL PREDICTION INTERVALS

3.1. Single well and constituent

Suppose the problem were to set a (1 - α)100 per cent confidence limit on the next single measurement for one well and one normally distributed constituent. A β-expectation tolerance limit (i.e. a prediction limit (Guttman 1970; Hahn 1970)) could then be computed from n independent background measurements as

$$\bar{x} + t_{[n-1,\,1-\alpha]}\, s\sqrt{1 + 1/n} \qquad (1)$$

This is a one-sided limit, since the concern is that the concentration is elevated above background. Here x̄ and s are the background sample mean and standard deviation, and t is the 100(1 - α) percentile of Student's t-distribution on n - 1 degrees of freedom. If upgradient versus downgradient comparisons are to be performed, then a minimum of two upgradient wells should be repeatedly sampled at a time interval sufficient to ensure independence (e.g. quarterly or semi-annually). The background time period must include at least one year to ensure that the same seasonal variation present in downgradient wells is reflected in the upgradient background. The reader should note that with multiple upgradient wells, s², the traditional estimator of σ², is biased (i.e. it is too small) because measurements are nested within upgradient monitoring wells. Alternative estimators for σ² based on variance components models have been proposed and should be used where appropriate (Gibbons 1987, 1994a).
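The nesting point can be made concrete with a small example. This is a minimal sketch. It assumes a balanced design (the same number of measurements in each upgradient well) and uses the standard one-way random-effects, method-of-moments estimator. That is a generic variance-components estimator and not necessarily the one proposed in Gibbons (1987). The function name `variance_components` and the two-well data set are hypothetical.

```python
from statistics import fmean, variance


def variance_components(wells):
    """Method-of-moments variance components for a balanced one-way random-effects model.

    wells: one list of repeated background measurements per upgradient well.
    Returns (between_well, within_well, total) variance estimates.
    """
    a = len(wells)                       # number of upgradient wells
    n = len(wells[0])                    # measurements per well
    if a < 2 or any(len(w) != n for w in wells):
        raise ValueError("sketch assumes >= 2 wells with equal numbers of measurements")
    means = [fmean(w) for w in wells]
    grand = fmean(means)
    # Between-well and within-well mean squares from the one-way ANOVA table
    msb = n * sum((m - grand) ** 2 for m in means) / (a - 1)
    msw = sum((x - m) ** 2 for w, m in zip(wells, means) for x in w) / (a * (n - 1))
    between = max(0.0, (msb - msw) / n)  # negative estimates truncated at zero
    return between, msw, between + msw


# Hypothetical background: two upgradient wells, four quarterly samples each (mg/l)
wells = [[10.0, 11.0, 10.0, 11.0], [13.0, 14.0, 13.0, 14.0]]
between, within, total = variance_components(wells)
naive = variance([x for w in wells for x in w])   # ordinary s^2, ignoring the nesting
print(f"naive s^2 = {naive:.3f}; between + within = {between:.3f} + {within:.3f} = {total:.3f}")
```

Here the two wells differ systematically. The pooled s² (about 2.86) therefore falls well short of the between-well plus within-well total (4.75), which is the direction of bias described above.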

3.2. Multiple wells

In practice, multiple comparisons are performed, one for each downgradient monitoring well and constituent. Using the Bonferroni inequality (Miller 1966), a conservative prediction bound for all kq comparisons (i.e. k wells each tested for q constituents) is

$$\bar{x} + t_{[n-1,\,1-\alpha/(kq)]}\, s\sqrt{1 + 1/n} \qquad (2)$$

This bound is conservative in the sense that the probability of at least one false rejection is at most α.

In the present context, the comparisons are dependent because (1) constituents may be correlated, and (2) all downgradient wells are compared to a common background. In this case, the Bonferroni adjustment may be unnecessarily conservative. Some improvement may be gained by adapting the approach of Dunnett and Sobel (1955), originally developed to compare multiple treatment groups to a common control group. The resulting correlation between the multiple comparisons is

$$\rho_{ij} = \sqrt{\frac{n_i\, n_j}{(n_0 + n_i)(n_0 + n_j)}} \qquad (3)$$

where n_0 is the number of background measurements and n_j is the number of measurements in monitoring well j. In groundwater monitoring, each comparison involves a single new measurement (n_j = 1) against n_0 = n background measurements, so the correlation is constant with value ρ = 1/(n + 1). Dunnett (1955) has shown how the required values from the multivariate t-distribution can be reduced to evaluation of the equicorrelated multivariate normal distribution. The required orthant probabilities for that distribution are easily obtained. These critical points have been tabulated by a number of authors (Gibbons 1994a; Gupta and Panchapakesan 1979). As will be shown in the following section, generalizing the single-stage Dunnett procedure described here to multi-stage sampling with verification resampling increases statistical power. Alternative stage-wise comparison procedures have also been considered (Hochberg and Tamhane 1987) in the context of multiple comparisons to a common control.
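The correlation claim can be checked numerically. The sketch below has hypothetical function names. It evaluates the standard Dunnett correlation for two comparisons against a shared background, sqrt(n_i n_j / ((n_0 + n_i)(n_0 + n_j))). It then compares that value with a Monte Carlo estimate for single new measurements (n_i = n_j = 1), assuming all values come from one common normal distribution. Finally, it shows the per-comparison Bonferroni level α/(kq).

```python
import math
import random


def dunnett_correlation(n0, ni, nj):
    """Correlation of (mean_i - background mean) and (mean_j - background mean)."""
    return math.sqrt(ni * nj / ((n0 + ni) * (n0 + nj)))


def simulated_correlation(n0, draws=50_000, seed=7):
    """Monte Carlo check: two single new values, each minus the same background mean."""
    rng = random.Random(seed)
    d1, d2 = [], []
    for _ in range(draws):
        xbar = sum(rng.gauss(0.0, 1.0) for _ in range(n0)) / n0
        d1.append(rng.gauss(0.0, 1.0) - xbar)
        d2.append(rng.gauss(0.0, 1.0) - xbar)
    m1, m2 = sum(d1) / draws, sum(d2) / draws
    cov = sum((a - m1) * (b - m2) for a, b in zip(d1, d2))
    var1 = sum((a - m1) ** 2 for a in d1)
    var2 = sum((b - m2) ** 2 for b in d2)
    return cov / math.sqrt(var1 * var2)


def bonferroni_alpha(alpha, k, q=1):
    """Per-comparison level that keeps the site-wide false positive rate at or below alpha."""
    return alpha / (k * q)


n = 8
print(dunnett_correlation(n, 1, 1), 1 / (n + 1))   # both equal 1/9
print(simulated_correlation(n))                     # should be close to 0.111
print(bonferroni_alpha(0.05, k=10))                 # per-well level for 10 wells
```

The simulated value agrees with 1/(n + 1) only up to Monte Carlo error. That is enough to show why the Bonferroni level, which ignores this positive correlation, is conservative.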

3.3. Verification resampling

As the number of future comparisons increases, the prediction limit increases and false negative rates can become unacceptably large. Gibbons (1987) and Davis and McNichols (1987) noted this problem and suggested sequential testing of new groundwater monitoring measurements: an initial exceedance in a downgradient well requires one or more independent resamples for that constituent. Failure is indicated only if both the initial sample and the verification resample(s) exceed the prediction limit. In this way, fewer samples are required and both false positive and false negative rates are kept low. Davis and McNichols (1987) derived simultaneous normal prediction limits for the next r of m measurements at each of k monitoring wells; in the previous example, r = 1 and m = 2. Their result is a further generalization of Dunnett's test. The derivation is complicated, but a few key features are described here.

Again assume that the background observations and new monitoring measurements are drawn from the same normal distribution N(μ, σ²). Define the following quantities:

- y_ij = x_ij - x̄ (i.e. a deviation from the background mean), for i = 1, ..., k wells and j = 1, ..., m samples;
- y_i(r), the rth smallest of the y_ij for well i;
- y* = max_i(y_i(r)).

Having at least r of m future observations below x̄ + Ks at each of the k wells is then equivalent to y* < Ks, where K is the multiplier sought. Davis and McNichols (1987) have shown that

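$$\Pr(y^* < Ks) = \int_{-\infty}^{\infty} T_{n-1,\,\sqrt{n}\,z^*}\!\left(\sqrt{n}\,K\right)\, k\left[\int_{-\infty}^{z^*} m\binom{m-1}{r-1}\Phi^{r-1}(t)\,\phi(t)\,[1-\Phi(t)]^{m-r}\,dt\right]^{k-1} \times m\binom{m-1}{r-1}\Phi^{r-1}(z^*)\,\phi(z^*)\,[1-\Phi(z^*)]^{m-r}\,dz^* \qquad (4)$$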

The terms in equation (4) are defined as follows:

- T_{ν,δ}(·) is the cumulative distribution function of the non-central t-distribution with ν degrees of freedom and non-centrality parameter δ;
- z* is the maximum, across all k wells, of the rth order statistics of the standardized future measurements;
- φ and Φ are the standard normal probability density and cumulative distribution functions.

The equation is then solved for the value of K that makes the right-hand side equal to 1 - α. Extensive tables of K for varying levels of α, n, k, r and m are available (Gibbons 1994a). For example, take n = 8 background samples, k = 10 monitoring wells and one verification resample (i.e. r = 1, m = 2). The prediction limit

$$\bar{x} + 2.03\,s$$

will include at least one of the next two measurements in each of 10 downgradient wells with 95 per cent confidence.
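Equation (4) can be cross-checked by simulation. The sketch below is a Monte Carlo approximation, not the Davis and McNichols numerical integration. It assumes background and future values come from one normal distribution (no release) and draws background samples. Once the background is fixed, the wells are independent, so the sketch averages the conditional probability that every well has at least r of m values below x̄ + Ks. Function names are hypothetical.

With K = 2.03, n = 8, k = 10, r = 1 and m = 2, the estimate should land near the 95 per cent quoted above. As a sanity check, k = r = m = 1 with K = 1.895 sqrt(1 + 1/8) should reproduce the single-comparison limit of equation (1).

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def well_pass_prob(c, r, m):
    """P(at least r of m independent N(0, 1) values fall below c)."""
    p = STD_NORMAL.cdf(c)
    return sum(math.comb(m, j) * p**j * (1 - p) ** (m - j) for j in range(r, m + 1))


def simultaneous_confidence(K, n, k, r, m, sims=20_000, seed=2024):
    """Monte Carlo estimate of P(at least r of m future values < xbar + K*s at all k wells).

    All values are standard normal (no release). Given the background, wells are
    independent, so the conditional pass probability is raised to the power k and
    averaged over simulated backgrounds.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(sims):
        bg = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(bg) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in bg) / (n - 1))
        total += well_pass_prob(xbar + K * s, r, m) ** k
    return total / sims


# Setting of this section: n = 8, k = 10 wells, pass 1 of 2 (one verification resample)
print(round(simultaneous_confidence(2.03, n=8, k=10, r=1, m=2), 3))
# Sanity check: one comparison, no resample, K = t * sqrt(1 + 1/n) as in equation (1)
print(round(simultaneous_confidence(1.895 * math.sqrt(1 + 1 / 8), n=8, k=1, r=1, m=1), 3))
```

Averaging the conditional probability, rather than counting simulated pass/fail outcomes, keeps the Monte Carlo error small for a modest number of simulations.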

3.4. Multiple constituents

Little is known about the correlation between monitoring constituents, except that the interrelationship is highly variable. There are also too few background measurements to characterize the correlation matrix precisely or to use it to construct accurate multivariate prediction limits. For this reason, the Bonferroni inequality has been used to derive conservative prediction bands. This practice will produce prediction limits larger than required when positive association is present. For example, in the previous illustration, if we were to monitor 10 constituents, α = 0.05/10 = 0.005 and the limit

$$\bar{x} + 3.36\,s$$

would be applied to each well and constituent with an overall site-wide confidence level of 95 per cent.
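A short calculation shows why α is split across constituents. For illustration only, assume the q tests are independent. The chance of at least one false positive is then 1 - (1 - α)^q. The function name below is hypothetical.

Testing ten constituents unadjusted at 0.05 gives roughly 0.40, the figure quoted in Section 4.1. The Bonferroni level of 0.005 keeps the site-wide rate just under 0.05. Positively correlated constituents would make both figures smaller, which is why the Bonferroni limit is conservative.

```python
def familywise_false_positive(alpha_per_test, tests):
    """P(at least one false positive) among independent tests at a common level."""
    return 1 - (1 - alpha_per_test) ** tests


q = 10
unadjusted = familywise_false_positive(0.05, q)       # each constituent at 0.05
bonferroni = familywise_false_positive(0.05 / q, q)   # each constituent at 0.005
print(f"unadjusted: {unadjusted:.3f}, Bonferroni: {bonferroni:.3f}")
```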

Alternatively, there has been some work on multivariate prediction bounds (Guttman 1970; Bock 1975). These might apply where background sample sizes are large enough to give a reasonable estimate of the inter-constituent covariance matrix. Unfortunately, the presence of non-detects (i.e. left-censored distributions) violates the joint normality assumption of the multivariate procedure.

3.5. The problem of nondetects

In practice, groundwater measurements consist of a mixture of detected and non-detected constituents, ranging in detection frequency from 0 to 100 per cent. When the detection frequency is high (e.g. >85 per cent), several studies (Gibbons 1994a; Gilliom and Helsel 1986; Haas and Scheff 1990) have shown that most estimators of the mean and variance of a left-censored normal or log-normal distribution yield reasonable results. This is not true when detection frequencies are between 50 and 85 per cent (Gibbons 1994a). In this case, available methods include:

- maximum likelihood estimators (MLE) (Cohen 1959, 1961);
- restricted maximum likelihood estimators (Persson and Rootzen 1977);
- an estimator based on the delta distribution, which is a log-normal distribution with a probability mass at zero (Aitchison 1955);
- best linear unbiased estimators (Gupta 1952; Sarhan and Greenberg 1962);
- alternative linear estimators (Gupta 1952);
- regression-type estimators (Gilliom and Helsel 1986; Hashimoto and Trussell 1983);
- substitution of expected values of normal order statistics (Gleit 1985).

In addition, USEPA has often advocated simple substitution of one-half the method detection limit. Methods that adequately recover the mean and variance of the underlying distribution from the censored data often fail to recover the tail probabilities used in computing prediction limits (Gibbons 1994a). In a simulation study (Gibbons 1994a), the MLE was the best overall estimator. However, the estimator based on the delta distribution was best at preserving confidence levels for prediction limits in the presence of censoring.
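Of the estimators listed, the MLE is the easiest to sketch. The version below assumes a normal distribution (apply it to logarithms for a log-normal) and a single detection limit. It computes the MLE with the EM algorithm, a generic numerical route that is not Cohen's tabulated procedure. Each non-detect is replaced by its conditional first and second moments given that it lies below the detection limit. The mean and standard deviation are then re-estimated, and the two steps repeat until the estimates stop changing. Function names and data are hypothetical.

```python
import math
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def censored_normal_mle(detected, n_censored, detection_limit, tol=1e-10, max_iter=10_000):
    """MLE of (mu, sigma) for a normal sample left-censored at one detection limit (EM)."""
    total = len(detected) + n_censored
    mu = fmean(detected)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in detected) / len(detected))
    if n_censored == 0:
        return mu, sigma                        # no censoring: ordinary normal MLE
    sigma = sigma or 1.0                        # avoid a zero starting value
    sum_x = sum(detected)
    sum_x2 = sum(x * x for x in detected)
    for _ in range(max_iter):
        a = (detection_limit - mu) / sigma
        lam = STD_NORMAL.pdf(a) / STD_NORMAL.cdf(a)       # inverse Mills ratio for X < DL
        e1 = mu - sigma * lam                             # E[X | X < DL]
        e2 = sigma**2 * (1 - a * lam - lam**2) + e1**2    # E[X^2 | X < DL]
        new_mu = (sum_x + n_censored * e1) / total
        new_sigma = math.sqrt((sum_x2 + n_censored * e2) / total - new_mu**2)
        converged = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma


def censored_loglik(mu, sigma, detected, n_censored, detection_limit):
    """Log-likelihood: normal densities for detects, P(X < DL) for each non-detect."""
    ll = sum(math.log(STD_NORMAL.pdf((x - mu) / sigma) / sigma) for x in detected)
    return ll + n_censored * math.log(STD_NORMAL.cdf((detection_limit - mu) / sigma))


# Hypothetical constituent: six detects and two non-detects below a 1.0 mg/l limit
detected = [1.2, 1.5, 1.9, 2.3, 2.8, 3.1]
mu, sigma = censored_normal_mle(detected, n_censored=2, detection_limit=1.0)
print(f"MLE mean = {mu:.3f}, MLE sd = {sigma:.3f} (mean of detects alone = {fmean(detected):.3f})")
```

The resulting (μ, σ) could then be plugged into a prediction limit. As noted above, however, good recovery of the mean and variance does not guarantee good tail coverage.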

3.6. Non-parametric prediction limits

When detection frequency is less than 50 per cent, none of the methods discussed in the previous section works well, and an alternative strategy must be employed. In practice, an excellent alternative is a non-parametric prediction limit: the maximum of the n background measurements. The non-parametric limit is attractive because it makes no distributional assumptions and is defined even if only one of the n background measurements is quantifiable. In some cases, however, the number of background measurements is insufficient to provide a reasonable overall confidence level. The non-parametric prediction limit is therefore not always an available alternative. As in the parametric case, confidence levels for the non-parametric limits are a function of n, kq and the number of verification resamples.

For example, let X_(max,n) represent the maximum value obtained out of a sample of size n, and let Y_(min,m) represent the minimum value out of a sample of size m. In the present context, X_(max,n) is the maximum background concentration. Y_(min,m) is the minimum of the initial sample and verification resample(s) for a constituent in a downgradient monitoring well. The objective is to compare Y_(min,m) to X_(max,n). The confidence level for the simultaneous upper prediction limit defined as X_(max,n) is

$$\Pr\left(Y_{1(\min,m)} \le X_{(\max,n)},\ Y_{2(\min,m)} \le X_{(\max,n)},\ \ldots,\ Y_{k(\min,m)} \le X_{(\max,n)}\right) = 1 - \alpha \qquad (5)$$

To achieve a desired confidence level (say, 1 - α = 0.95) for a fixed number of background measurements, m must be adjusted: the more resamples, the greater the confidence. This probability can be evaluated using a variant of the multivariate hypergeometric distribution function (Hall et al. 1975; Chou and Owen 1986) as:

Based on this result, Gibbons (1990) derived approximate confidence levels for one such rule. The limit is the maximum of n background samples, and each of k monitoring wells must pass 1 of m samples (i.e. the initial sample or at least one verification resample). To incorporate multiple constituents, the confidence level is adjusted to 1 - α/q. Exact confidence levels for this case are now also available (Gibbons 1991). So are approximate confidence levels for the case in which the first or all of m resamples must be passed (Gibbons 1991). Exact confidence levels for this latter case were recently derived (Willits 1993; Davis and McNichols 1994), and extensive tables have been prepared (Gibbons 1994a). The case in which the prediction limit is the second largest measurement has also been considered (Gibbons 1994a; Davis and McNichols 1994).
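The simplest non-parametric case has a closed form. Here the limit is the largest of n background values, and each of k wells must pass 1 of m samples. The derivation assumes all n + km measurements are independent draws from one continuous distribution.

Condition on the background maximum: its probability integral transform U has density n u^(n-1), and given U = u each well passes with probability 1 - (1 - u)^m. The confidence is therefore n ∫ u^(n-1) [1 - (1 - u)^m]^k du, which expands into a finite alternating sum. The sketch below evaluates that sum exactly with fractions. It is a derivation under these stated assumptions, not the multivariate hypergeometric expression cited above, and the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def nonparametric_confidence(n, k, m):
    """Exact P(each of k wells passes 1 of m samples vs. the max of n background values).

    Assumes all values are i.i.d. continuous (no release, no ties). Uses
    n * integral_0^1 u^(n-1) (1 - (1-u)^m)^k du = sum_i (-1)^i C(k, i) / C(n + m*i, m*i).
    """
    return sum(
        Fraction((-1) ** i * comb(k, i), comb(n + m * i, m * i)) for i in range(k + 1)
    )


for k, m in [(1, 1), (1, 2), (10, 1), (10, 2)]:
    p = nonparametric_confidence(8, k, m)
    print(f"n=8, k={k:2d}, m={m}: {float(p):.3f}")
```

For n = 8 these come out as 0.889, 0.978, 0.444 and 0.841. They line up with the 0.88, 0.98, 0.44 and 0.84 quoted for the TOC example in Section 3.8.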

3.7. Intra-well comparisons

Upgradient versus downgradient comparisons are often inappropriate (e.g. spatial variability may be present), and some form of intra-well comparison (i.e. each well compared to its own history) must be performed. Note that intra-well comparisons are only appropriate when (1) predisposal data are available or (2) it can be demonstrated that the facility has not affected that well in the past. In this case, two good statistical methods are available: combined Shewhart-CUSUM control charts (Lucas 1982) and intra-well prediction limits (Gibbons 1994a; Davis 1994). The advantage of the combined Shewhart-CUSUM control chart is that it is sensitive to both immediate and gradual releases, whereas prediction limits are only sensitive to absolute increases over background.

In the intra-well setting, comparisons are independent since each well is compared to its own history. Gibbons (1994a) (Table 8.3) provides appropriate factors for computing intra-well prediction limits for up to kq = 500 future comparisons under a variety of resampling strategies. These factors apply to normally distributed constituents or constituents that can be suitably transformed to approximate normality. In the non-parametric case, selecting a single future sample and setting the confidence level to 1 - α/(kq) is also possible. However, overall confidence levels may be poor because individual monitoring wells typically have few background measurements (generally eight or fewer). If seasonality is present, adjustments may be required. However, typically only one measurement per year is available within a given season, so most facilities will have insufficient data to estimate a seasonal effect, if present.
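Below is a minimal sketch of the combined Shewhart-CUSUM chart for a single well. Each new value is standardized against that well's own background mean and standard deviation. A point is flagged if its standardized value exceeds the Shewhart control limit, which signals an immediate release. It is also flagged if the cumulative sum of excesses over a reference value exceeds a decision limit, which signals a gradual release.

The defaults reference = 1, decision = 5 and scl = 4.5 (in standard deviation units) are assumed typical choices, not values taken from this paper. The function name and the new monitoring values are hypothetical. When a point trips both rules, the Shewhart reason is reported.

```python
def shewhart_cusum(values, mean, sd, reference=1.0, decision=5.0, scl=4.5):
    """Combined Shewhart-CUSUM chart for one well, compared with its own history.

    values: new measurements in time order; mean, sd: that well's background estimates.
    Returns (index, reason) pairs for out-of-control points.
    """
    flags = []
    cusum = 0.0
    for i, x in enumerate(values):
        z = (x - mean) / sd                       # standardized against the well's history
        cusum = max(0.0, cusum + z - reference)   # accumulates small, persistent excesses
        if z > scl:
            flags.append((i, "shewhart"))         # large, immediate increase
        elif cusum > decision:
            flags.append((i, "cusum"))            # gradual increase
    return flags


# Background for this well: Table I TOC data (mean 11.0, sd 0.61); new values are hypothetical
print(shewhart_cusum([11.2, 11.9, 12.5, 12.8, 13.0], 11.0, 0.61))   # [(4, 'cusum')]
print(shewhart_cusum([11.1, 14.0], 11.0, 0.61))                     # [(1, 'shewhart')]
```

The slow drift is caught only by the CUSUM, at the fifth value, while the single jump trips the Shewhart limit.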


Table I. Eight quarterly TOC measurements

Year   Quarter   TOC (mg/l)
92     1         10.0
92     2         11.5
92     3         11.0
92     4         10.6
93     1         10.9
93     2         12.0
93     3         11.3
93     4         10.7

3.8. Illustration

Consider the data in Table I for total organic carbon (TOC) measurements from a single well over two years of quarterly monitoring.

Inspection of the data reveals no obvious trends, and these data have mean x̄ = 11.0 and standard deviation s = 0.61. The upper 95 per cent point of Student's t-distribution on seven degrees of freedom is t[7,1-0.05] = 1.895. Therefore, the upper 95 per cent confidence normal prediction limit in equation (1) is given by

$$11.0 + 1.895(0.61)\sqrt{1 + 1/8} = 12.22 \text{ mg/l},$$

which is larger than any of the observed values. This limit provides 95 per cent confidence of including the next single observation from a normal distribution for which eight previous measurements have been obtained, with observed mean 11.0 mg/l and standard deviation 0.61 mg/l.

Now assume that spatial variability does not exist (in many cases a demonstrably false assumption). Assume also that values from this single well are representative of values from each of ten downgradient wells in the absence of contamination. The corresponding Bonferroni-adjusted 95 per cent confidence normal prediction limit in equation (2) for the next 10 new downgradient measurements is then

$$11.0 + 3.50(0.61)\sqrt{1 + 1/8} = 13.26 \text{ mg/l}.$$

In contrast, if the dependence introduced by comparing all ten downgradient wells to the same background were incorporated as in equation (3), the result of

$$11.0 + 3.31(0.61)\sqrt{1 + 1/8} = 13.14 \text{ mg/l}$$

is obtained (see Table 1.4 in Gibbons (1994a)). The limit is lower because the multiplier incorporates the dependence introduced by repeated comparison to a common background. Because the comparisons are correlated, the effective number of independent comparisons is less than 10. Although the Bonferroni-based limit is too conservative, the excess is reasonably small (13.26 versus 13.14 mg/l).

Extending this result to include the effects of a verification resample as in equation (4) further decreases the limit to

$$11.0 + 2.03(0.61)\sqrt{1 + 1/8} = 12.31 \text{ mg/l}.$$

If each of ten constituents in each of the ten downgradient wells had been monitored, α = 0.05/10 = 0.005 and the limit would become

$$11.0 + 3.36(0.61)\sqrt{1 + 1/8} = 13.17 \text{ mg/l}$$

(see Table 1.5 in Gibbons (1994a)). Note that the verification resample allows application of essentially the same limit derived for ten wells and one constituent (13.14 mg/l) to a problem of ten wells and ten constituents (13.17 mg/l).
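The arithmetic of this illustration can be reproduced from the Table I data. The sketch below computes x̄ and s from the eight TOC values. It then applies each multiplier exactly as the text does, including the sqrt(1 + 1/n) factor. The multipliers are copied from the text (they come from tables in Gibbons (1994a)) rather than computed.

```python
import math
from statistics import fmean, stdev

toc = [10.0, 11.5, 11.0, 10.6, 10.9, 12.0, 11.3, 10.7]   # Table I, mg/l
n = len(toc)
xbar, s = fmean(toc), stdev(toc)

# Multipliers as quoted in the text (from tables in Gibbons (1994a)); not computed here
multipliers = {
    "1 well, 1 constituent (t[7, 0.95])": 1.895,
    "10 wells, Bonferroni": 3.50,
    "10 wells, common-background dependence": 3.31,
    "10 wells, one verification resample": 2.03,
    "10 wells x 10 constituents, one resample": 3.36,
}
limits = {label: xbar + mult * s * math.sqrt(1 + 1 / n) for label, mult in multipliers.items()}

print(f"xbar = {xbar:.2f} mg/l, s = {s:.3f} mg/l")
for label, limit in limits.items():
    print(f"{label:42s} {limit:6.2f} mg/l")
```

The sketch uses the unrounded s (about 0.609) rather than 0.61. With that value, the printed limits match 12.22, 13.26, 13.14, 12.31 and 13.17 mg/l to two decimals.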

Now consider the non-parametric alternative: take the maximum of the initial eight background measurements and apply it to the next future monitoring measurement(s). In this example, the non-parametric prediction limit is 12.00 mg/l. For a single future measurement, confidence is 0.88 without a resample and 0.98 with a resample (see Tables 2.5 and 2.6 in Gibbons (1994a)). For a single measurement in each of ten monitoring wells, confidence is 0.44 without a resample and 0.84 with a resample (see Tables 2.5 and 2.6 in Gibbons (1994a)). With ten constituents and ten monitoring wells, an overall 95 per cent confidence level requires n = 60 background samples with one verification resample (see Table 2.6 in Gibbons (1994a)). It requires only n = 20 samples if passing one of two verification resamples suffices (see Table 2.7 in Gibbons (1994a)). If either the initial sample or both of two resamples must be passed, n = 90 background measurements must be obtained (see Table 2.13 in Gibbons (1994a)). Other illustrations and further statistical details are available (Gibbons 1994a; Davis 1994; Davis and McNichols 1994a).

4. SOME METHODS TO BE AVOIDED

4.1. Analysis of variance - ANOVA

In both USEPA Subtitle C and D regulations and associated guidance (USEPA 1989, 1992), ANOVA is suggested as the statistical method of choice. The specific recommendation is a one-way fixed-effect model in which the upgradient wells are pooled as one level and each downgradient well represents an additional level in the design. A minimum of four samples is obtained from each well within a semi-annual period. In the presence of a significant F-statistic, post hoc comparisons (i.e. Fisher's LSD method) between each downgradient well and the pooled upgradient background are performed. Either parametric or non-parametric ANOVA models (i.e. the Kruskal-Wallis test) are acceptable. Unfortunately, application of either parametric or non-parametric ANOVA procedures to groundwater detection monitoring is inadvisable for the following reasons:

1. Univariate ANOVA procedures do not adjust for multiple comparisons due to multiple constituents. This can be devastating to the site-wide false positive rate. For example, suppose a site has ten indicator constituents, each tested at the 0.05 level. It will have as much as a 40 per cent probability (1 - 0.95^10 ≈ 0.40 for independent tests) of failing for at least one constituent on every monitoring event by chance alone.

2. ANOVA is more sensitive to spatial variability than to contamination. Spatial variability produces systematic differences between wells that are large relative to within-well variation (i.e. small but consistent differences due to spatial variation achieve statistical significance). In contrast, contamination increases variability within the impacted well(s), so a much larger between-well difference is required to achieve statistical significance. In fact, ANOVA applied to predisposal groundwater monitoring data often finds statistically significant differences between upgradient and downgradient wells, even when no waste is present (Gibbons 1994b). The following example illustrates this.

3. USEPA often presents non-parametric ANOVA as if it would protect the user from all of the weaknesses of its parametric counterpart; however, the only assumption relaxed is that of normality. The non-parametric ANOVA still assumes independence, homogeneity of variance and that each measurement is identically distributed.


Table II. Raw data for all detection monitoring wells and constituents (mg/l) in a Greenfield site

Well   Event   TOC       TKN      COD       ALK
MW01   1        5.2000   0.8000   44.0000    58.0000
MW01   2        6.8500   0.9000   13.0000    49.0000
MW01   3        4.1500   0.5000   13.9000    40.0000
MW01   4       15.1500   0.5000   40.0000    42.0000
MW02   1        1.6000   1.6000   11.0000    59.0000
MW02   2        6.2500   0.3000   10.0000    82.0000
MW02   3        1.4500   0.7000   10.0000    54.0000
MW02   4        1.0000   0.2000   13.0000    51.0000
MW03   1        1.0000   1.8000   28.0000    39.0000
MW03   2        1.9500   0.4000   10.0000    70.0000
MW03   3        1.5000   0.3000   11.0000    42.0000
MW03   4        4.8000   0.5000   26.0000    42.0000
MW04   1        4.1500   1.5000   41.9000    54.0000
MW04   2        1.0000   0.3000   10.0000    40.0000
MW04   3        1.9500   0.3000   24.0000    32.0000
MW04   4        1.2500   0.4000   45.4000    28.0000
MW05   1        2.1500   0.6000   39.9000    51.0000
MW05   2        1.0000   0.4000   26.4000    55.0000
MW05   3       19.6000   0.3000   31.0000    60.0000
MW05   4        1.0000   0.2000   48.0000    52.0000
MW06   1        1.4000   0.8000   22.0000   118.0000
MW06   2        1.0000   0.2000   23.0000    66.0000
MW06   3        1.5000   0.5000   25.0000    59.0000
MW06   4       20.5500   0.4000   28.0000    63.0000
P14    1        2.0500   0.2000   10.0000    79.0000
P14    2        1.0500   0.3000   10.0000    96.0000
P14    3        5.1000   0.5000   10.0000    89.0000

TOC = total organic carbon; TKN = total Kjeldahl nitrogen; COD = chemical oxygen demand; ALK = alkalinity

4. ANOVA requires pooling of downgradient data. Specifically, USEPA suggests that four samples per semi-annual monitoring event be collected (i.e. eight samples per year). However, ANOVA cannot rapidly detect a release, since only a subset of the required four semi-annual samples will initially be affected by a site impact. This heterogeneity will dilute the mean concentration and increase the variance for the affected well, limiting the ability of the statistical test to detect actual contamination.

To illustrate, consider the data in Table II, obtained from a facility at which disposal of waste has not yet taken place (Gibbons 1994a). Both parametric and non-parametric ANOVA were applied to these predisposal data. For chemical oxygen demand (COD), the effect approached significance (p < 0.072 parametric and p < 0.066 non-parametric). For alkalinity (ALK), the difference was significant (p < 0.002 parametric and p < 0.009 non-parametric). Individual comparisons (using Fisher's LSD) found significantly increased COD for well MW05 (p < 0.026) relative to upgradient wells. They also found significantly increased ALK for wells MW06 (p < 0.026) and P14 (p < 0.003). These results represent false positives due to spatial variability, since no garbage has been deposited at this site (i.e. a 'greenfield' site). Most remarkable is the absence of significant results for TOC, even though some values are as much as 20 times higher than others. These extreme values increase the within-well variance estimate, rendering the ANOVA powerless to detect differences regardless of magnitude. Elevated TOC data are inconsistent with chance expectations (based on analysis using prediction limits) and should be investigated. In this case, elevated TOC data are likely to have been caused by contamination from insects getting into the wells, since this greenfield facility is located in the middle of the Mojave Desert.

Table III. Illustration of pH data used in computing the CABF t-test

                Replicate
Date            1      2      3      4      Average
Upgradient
11/81           7.77   7.76   7.78   7.78   7.77
2/82            7.74   7.80   7.82   7.85   7.80
5/82            7.40   7.40   7.40   7.40   7.40
8/82            7.50   7.50   7.50   7.50   7.50
Summary         aliquots: mean 7.62, SD 0.18, N 16; averages: mean 7.62, SD 0.20, N 4
Downgradient
9/83            7.39   7.40   7.38   7.42   7.40
Summary         aliquots: mean 7.40, SD 0.02, N 4; average: mean 7.40, N 1

4.2. Cochran’s approximation to the Behrens-Fisher t-test

For years the USEPA Resource Conservation and Recovery Act (RCRA) regulation (USEPA 1982) was based on Cochran’s approximation to the Behrens-Fisher (CABF) t-test. The test was implemented incorrectly. Four quarterly upgradient samples from a single well, and single samples from a minimum of three downgradient wells, were each divided into four aliquots and treated as if they were 4n independent measurements. As a result, most hazardous waste disposal facilities regulated under RCRA were declared ‘leaking’. As an illustration, consider the data in Table III.
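The two calculations worked through in the rest of this section can be reproduced from the Table III pH values with a short Python sketch. It computes only test statistics. The first is the unequal-variance t statistic that the CABF procedure uses; Cochran’s approximation itself concerns the critical value, which is not computed here. The second is the single-new-measurement statistic obtained after the aliquots are averaged. The function names are illustrative. Because the code works from unrounded means and standard deviations, its results differ slightly from hand calculations based on the two-decimal summaries in Table III.

```python
from math import sqrt
from statistics import mean, stdev

# pH values from Table III: four quarterly upgradient samples, each split into four aliquots.
upgradient = {
    "11/81": [7.77, 7.76, 7.78, 7.78],
    "2/82": [7.74, 7.80, 7.82, 7.85],
    "5/82": [7.40, 7.40, 7.40, 7.40],
    "8/82": [7.50, 7.50, 7.50, 7.50],
}
downgradient = [7.39, 7.40, 7.38, 7.42]  # 9/83, four aliquots


def unequal_variance_t(background, monitoring):
    """(xbar_B - xbar_M) / sqrt(s_B^2 / n_B + s_M^2 / n_M), the statistic used by CABF."""
    se2 = stdev(background) ** 2 / len(background) + stdev(monitoring) ** 2 / len(monitoring)
    return (mean(background) - mean(monitoring)) / sqrt(se2)


def single_measurement_t(background, new_value):
    """(xbar_B - x_M) / (s_B * sqrt(1 / n_B + 1)) for one new monitoring value."""
    n = len(background)
    return (mean(background) - new_value) / (stdev(background) * sqrt(1 / n + 1))


# Incorrect implementation: every aliquot treated as an independent measurement.
aliquots = [x for reps in upgradient.values() for x in reps]
t_aliquots = unequal_variance_t(aliquots, downgradient)

# Aliquots averaged first: four background means and one monitoring mean.
sample_means = [mean(reps) for reps in upgradient.values()]
t_averaged = single_measurement_t(sample_means, mean(downgradient))

print(f"aliquots treated as independent: t = {t_aliquots:.2f}")
print(f"aliquots averaged first:         t = {t_averaged:.2f}")
```

With all 16 aliquots treated as independent, the statistic is close to 5. After averaging it is close to 1, even though the underlying pH measurements are identical.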

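Note that the aliquots are almost perfectly correlated and add virtually no independent information, yet the statistic treats them as completely independent. The CABF t-test is computed as

$$t^{*} = \frac{\bar{x}_B - \bar{x}_M}{\sqrt{s_B^2/n_B + s_M^2/n_M}} = \frac{7.62 - 7.40}{\sqrt{0.18^2/16 + 0.02^2/4}} \approx 4.77$$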

The associated probability of this test statistic is 1 in 10,000, indicating that the chance that the new monitoring measurement came from the same population as the background measurements is remote. In fact, however, the mean of the four aliquots for the new monitoring measurement is identical to one of the four mean values for background. This suggests intuitively that the probability is closer to one in four than to one in 10,000. Averaging the aliquots yields the statistic

$$t = \frac{\bar{x}_B - x_M}{s_B\sqrt{1/n_B + 1}} = \frac{7.62 - 7.40}{0.20\sqrt{1/4 + 1}} = \frac{0.22}{0.22} = 1.0$$

which has an associated probability of one in two. Had the background sample size been increased to $n_B = 20$, the probability would have decreased to one in three. USEPA eliminated this method from the regulation in 1988 (USEPA 1988).

Table IV. Detection frequencies for various censoring points for volatile organic compounds (µg/l)

                                         Detection limit (µg/l)
Compound                      N    >0.0   0.5   1.0   1.5   2.0   3.0   4.0   5.0
1,1,1-trichloroethane        687    10     1     1     0     0     0     0     0
1,1,2,2-tetrachloroethane    614     3     0     0     0     0     0     0     0
1,1,2-trichloroethane        632     2     0     0     0     0     0     0     0
1,1-dichloroethane           691     3     0     0     0     0     0     0     0
1,1-dichloroethene           636     0     0     0     0     0     0     0     0
1,2-dichloroethane           673    22     0     0     0     0     0     0     0
1,2-dichloropropane          595    16     3     3     2     2     2     1     1
2-butanone                   218    13    10     6     4     4     2     1     1
2-chloroethylvinyl ether     502     3     1     0     0     0     0     0     0
2-hexanone                   102     0     0     0     0     0     0     0     0
4-methyl-2-pentanone         166     8     5     1     0     0     0     0     0
Acetone                      136    35    32    31    28    28    20    12     8
Benzene                      700    68     4     1     0     0     0     0     0
Bromodichloromethane         564     5     0     0     0     0     0     0     0
Bromoform                    611     2     0     0     0     0     0     0     0
Bromomethane                 610     9     0     0     0     0     0     0     0
Carbon disulphide            126     6     0     0     0     0     0     0     0
Carbon tetrachloride         623    13     0     0     0     0     0     0     0
Chlorobenzene                646    10     1     1     1     1     1     1     1
Chloroethane                 641     3     1     0     0     0     0     0     0
Chloroform                   580    83    30     8     3     3     1     1     1
Chloromethane                613    44     7     5     1     1     0     0     0
Cis-1,3-dichloropropene      583     3     0     0     0     0     0     0     0
Dibromochloromethane         596     0     0     0     0     0     0     0     0
Dichlorodifluoromethane      201     3     1     1     1     1     0     0     0
Ethylbenzene                 702    73     3     0     0     0     0     0     0
Methylene chloride           688   180    52    21    11    11     2     0     0
Styrene                      100     0     0     0     0     0     0     0     0
Tetrachloroethene            632     3     0     0     0     0     0     0     0
Toluene                      698   172    47    11     4     4     2     2     0
Trans-1,2-dichloroethene     646     0     0     0     0     0     0     0     0
Trans-1,3-dichloropropene    579     2     0     0     0     0     0     0     0
Trichloroethene              687     8     1     0     0     0     0     0     0
Trichlorofluoromethane       525    12     2     1     0     0     0     0     0
Vinyl chloride               654     2     0     0     0     0     0     0     0

5. ANTHROPOGENIC COMPOUNDS

Anthropogenic compounds are those created solely by man, and they should therefore not be present in background. Consequently, these compounds (e.g. volatile organic compounds, VOCs) are often regulated on the basis of their presence or absence, or of whether they can be quantified. The statistical limits applied to VOCs are typically method detection limits (MDLs) (Gibbons 1994a; Currie 1968; Hubaux and Vos 1970; Gibbons et al. 1991; Lambert et al. 1991) or practical quantitation limits (PQLs) (Currie 1968; Gibbons et al. 1992b). MDLs determine whether or not the compound is present in a sample with a specified level of confidence. PQLs estimate the concentration at which a specified relative precision is achieved (e.g. the concentration at which the relative standard deviation (RSD) is 10 per cent). MDLs and PQLs are both laboratory-derived statistics.

Table IV presents detection frequencies for a series of VOCs measured by a single laboratory in approximately 700 field blanks collected for this purpose. Field blanks are vials of distilled water that travel to the site with the sampling team, are opened at the site, are immediately closed and are then sent to the laboratory for analysis; they do not contain VOCs. Inspection of Table IV, however, reveals a considerable proportion of field blanks in which VOCs were detected and quantified by the analytical instrument (a gas chromatograph/mass spectrometer, GC/MS). This occurred when the limit of detection was set in the range 0.0 to 0.5 µg/l, a range that many state and federal agencies believe will accurately detect these compounds. Consequently, many facilities may be incorrectly shown to have an impact on groundwater on the basis of low-level detections of VOCs.
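To make the size of the problem concrete, the sketch below converts the Table IV counts into per-compound false detection rates. It also estimates the chance that a single blank analysed for all 35 VOCs shows at least one detection. That combined figure assumes detections of different compounds are independent, an assumption made only for illustration. If detections in real blanks tend to occur together, the product formula will tend to overstate the combined rate.

```python
# Counts from Table IV: (compound, blanks analysed, detections at >0.0 µg/l,
# detections at the 5.0 µg/l censoring point).
FIELD_BLANKS = [
    ("1,1,1-trichloroethane", 687, 10, 0), ("1,1,2,2-tetrachloroethane", 614, 3, 0),
    ("1,1,2-trichloroethane", 632, 2, 0), ("1,1-dichloroethane", 691, 3, 0),
    ("1,1-dichloroethene", 636, 0, 0), ("1,2-dichloroethane", 673, 22, 0),
    ("1,2-dichloropropane", 595, 16, 1), ("2-butanone", 218, 13, 1),
    ("2-chloroethylvinyl ether", 502, 3, 0), ("2-hexanone", 102, 0, 0),
    ("4-methyl-2-pentanone", 166, 8, 0), ("Acetone", 136, 35, 8),
    ("Benzene", 700, 68, 0), ("Bromodichloromethane", 564, 5, 0),
    ("Bromoform", 611, 2, 0), ("Bromomethane", 610, 9, 0),
    ("Carbon disulphide", 126, 6, 0), ("Carbon tetrachloride", 623, 13, 0),
    ("Chlorobenzene", 646, 10, 1), ("Chloroethane", 641, 3, 0),
    ("Chloroform", 580, 83, 1), ("Chloromethane", 613, 44, 0),
    ("Cis-1,3-dichloropropene", 583, 3, 0), ("Dibromochloromethane", 596, 0, 0),
    ("Dichlorodifluoromethane", 201, 3, 0), ("Ethylbenzene", 702, 73, 0),
    ("Methylene chloride", 688, 180, 0), ("Styrene", 100, 0, 0),
    ("Tetrachloroethene", 632, 3, 0), ("Toluene", 698, 172, 0),
    ("Trans-1,2-dichloroethene", 646, 0, 0), ("Trans-1,3-dichloropropene", 579, 2, 0),
    ("Trichloroethene", 687, 8, 0), ("Trichlorofluoromethane", 525, 12, 0),
    ("Vinyl chloride", 654, 2, 0),
]
GT_0, AT_5 = 2, 3  # tuple positions of the two detection-count columns


def p_any_detection(column):
    """Probability that a clean blank shows at least one VOC detection.

    ASSUMES detections are independent across compounds (illustrative only).
    """
    p_none = 1.0
    for row in FIELD_BLANKS:
        p_none *= 1.0 - row[column] / row[1]
    return 1.0 - p_none


# The five compounds most often detected in blanks at the lowest limit.
for name, n, gt0, at5 in sorted(FIELD_BLANKS, key=lambda r: r[GT_0] / r[1], reverse=True)[:5]:
    print(f"{name:<20} {gt0 / n:6.1%} at >0.0 µg/l, {at5 / n:5.1%} at 5.0 µg/l")
print(f"P(any VOC detected) at >0.0 µg/l: {p_any_detection(GT_0):.1%}")
print(f"P(any VOC detected) at 5.0 µg/l:  {p_any_detection(AT_5):.1%}")
```

The same product form, applied across wells as well as compounds, is what makes the number of simultaneous detection decisions at a facility so important.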

The problem of low-level detections is due in large part to the statistical method that USEPA uses to compute the MDL. USEPA adopted a method originally derived by Glaser et al. (1981), which uses the estimator

$$\mathrm{MDL} = t_{(6,\,0.99)}\, s = 3.14\, s \qquad (7)$$

where s is the standard deviation of seven replicate samples spiked at a fixed concentration in the range of two to five times the hypothesized MDL. This estimator has several obvious weaknesses: (a) the same analyst both prepares and analyses the samples with known concentrations, (b) uncertainty in the calibration function that relates instrument response to concentration is ignored, (c) the background signal is assumed to be zero, and (d) there is no consistency or guidance in selecting the concentration at which the samples are spiked (Gibbons et al. 1991; Clayton et al. 1987). The most serious problem, however, is that at low levels the variability in measured concentration is proportional to spiking concentration, so the MDL will be proportional to the concentration at which the samples are spiked. The PQL (as defined by USEPA) suffers from the same weaknesses, since for no particular scientific or statistical reason USEPA computes it as a simple multiple of the MDL (i.e. 5 to 10 times the MDL).

Figure 2. Linear calibration data for benzene in µg/l with 99 per cent confidence WLS prediction bands (horizontal axis: actual benzene concentration)

Table V. USEPA limits of detection at various spiking concentrations of benzene in distilled water using GC/MS (all values in µg/l)

Concentration    SD      MDL
40               1.01    3.18
30               0.88    2.15
20               0.72    2.24
10               0.51    1.59
4                0.32    1.00
1                0.16    0.50
0.5              0.11    0.36
0.1              0.05    0.16
0.01             0.02    0.05

As an illustration, Figure 2 displays the standard calibration curve for benzene in the range 0 to 40 µg/l. The interval around the calibration line is a 99 per cent confidence prediction limit for the next single deviation from the calibration line (conditional on spiking concentration x), based on a weighted least squares (WLS) estimate of s. The WLS estimators used here assume that variability is proportional to concentration. The WLS prediction bands appear to fit the observed measurements extremely well, and similar results were observed for all other VOCs listed in Table IV. Table V displays the MDL estimates obtained with USEPA’s method at various spiking concentrations.
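The following is a generic sketch of a weighted least squares calibration fit with weights 1/x². These weights correspond to the stated assumption that variability is proportional to concentration. The sketch also computes a prediction interval for a single new measurement. It is not necessarily the estimator used for Figure 2. The calibration data are hypothetical, and the caller supplies the t quantile; 3.355 is the two-sided 99 per cent value for 8 degrees of freedom. The fit relies on the fact that dividing y = a + bx + e through by x gives an ordinary regression of y/x on 1/x with constant error variance.

```python
from math import sqrt
from statistics import mean
from typing import NamedTuple


class WLSFit(NamedTuple):
    intercept: float  # a in y = a + b*x
    slope: float      # b in y = a + b*x
    sigma: float      # relative SD: Var(y | x) = sigma**2 * x**2 (assumed model)
    u_bar: float      # mean of u = 1/x
    s_uu: float       # sum of squared deviations of u
    n: int


def wls_calibration(x, y):
    """WLS fit of y = a + b*x with weights 1/x**2 (requires x > 0).

    With z = y/x and u = 1/x the model becomes z = b + a*u, fitted by OLS.
    """
    u = [1.0 / xi for xi in x]
    z = [yi / xi for xi, yi in zip(x, y)]
    u_bar, z_bar = mean(u), mean(z)
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    a = sum((ui - u_bar) * (zi - z_bar) for ui, zi in zip(u, z)) / s_uu
    b = z_bar - a * u_bar
    resid = [zi - (b + a * ui) for ui, zi in zip(u, z)]
    sigma = sqrt(sum(r * r for r in resid) / (len(x) - 2))
    return WLSFit(a, b, sigma, u_bar, s_uu, len(x))


def prediction_band(fit, x0, t_crit):
    """Prediction interval for one new measurement at concentration x0."""
    u0 = 1.0 / x0
    # The SD of (new z - fitted z) is scaled back to the y scale by multiplying by x0.
    half = t_crit * fit.sigma * x0 * sqrt(1 + 1 / fit.n + (u0 - fit.u_bar) ** 2 / fit.s_uu)
    y_hat = fit.intercept + fit.slope * x0
    return y_hat - half, y_hat + half


# Hypothetical duplicate spikes with +/-5 per cent error at each concentration.
x = [1, 1, 5, 5, 10, 10, 20, 20, 40, 40]
y = [1.05, 0.95, 5.25, 4.75, 10.5, 9.5, 21.0, 19.0, 42.0, 38.0]
fit = wls_calibration(x, y)
T_995_8 = 3.355  # two-sided 99% Student t quantile, n - 2 = 8 df
for x0 in (1, 10, 40):
    lo, hi = prediction_band(fit, x0, T_995_8)
    print(f"x0 = {x0:>2}: next measurement expected in [{lo:.2f}, {hi:.2f}]")
```

Because the prediction standard deviation is multiplied by x0, the band widens roughly in proportion to concentration, giving the fan shape described for Figure 2.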

Table V reveals that the USEPA MDL estimator yields values ranging from 3.18 µg/l to 0.05 µg/l, depending solely on the concentration at which the samples were spiked. If a new analytical method were developed with lower limits of detection, it might be tested by spiking at a correspondingly lower concentration, and a lower limit of detection would then be found regardless of the true analytical properties of the method. This behaviour is unacceptable in a detection limit estimator. It can lead to both false positive and false negative regulatory decisions, depending on whether samples are spiked at lower or higher concentrations. Detailed discussion of statistically rigorous alternatives to this decision-making process is beyond the scope of this paper, but an emerging literature exists (Gibbons 1994a, 1995; Hubaux and Vos 1970; Gibbons et al. 1991; Lambert et al. 1991; Gibbons et al. 1992b; Clayton et al. 1987).
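The proportionality behind Table V can be reproduced with a minimal sketch of equation (7). The replicate values are hypothetical. They are constructed so that their standard deviation is exactly proportional to the spiking concentration, which is the low-level behaviour described above. The t multiplier is hard-coded rather than computed from a statistics library.

```python
from statistics import stdev

# One-sided 99% Student t quantile with 6 degrees of freedom (seven replicates),
# the multiplier in equation (7).
T_6_099 = 3.143


def usepa_mdl(replicates):
    """USEPA-style MDL: t(6, 0.99) times the SD of seven spiked replicates."""
    if len(replicates) != 7:
        raise ValueError("the procedure described uses exactly seven replicates")
    return T_6_099 * stdev(replicates)


# Hypothetical replicate pattern (not data from the paper): each measurement deviates
# from the spike by a fixed fraction, so the SD is proportional to concentration.
RELATIVE_ERRORS = [-0.03, -0.02, -0.01, 0.0, 0.01, 0.02, 0.03]

for spike in (40.0, 10.0, 1.0, 0.1):
    replicates = [spike * (1.0 + e) for e in RELATIVE_ERRORS]
    mdl = usepa_mdl(replicates)
    # USEPA PQL convention described in the text: 5 to 10 times the MDL.
    print(f"spike {spike:5.1f} µg/l: MDL {mdl:.4f}, PQL {5 * mdl:.4f} to {10 * mdl:.4f}")
```

Under this assumption the MDL, and any PQL defined as a fixed multiple of it, is exactly proportional to the spike. Spiking at 0.1 µg/l instead of 40 µg/l lowers the reported MDL by a factor of 400 with no change in the underlying method.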

6. SUMMARY

Protection of our natural resources is critical; however, the statistical tools used to make environmental impact decisions are limited and often confusing. The problem is not only of interest for the development of public policy; it also has features of statistical interest, such as multiple comparisons, sequential testing and censored distributions. Highlighting the weaknesses of currently mandated regulations may lead to further critical examination of public policy in groundwater monitoring, as well as to heightened interest in its statistical analysis.

REFERENCES

Aitchison, J. (1955). ‘On the distribution of a positive random variable having a discrete probability mass at the origin’, Journal of the American Statistical Association, 50, 901-908.

Bock, R. D. (1975). Multivariate Statistical Methods in Behavioral Research, McGraw Hill, New York.

Chou, Y. M. and Owen, D. B. (1986). ‘One-sided distribution-free simultaneous prediction limits for p future samples’, Journal of Quality Technology, 18, 96-98.

Clayton, C. A., Hines, J. W. and Elkins, P. D. (1987). ‘Detection limits with specified assurance probabilities’, Analytical Chemistry, 59, 2506-2514.

Cohen, A. C. (1959). ‘Simplified estimators for the normal distribution when samples are singly censored or truncated’, Technometrics, 1, 217-237.

Cohen, A. C. (1961). ‘Tables for maximum likelihood estimates: Singly truncated and singly censored samples’, Technometrics, 3, 123-128.

Currie, L. A. (1968). ‘Limits for qualitative detection and quantitative determination: Application to radiochemistry’, Analytical Chemistry, 40, 586-593.

Davis, C. B. (1994). ‘Environmental regulatory statistics’, in Patil, G. P. and Rao, C. R. (eds), Handbook of Statistics: Environmental Statistics, Elsevier, New York, Chapter 25, pp. 817-865.

Davis, C. B. and McNichols, R. J. (1987). ‘One-sided intervals for at least p of m observations from a normal population on each of r future occasions’, Technometrics, 29, 359-370.

Davis, C. B. and McNichols, R. J. (1994a). ‘Ground-water monitoring statistics up-date: I: Progress since 1988’, Ground Water Monitoring and Remediation, 14, 148-158.

Davis, C. B. and McNichols, R. J. (1994b). ‘Ground-water monitoring statistics up-date: II: Nonparametric prediction limits’, Ground Water Monitoring and Remediation, 14, 159-169.

Dunnett, C. W. (1955). ‘A multiple comparisons procedure for comparing several treatments with a control’, Journal of the American Statistical Association, 50, 1096-1121.

Dunnett, C. W. and Sobel, M. (1955). ‘Approximations to the probability integral and certain percentage points of a multivariate analogue of Student’s t-distribution’, Biometrika, 42, 258-260.

Gibbons, R. D. (1987). ‘Statistical prediction intervals for the evaluation of ground-water quality’, Ground Water, 25, 455-465.

Gibbons, R. D. (1990). ‘A general statistical procedure for ground-water detection monitoring at waste disposal facilities’, Ground Water, 28, 235-243.

Gibbons, R. D. (1991). ‘Some additional nonparametric prediction limits for ground-water detection monitoring at waste disposal facilities’, Ground Water, 29, 729-736.

Gibbons, R. D. (1994a). Statistical Methods for Groundwater Monitoring, Wiley, New York.

Gibbons, R. D. (1994b). ‘The folly of Subtitle D statistics: when greenfield sites fail’, Proceedings of Waste Technology 94, National Solid Waste Management Association, Washington DC, January 13-15.

Gibbons, R. D. (1995). ‘Some statistical and conceptual issues in the detection of low-level environmental pollutants’, Environmental and Ecological Statistics, 2, 1-43.

Gibbons, R. D., Jarke, F. H. and Stoub, K. P. (1991). ‘Detection limits for linear calibration curves with increasing variance and multiple future detection decisions’, in Friedman, D. (ed), Waste Testing and Quality Assurance, ASTM STP 1075, American Society for Testing and Materials, Philadelphia, PA, pp. 377-390.

Gibbons, R. D., Dolan, D., Keough, H., O’Leary, K. and O’Hara, R. (1992a). ‘A comparison of chemical constituents in leachate from industrial hazardous waste and municipal solid waste landfills’, Proceedings of the Fifteenth Annual Madison Waste Conference, 23-24 September, University of Wisconsin, Madison.

Gibbons, R. D., Grams, N. E., Jarke, F. H. and Stoub, K. P. (1992b). ‘Practical quantitation limits’, Chemometrics and Intelligent Laboratory Systems, 12, 225-235.

Gilliom, R. J. and Helsel, D. R. (1986). ‘Estimation of distributional parameters for censored trace level water quality data: 1. Estimation techniques’, Water Resources Research, 22, 135-146.

Glaser, J. A., Foerst, D. L., McKee, G. D., Quane, S. A. and Budde, W. L. (1981). ‘Trace analyses for wastewaters’, Environmental Science and Technology, 15, 1426-1435.

Gleit, A. (1985). ‘Estimation for small normal data sets with detection limits’, Environmental Science and Technology, 19, 1201-1206.

Gupta, A. K. (1952). ‘Estimation of the mean and standard deviation of a normal population from a censored sample’, Biometrika, 39, 260-273.

Gupta, S. S. and Panchapakesan, S. (1979). Multiple Decision Procedures, Wiley, New York.

Guttman, I. (1970). Statistical Tolerance Regions: Classical and Bayesian, Hafner, Darien, Conn.

Haas, C. N. and Scheff, P. A. (1990). ‘Estimation of averages in truncated samples’, Environmental Science and Technology, 24, 912-919.

Hahn, G. J. (1970). ‘Additional factors for calculating prediction intervals for samples from a normal distribution’, Journal of the American Statistical Association, 65, 1668-1676.

Hall, I. J., Prairie, R. R. and Motlagh, C. K. (1975). ‘Non-parametric prediction intervals’, Journal of Quality Technology, 7, 109-114.

Hashimoto, L. K. and Trussell, R. R. (1983). ‘Evaluating water quality data near the detection limit’, paper presented at the Proceedings of the American Water Works Association Advanced Technology Conference, American Water Works Association, Las Vegas, Nev., June 5-9.

Hochberg, Y. and Tamhane, A. C. (1987). Multiple Comparison Procedures, Wiley, New York.

Hubaux, A. and Vos, G. (1970). ‘Decision and detection limits for linear calibration curves’, Analytical Chemistry, 42, 849-855.

Lambert, D., Peterson, B. and Terpenning, I. (1991). ‘Nondetects, detection limits and the probability of detection’, Journal of the American Statistical Association, 86, 266-277.

Lucas, J. M. (1982). ‘Combined Shewhart-CUSUM quality control schemes’, Journal of Quality Technology, 14, 51-59.

Miller, R. G. (1966). Simultaneous Statistical Inference, McGraw Hill, New York.

Persson, T. and Rootzen, H. (1977). ‘Simple and highly efficient estimators for a type I censored normal sample’, Biometrika, 64, 123-128.

Sarhan, A. E. and Greenberg, B. G. (eds) (1962). Contributions to Order Statistics, Wiley, New York.

USEPA (1982). ‘Hazardous waste management system; permitting requirements for land disposal facilities’, Federal Register, 47(143), 32274-32373 (July 26).

USEPA (1988). ‘40 CFR Part 264: Statistical methods for evaluating ground-water monitoring data from hazardous waste facilities: Final rule’, Federal Register, 53(196), 39720-39731 (October 11).

USEPA (1989). ‘Statistical analysis of ground-water monitoring data of RCRA facilities: Interim Final Guidance’, April 1989.

USEPA (1991). ‘Solid waste disposal facility criteria: Final rule’, Federal Register, 56(196), 50978-51119 (October 9).

USEPA (1992). ‘Statistical analysis of ground-water monitoring data at RCRA facilities. Addendum to Interim Final Guidance’, Office of Solid Waste, July.

Willits, N. (1993). Personal Communication, University of California at Davis.