
Page 1: Analysis of Variance in Analytical Chemistry

Roland F. Hirsch Chemistry Department Seton Hall University South Orange, N.J. 07079

Report

Analysis of Variance in Analytical Chemistry

Analytical chemists usually work with techniques and methods which contain many sources of error. If the total variability in a particular case is higher than desired, then the significant sources of error must be identified and controlled. Analysis of variance is a statistical technique for estimating the importance of one or more factors suspected of contributing significantly to the total uncertainty in a given situation.

Although analysis of variance (anova) is a well-established technique [nearly half of Youden's 1951 text (1) is devoted to it], it seems to be used quite rarely by chemists. This article is intended to encourage more regular application of anova by showing how it works using specific cases in analytical chemistry. The actual computations will not be described since they are presented in many statistics texts, several of which are listed at the end of this article. Rather, the emphasis will be on choice of the proper model for the situation, interpretation of the anova, and the advantages and limitations of the technique.

In anova the primary purpose is to test the hypothesis that a factor does not contribute added variability to a set of data beyond that caused by all other factors (the residual error). If this hypothesis is found invalid, then the size of the contribution from this factor can be estimated and appropriate steps can be taken in succeeding experiments to keep it under control.

The usual procedure for anova involves identifying the factor(s) to be studied, designing and carrying out experiments in which data are collected for at least two levels of each factor, apportioning the variance of the entire set of the data among the sources of variation, and interpreting the results of these computations.

Single-Factor Analysis of Variance

How anova works can best be understood through an example. In the development of a rapid, fully automated atomic absorption analysis system using the Delves Cup technique of sample introduction (2), it was realized that the potentially high precision of automated sampling and data acquisition could not be attained if one could not count on obtaining reproducible results from one sample cup to the next. A test was designed to estimate the contribution of between-cup variability to the overall precision. Ten cups were selected at random, and aliquots of a standard solution of lead were run until nine replications had been made with each cup. The signal observed consisted of an absorption peak caused by the lead in the sample being volatilized into the light beam. The raw data and anova table are shown in Table I.

Each of the items in the anova table will now be described. The sources of variation are the specific factor or factors being studied (here the cups) and all other sources, pooled and called the residual (or replication or measurement) error. The degrees of freedom (df) are, as customary, one less than the number of groups, here 10 − 1 = 9, and the sum of (one less than the number of data within each group), here 8 × 10 = 80. The total sum of squares (SS) is the total of the squared deviations of the individual values from the grand mean, Σ(Yij − Ȳ)². The SS among cups compares the mean value for each cup with the grand mean, Σ(Ȳj − Ȳ)², and the SS residual compares the individual readings for a cup with the mean value for that cup, Σ(Yij − Ȳj)². The mean squares (MS) are the ratios of sums of squares to degrees of freedom. The experimental F ratio (Fs) is MS among cups/MS residual. Finally, the expected values of the mean squares [E(MS)] are given in the last column, with σ²

Table I. Automated Delves Cup Determination of Lead by Use of Peak Areas (2)

Cup no.     1      2      3      4      5      6      7      8      9     10

          3.104  3.126  3.084  3.060  3.196  3.120  2.886  2.982  3.252  3.099
          3.055  2.823  2.953  2.983  2.785  3.077  2.794  3.110  2.937  3.016
          2.908  2.758  2.896  2.940  2.902  2.926  2.719  2.933  2.933  3.020
          3.053  2.809  2.811  2.782  2.958  2.944  2.677  2.909  2.944  2.972
          2.893  2.667  2.915  2.844  2.935  3.031  2.752  2.984  2.781  3.036
          2.864  2.888  2.896  3.010  2.825  2.899  2.889  2.943  3.073  2.880
          2.919  2.831  2.823  2.857  2.863  2.928  3.034  2.736  2.944  3.013
          2.760  2.843  2.919  2.974  2.893  2.770  2.846  2.836  2.859  2.901
          2.992  3.019  3.031  2.952  2.890  2.880  2.850  2.855  2.975  2.971

Anova table

Source of variation    df      SS         MS         Fs        E(MS)
Among cups              9   0.192227   0.0213585   1.827 ns   σ² + 9σc²
Residual error         80   0.935060   0.0116882              σ²
Total                  89   1.127287

ANALYTICAL CHEMISTRY, VOL. 49, NO. 8, JULY 1977 · 691 A


representing in this example the variance of replication and σc² the variance among cups.

The Fs value is compared with critical values of F. If Fs is greater than a tabulated F, then the hypothesis of no added variance due to the specific factor [H0: σc² = 0] is rejected with the designated confidence level. By convention, if Fs < F0.05 (95% confidence), then the ratio is considered not significant, and the hypothesis is accepted. Fs is labeled "ns" in this case. If F0.05 ≤ Fs < F0.01, a single asterisk is placed next to the value of Fs, while F0.01 ≤ Fs < F0.001 calls for two asterisks, and F0.001 ≤ Fs calls for three asterisks, as a shorthand for denoting the significance of the Fs value.

In the Delves Cup analysis example, the relevant critical value of F is F0.05[9,80] = 1.99. In this case, Fs is not significant; therefore, one can say that the variation among cups is not an important source of error.

Whether Fs is significant or not, it is possible to use the MS within as an estimate, s² [Roman letters represent estimates of the parameters symbolized by the corresponding Greek letters (e.g., s is an estimate of σ)], of the residual error, σ². The value of s² is 0.01169, s is 0.108, and the relative standard deviation is 3.7%. No sc is calculated, since the MS among cups is not significantly greater than that expected by chance in the absence of a contribution from the cups; σc undeniably is finite, but it cannot be detected in the present case because of its small size.
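The arithmetic behind Table I is simple enough to sketch in a few lines. The following Python fragment is not part of the original article; it is written here as an illustration and reproduces the one-way anova quantities from the Table I readings:

```python
# One-way (single-factor) anova computed from the Table I readings.
# Each row is one replication; each column is one cup (1-10).
rows = [
    [3.104, 3.126, 3.084, 3.060, 3.196, 3.120, 2.886, 2.982, 3.252, 3.099],
    [3.055, 2.823, 2.953, 2.983, 2.785, 3.077, 2.794, 3.110, 2.937, 3.016],
    [2.908, 2.758, 2.896, 2.940, 2.902, 2.926, 2.719, 2.933, 2.933, 3.020],
    [3.053, 2.809, 2.811, 2.782, 2.958, 2.944, 2.677, 2.909, 2.944, 2.972],
    [2.893, 2.667, 2.915, 2.844, 2.935, 3.031, 2.752, 2.984, 2.781, 3.036],
    [2.864, 2.888, 2.896, 3.010, 2.825, 2.899, 2.889, 2.943, 3.073, 2.880],
    [2.919, 2.831, 2.823, 2.857, 2.863, 2.928, 3.034, 2.736, 2.944, 3.013],
    [2.760, 2.843, 2.919, 2.974, 2.893, 2.770, 2.846, 2.836, 2.859, 2.901],
    [2.992, 3.019, 3.031, 2.952, 2.890, 2.880, 2.850, 2.855, 2.975, 2.971],
]
groups = list(zip(*rows))            # 10 cups, 9 replicate readings each
a, n = len(groups), len(groups[0])   # a = 10 groups, n = 9 replications
grand = sum(y for g in groups for y in g) / (a * n)
means = [sum(g) / n for g in groups]

ss_among = n * sum((m - grand) ** 2 for m in means)                     # df = a - 1 = 9
ss_resid = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)  # df = a(n - 1) = 80
f_s = (ss_among / (a - 1)) / (ss_resid / (a * (n - 1)))
print(round(f_s, 2), "ns" if f_s < 1.99 else "significant")  # F0.05[9,80] = 1.99
```

Because the computed Fs falls below the critical value 1.99, the program reports the same "not significant" verdict as the anova table.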

Random and Nonrandom Factors

The initial example illustrates the single-factor anova and will serve as the prototype for the computationally more complicated multifactor anovas. Before considering situations where more than just one factor is singled out for attention, however, we need to distinguish between two kinds of factors. One is the random error factor, such as in the example just described. The other is the treatment effect factor, where the values of the factor are not selected at random, but rather where the choice is controlled by the experimenter.

Suppose, for example, that we were interested in whether the material from which the cups were made was important. Instead of choosing a few cups at random from an almost infinite number, we would select from the limited number of materials suitable for this technique. If these cups differ significantly, it will be a determinate, not a random, source of error, as the differences which depend on the composition of the cups will be consistent from one experiment to another. In contrast, the fact that cup 9 usually gives a higher result than cup 7 in the experiment shown in Table I tells us nothing about their relationship in a later experiment with a new set of randomly selected cups.

In the random effects case, often called Model II or infinite population, the individual values Yij = μ + εij + Ai, where μ is the mean value for the population, εij is the residual error (standard deviation σ) contribution for this measurement j on cup i, and Ai is the among-cups random error (standard deviation σc) contribution for cup i.

In the treatment effects case, often called Model I or finite population, the individual readings Yij = μ + εij + αi, where αi is the effect of applying treatment i (such as a particular composition of cup). There is no point in calculating a standard deviation here, since it is not an estimate of a parameter σc, but rather the result of the choice of materials for this experiment. Another experiment with a different group of materials would yield a different "standard deviation" at this level; in the random model, however, a second group of cups would provide another estimate of the same quantity, σc. In the Model I anova, a significant Fs suggests instead that one test the mean values of each group (cup compositions determine the groupings) to see which are significantly different from the rest; an example will be given shortly.
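When Fs is significant in a Model II anova, the added variance component is usually estimated from the E(MS) column: since E(MS among) = σ² + nσc², the estimate is sc² = (MS among − MS residual)/n. A small sketch of this back-calculation (the helper function and the mean-square values below are hypothetical, not from the article):

```python
# Estimating an added variance component in the Model II (random effects) case
# from E(MS_among) = sigma^2 + n * sigma_c^2. Helper name and the numbers in
# the call are illustrative; in the Table I example no s_c was calculated,
# because Fs there was not significant.
def added_variance(ms_among, ms_resid, n):
    """s_c^2 = (MS_among - MS_resid) / n, truncated at zero."""
    return max((ms_among - ms_resid) / n, 0.0)

print(round(added_variance(0.0450, 0.0117, 9), 5))  # → 0.0037
```

A negative difference of mean squares is truncated at zero, since a variance component cannot be negative.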

Two-Factor Analysis of Variance

When the contributions to variability from two factors are to be tested simultaneously, it is necessary to determine the relationship of the second factor to the first. There are two possibilities: a hierarchical (commonly called nested) arrangement or an independent arrangement of the variables.

Nested Factors. Suppose a randomly distributed factor contributes different levels for each group or treatment value of the second factor: the values of the first factor in one group bear no relationship to the values in any other group. The first factor is nested within the second factor.

Consider the following example (3). Several samples of atmospheric particulate matter were collected on filter paper by use of the high-volume technique. The samples were analyzed by x-ray emission spectrometry for several trace metals. Normally, a single small portion (about 5 cm²) of each 500-cm² filter is analyzed, with the assumption that the deposits are sufficiently homogeneous that one can take the analyzed portion as representative of the entire sheet. To test this assumption, 10 portions were taken at random from each of five samples, and triplicate analyses were run on each portion. The portions of one sample are unrelated to the portions of any other sample. Hence "portions" is a subsidiary factor nested within the main factor "samples". Both are random factors, since the portions and the samples were both chosen at random from among a large number of possibilities.

The anova table is shown in Table II (the data are too extensive to be reproduced here; copies are available from the author). The among portions within samples MS is tested over the replication or residual error, and then the among samples MS is tested over the level directly below, the among portions within samples MS. Both Fs ratios are highly significant; hence, standard deviations are calculated at each level. They are s = ±0.07, sPCS = ±0.06, and sS = ±0.65. One can conclude that the variation among portions of a sample, while significant relative to the replication error of the x-ray method, is still much smaller than the variation among samples and that the single portion technique is thus adequate for distinguishing


Table II. Nested Anova of X-ray Analyses of Atmospheric Particulate Samples (3)

Anova table

Source of variation                     df      SS        MS       Fs       E(MS)
Among samples (S)                        4   47.5815   11.8954   685***   σ² + 3σPCS² + 30σS²
Among portions within samples (P⊂S)     45    0.7813    0.0174   3.5***   σ² + 3σPCS²
Residual error                         100    0.4976    0.0050            σ²
Total                                  149   48.8604


among samples displaying this range of variability.
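The standard deviations quoted above follow directly from the E(MS) column of Table II. A Python sketch of the back-calculation (the mean squares are those of Table II, with the residual mean square read as 0.0050, consistent with s = ±0.07; the small difference from the quoted ±0.65 presumably reflects rounding of the tabulated mean squares):

```python
import math

# Back-calculating the standard deviations for the nested anova of Table II,
# using E(MS_resid) = s^2, E(MS_P within S) = s^2 + 3*s_PCS^2, and
# E(MS_S) = s^2 + 3*s_PCS^2 + 30*s_S^2 (triplicates, 10 portions per sample).
ms_resid, ms_pcs, ms_s = 0.0050, 0.0174, 11.8954

s = math.sqrt(ms_resid)                     # replication (residual) error
s_pcs = math.sqrt((ms_pcs - ms_resid) / 3)  # among portions within samples
s_s = math.sqrt((ms_s - ms_pcs) / 30)       # among samples
print(round(s, 2), round(s_pcs, 2), round(s_s, 2))  # → 0.07 0.06 0.63
```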

Factors of Equal Rank. In many cases, both of the factors contribute at the same level, that is, neither ranks below (within) the other as in the hierarchical model. The data in Table III illustrate the two-way anova. All values of each factor are applied to all values of the other factor; hence, a nested model would be inappropriate.

A new source of variation has appeared in this model: interaction between the main factors. Interaction occurs when the effect of factor A is dependent on the level of factor B (and vice versa). In this example, a significant interaction would mean that the difference in response between the new and the old electrodes would be affected by the sample concentration. In fact, the interaction mean square is not significant relative to the replication error. The variation among samples adds a highly significant component to the total variability. On the other hand, the variation between electrodes is not significant.

Since both factors are Model I (specific treatments controlled by the experimenter and not drawn at random), no added variance components are calculated. The significant factor, samples, clearly could give a mean square as large or as small as the experimenter wished, simply by choosing a large or a small range of concentrations. This points out an important consideration in designing experiments: the range of a treatment factor must be large enough to allow its effect, if any, to be observed. To have limited the concentrations to, say, 1.99, 2.00, and 2.01 would have made it impossible to observe the among-sample effect.

With a significant sample effect, one would then test for differences among the sample means. The form of the tests depends on whether they were planned before gathering the data or not. In a planned comparison the test is carried out regardless of whether a difference exists between two or more means. This provides a higher confidence level (for a given critical value of the test statistic) than for an unplanned comparison, where our tendency would be to test only those means which really appear to be different.

In the present example we might reasonably have planned to compare the first sample with the second and with the third and fourth samples combined. The totals for each of these groups are obtained (Σ in Table III), the squared sums divided by the number of measurements in each group are added, and the square of the grand total is divided by the total number of measurements to obtain the comparison SS. The comparison has two degrees of freedom and is highly significant. The second comparison in Table III shows that the means of the measurements on samples 3 and 4 are not significantly different.
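The recipe for a comparison SS can be sketched in a few lines of Python. Here it is applied to the second comparison of Table III, the two 2.000 mmol/L samples (the helper name is ours, not the article's; the group totals and residual MS are taken from Table III):

```python
# Comparison sum of squares, following the recipe in the text: add
# (group total)^2 / (group size) over the groups being compared, then
# subtract (grand total)^2 / (total number of measurements).
def comparison_ss(totals, sizes):
    grand = sum(totals)
    return sum(t * t / m for t, m in zip(totals, sizes)) - grand ** 2 / sum(sizes)

# Second comparison of Table III: the two 2.000 mmol/L samples, 6 readings each.
ss = comparison_ss([12.010, 12.018], [6, 6])
f_s = (ss / 1) / 0.00003503          # df = 1; tested over the residual MS
print(round(ss, 7), f_s < 1)         # → 5.3e-06 True
```

As in the table, the comparison is far from significant.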

Many other techniques have been developed for planned and unplanned comparisons; see any of the texts in the bibliography for more information.

Anova Without Replication. If the data do not contain replications, it is impossible to obtain separate estimates of the residual and interaction mean squares. Table IV shows the anova of a study of the amount of dissolved organic matter in five molecular weight ranges (labeled A-E for simplicity) at five locations along a river (6). The residual error and interaction are combined, and the main factors are tested over their pooled (or joint) mean square. Both main factors contribute significantly to the overall variability.

Interaction would mean that the relative order of the amounts in each fraction would differ from station to station. While the present data are not conclusive, it is suggestive that fractions A and B are very high at stations 1 and 2 and quite low relative to fractions D and E at the remaining stations. Also, the standard deviation for the pooled error and interaction, (0.1418)^1/2 = ±0.377, represents a 35% rsd (compared to the grand mean of the data), considerably higher than the measurement rsd values reported


Table III. Two-Way Anova of Ion-Selective Electrode Measurement of Calcium (4, 5)

Factor B:                      Factor A: electrodes
sample,           New electrode      Old electrode                       Σ
mmol/L Ca
0.500             0.501, 0.497       0.502, 0.500, 0.506, 0.504        3.010
1.000             0.997, 1.004       1.013, 1.009, 0.991, 1.001        6.015
2.000             1.993, 2.001       2.001, 2.005, 1.997, 2.013       12.010
2.000             2.001, 1.999       2.001, 2.007, 2.001, 2.009       12.018

Anova table

Source of variation              df      SS           MS           Fs         E(MS)a
Between electrodes                1    0.000114     0.000114     3.26 ns    σ² + 10.7 Σα²
Among samples                     3   10.1312578    3.37709      96000***   σ² + 6 Σβ² + 0.33 Σ(αβ)²
Interaction (electrodes ×         3    0.0000116    0.000004     <1 ns      σ² + 2.66 Σ(αβ)²
  samples)
Residual error                   16    0.0005605    0.00003503              σ²
Total                            23   10.1319439

Comparisons of means

Samples 0.500 vs. 1.000 vs. 2.000 (combined): SS = 10.0912, MS = 5.0456, Fs = 5.0456/0.00003503 = 144000***
Samples 2.000 vs. 2.000: SS = 0.0000053, MS = 0.0000053, Fs = 0.0000053/0.00003503 < 1 ns

a Whenever the subclass sizes are not identical, the E(MS) must be weighted. See G. W. Snedecor and W. G. Cochran, "Statistical Methods", 6th ed., Chap. 16, Iowa State Univ. Press, Ames, Iowa, 1967. It is generally best to avoid the complications of unequal subclass sizes in this and other anova models. See also J. Mandel and R. C. Paule, Anal. Chem., 42, 1194 (1970).


independently by the author. He notes a relationship between the amounts of material in fraction A and the salinity of the river water being sampled, which would be a plausible chemical explanation for a significant statistical interaction.

Sometimes it is not necessary to have replications for all combinations of factors to test for the presence of interaction between factors. An example of this is shown in Table V, extracted from a study of the determination of sub-ppm quantities of nitrate (7). By use of the raw data, it is concluded that both the dithionite and nitrate concentrations contribute significantly to the variability in this set of data.

The absorbance data were then normalized by dividing 10 times the absorbance by the nitrate concentration (Table V, B). The dithionite concentration is still a significant factor, but now the nitrate concentration no longer contributes significantly to the variability, which means that the relative response of the method does not depend on the amount of nitrate being measured. Such transformations of experimental results can often allow information to be extracted which otherwise would be obscured.
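The transformation itself is a one-liner; here it is sketched for one row of Table V (the 0.092 M dithionite readings, taken from part A of the table):

```python
# The normalization that turns Table V, A into Table V, B:
# relative response = 10 * absorbance / [NO3-], shown for the 0.092 M row.
no3_ppm = [0.25, 0.90, 2.25, 4.50]
absorbance = [0.020, 0.100, 0.240, 0.456]
normalized = [round(10 * a / c, 2) for a, c in zip(absorbance, no3_ppm)]
print(normalized)  # → [0.8, 1.11, 1.07, 1.01]
```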

What of the interaction of the factors? An independent estimate of the MS residual is available using the results labeled C in Table III of the article (7, 8). On the normalized basis, it amounts to 0.0077 (0.0044 if the data at 0.25 ppm are left out as much less precise than at the other concentrations). The anova MS error+interaction is not larger than this; therefore, we conclude that there is no significant interaction between the concentrations of NO3⁻ and S2O4²⁻.

Multifactor Analysis of Variance

When several factors are of interest, then both the design and interpretation of anova become more complicated. With three factors A, B, and C, we will have the following MS to calculate and test: residual, interactions A × B × C, A × B, A × C, and B × C, and main factors A, B, and C. The experiment would require 54 measurements if two replications are to be made and each factor is to be represented at three levels. A comparable four-factor system has 11 interactions and would require 162 measurements.

It is very difficult in such cases to ensure that the measurements are carried out in a truly random sequence. Doing part of the experiment on another day or assigning portions to each of several chemists or laboratories may introduce an unwanted factor (days, chemists, laboratories) which could confound some of the tests to be made on the data.

It is essential here to use a suitable experimental design which will see to it that a less important effect (such as an A × B × C or A × B × C × D interaction) is confounded rather than a main effect. In fact, the proper fractional design may well yield the desired information with fewer measurements. The reader is referred to the chapter by Tingey (9), as well as to books on design of experiments (such as 10-12), for more information on this topic.

Assuming all the measurements can be carried out in a randomized fashion (which has become much more feasible for a multifactor design due to recent advances in automated control of instrumentation), there remains the problem of interpreting the anova. The Fs ratios are calculated differently depending on whether a factor is Model I or II, because the expected mean squares are different. For example, in a four-factor anova with A, B, and C fixed treatments and D a random factor, E(MSC) = σ² + nab σCD² + nabd Σγ² but E(MSD) = σ² + nabc σD² (where n is the number of replications, and a, b, c, and d are the number of levels of the respective factors). The proper test for main effect C would thus be Fs = MSC/MSCD, while for main effect D it would be Fs = MSD/MSresidual.

Further treatment of the complications of multifactor anova is impossible due to the limitations of space. A good discussion is found in the text by Sokal and Rohlf (13a).

Assumptions and Limitations of Anova

The validity of an anova will be compromised if certain underlying assumptions are not met. It is tempting to analyze a set of data without considering the appropriateness of the analysis; the numbers by themselves cannot resist the algebraic manipulations of anova and will inevitably yield mean squares, F ratios, and added variance components. The best protections against erroneous conclusions are to be sure that the statistical technique has been used correctly and to substantiate the results of the statistical analysis with a rationale based on chemistry and common sense.

Prior to the discussion of the basic assumptions of anova, reference will be made to the methods of Anscombe and Tukey (14) for examination of residuals in experimental data and to the graphical techniques described by Feder (15). Lack of space precludes a more detailed exposition of these valuable alternatives to anova.

Random Sampling. The data must be collected in a random sequence, unrelated to their positions in the final tabulation prepared for anova. If a treatment is applied at several levels, the order of application must be randomized, rather than obtaining all the replicates at one treatment level, then at the next level, and so on. If there are several factors, then the combinations of treatments must be used in a random sequence. Lack of random sampling may result in the confounding of the analysis by an unrecognized experimental factor, as well as in biased estimates of variance components.


Table IV. Two-Way Anova of Dissolved Organic Matter in River Water (6)

                          Factor A: sampling stations
Factor B: MW fraction      1      2      3      4      5
A                        1.65   1.18   0.41   0.30   0.25
B                        1.47   1.06   0.44   0.30   0.23
C                        1.35   0.76   0.59   0.26   0.33
D                        1.47   1.18   2.00   1.94   1.33
E                        2.64   1.35   1.82   1.21   1.35

Anova table

Source of variation            df     SS       MS      Fs        E(MS)
Among stations                  4   3.1451   0.786    5.54**    σ² + 1.25 Σα²
Among MW fractions              4   5.1644   1.291    9.11***   σ² + 1.25 Σβ²
Residual error & interaction   16   2.2687   0.1418             σ² + 0.0625 Σ(αβ)²
Total                          24  10.5782


The best way to check if this requirement has been met is to scrutinize the experimental procedure for factors which might not have been varied in a random fashion. The only solution to lack of randomness is to redesign the experiment. This may involve a new sequence for the measurements, or it may mean increasing the number of factors to be tested, converting an unsuspected nonrandom effect into a treatment effect factor.

Independence of Random Errors. The random errors must not, in fact, vary from measurement to measurement according to some pattern. If the data are arranged in the sequence in which they were collected, the residual error, ε, should vary randomly from one measurement to the next. A pattern of regular cycling or of drift from low to high or vice versa indicates the presence of a time-dependent factor (such as a temperature drift or change in line voltage) which was not recognized when the experiments were carried out. This may make the F tests in anova misleading, especially if the amplitude of the drift or shift is large. Lack of independence of random errors can be detected by run tests (13b, 16a). If the residual errors are not independent, then the alternatives are to improve the experiment to remove the source of dependence entirely (thermostat the apparatus, use a regulated power supply) or to redesign the experiment so that the treatments of interest are applied randomly with regard to this factor (randomized blocks design).

Homogeneous Sample Variances. In a one-way anova, the variance si² within any group i is an estimate of the residual error variance σ². These estimates are pooled in obtaining the MS within in the anova procedure. If, however, the variances are not all estimates of the same population variance, then the pooling is invalid. The significance of an F test performed using this MS within will then be different from the significance adopted in selecting the critical value of F. Lack of homogeneity of the sample variances is called heteroscedasticity.

The change in significance of the F test (change in the value of the type I error, α) can be substantial. Brown and Forsythe (17), for example, report percentages of rejections of true hypotheses as high as seven times the nominal significance levels with a ninefold range in within-group variances. The largest deviations from the desired significance levels occur when the groups vary in size and especially when the group sizes and variances are directly or inversely correlated. Clearly one should keep the group sizes constant if heteroscedasticity is a possibility in an experiment.

A simple test of the uniformity of the variances is to calculate the ratio Fmax of the largest within-group variance to the smallest one. This is then compared to the tabulated value of Fmax[a, n−1]. Clearly, if Fmax is not significant, then the within-group variances are not significantly different from each other.
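The Fmax calculation can be sketched as follows; the three equal-sized groups are invented for illustration, and the critical value still has to be looked up in an Fmax table:

```python
from statistics import variance

# Fmax test of homogeneity of variances: ratio of the largest within-group
# variance to the smallest. Groups below are hypothetical, not from the article.
groups = [
    [2.91, 2.95, 2.89, 2.93],
    [2.84, 2.96, 2.88, 2.90],
    [2.92, 2.90, 2.94, 2.91],
]
variances = [variance(g) for g in groups]
f_max = max(variances) / min(variances)
print(round(f_max, 2))  # compare against the tabulated Fmax[a, n-1]
```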

In many cases, there is good reason to expect the variances to be different. In counting experiments the variance is equal to the number of counts. Data collected over a wide range of total counts will often be heteroscedastic. Fortunately, in this case taking the square roots of the original data makes the variances homogeneous, allowing anova to be carried out.
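The effect of the square-root transformation can be seen on hypothetical count data (the values below are invented for illustration): the ratio of the two group variances collapses from several hundred to a few units.

```python
from math import sqrt
from statistics import variance

# Hypothetical counting data: the variance tracks the count level, so groups
# near 100 and near 10000 counts are strongly heteroscedastic. Square roots
# bring the two group variances to within a small factor of each other.
low = [95, 102, 88, 110, 105]
high = [9900, 10150, 9800, 10200, 9950]
ratio_raw = variance(high) / variance(low)
ratio_sqrt = variance([sqrt(x) for x in high]) / variance([sqrt(x) for x in low])
print(ratio_raw > 100, ratio_sqrt < 10)  # → True True
```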

Arithmetic transformations are frequently a solution to this problem. The nature of the techniques or materials involved in the experiment should serve as a guide to the appropriate transformation. In some cases, the uniformity of the variances may be improved by altering the experimental method. If a wide range of concentrations must be measured, perhaps the samples could be diluted to different extents in the preparation for the measurement, so that the numerical values of the observations fall into a relatively narrow range.

It may, in fact, be best to use a nonparametric analysis in place of anova, as will be described later in this article.

Gaussian Distribution of Random Errors. Many data follow the

698 A · ANALYTICAL CHEMISTRY, VOL. 49, NO. 8, JULY 1977

Table V. Effect of Dithionite Concentration on Determination of Nitrate ( 7 )

A. R a w a b s o r b a n c e da ta

Factor B: [ S 2 0 4

2 - ] . M

Factor A: [NO3 ], ppm Factor B:

[ S 2 0 42 - ] . M 0.25 0.90 2.25 4.50

0.023 0.026 0.104 0.261 0.496

0.046 0.029 0.110 0.264 0.495 0.092 0.020 0.100 0.240 0.456

Anova table Source of

B(MS) variation df ss MS Fs B(MS)

[NO3- ] 3 0.36399 0.12133 1 5 0 0 * * * σ2+Σα2

[S2042-] 2 0.000990 0.000495 6 . 1 * σ2 + 2Σβ2

Residual 6 0.000485 0.0000808 σ2 + 0.125 error and Σ (αβ)2

interaction Total 11 0.36547

B. 10 X a b s o r b a n c e / [ N O Γ] Factor B:

|S2042~J. M

Factor A : [NO3"""], ppm Factor B: |S204

2~J. M 0.25 0.90 2.25 4.50

0.023 1.16 1.22 1.17 1.10 0.046 1.04 1.16 1.16 1.10 0.092 0.80 1.11 1.07 1.01

Anova table Source of variation df SS MS Fs E(MS)

[NO3- ] 3 0.0473 0.01576 3.51 ns σ2 + Σα2

[s2o42- ] 2 0.0578 0.0289 6.45* σ2 + 2 Σ/32

Residual error and 6 0.0269 0.00448 σ2 + 0.125 Σ interaction (αβ)2

Total 11 0.1320

Page 6: Analysis of Variance in Analytical Chemistry

gaussian distribution sufficiently closely for anova to be feasible. The residual error is normally gaussian since it is the combination of many small sources of error. Thus, even though the individual sources of resid­ual error may be nongaussian, their combination is likely to be gaussian (Central Limit theorem).
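The sums of squares in Table V-A can be recomputed directly from the raw absorbances. A minimal numpy sketch for the two-factor design without replication (the data are the published values; variable names are mine):

```python
import numpy as np

# Table V-A absorbances: rows = [S2O4 2-] levels, columns = [NO3-] levels
y = np.array([[0.026, 0.104, 0.261, 0.496],
              [0.029, 0.110, 0.264, 0.495],
              [0.020, 0.100, 0.240, 0.456]])

b, a = y.shape                 # b = 3 rows, a = 4 columns
gm = y.mean()                  # grand mean

ss_a = b * ((y.mean(axis=0) - gm) ** 2).sum()   # among [NO3-] levels
ss_b = a * ((y.mean(axis=1) - gm) ** 2).sum()   # among [S2O4 2-] levels
ss_tot = ((y - gm) ** 2).sum()
ss_res = ss_tot - ss_a - ss_b                   # residual error + interaction

# mean squares and F ratios, as in the anova table
ms_res = ss_res / ((a - 1) * (b - 1))
f_a = (ss_a / (a - 1)) / ms_res
f_b = (ss_b / (b - 1)) / ms_res
print(ss_a, ss_b, ss_res, f_a, f_b)
```

The results reproduce the tabulated SS values (0.36399, 0.000990, 0.000485) and F ratios (about 1500 and 6.1).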

If the distribution of the random errors is significantly different from the gaussian, then the significance (α) and/or the power (β) of the F test may be altered. Usually this occurs when one or both tails of the experimental distribution are longer or shorter than the gaussian. This can be detected by calculating the skew (third moment) and kurtosis (fourth moment) of the data, provided that the set of data is large. The Kolmogorov-Smirnov test (13c, 16b) may be used to test sets of data containing as few as 15 values and allows testing the experimental distribution against any hypothetical distribution.
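A sketch of such a check with scipy, on simulated data (the sample size and values are arbitrary). One caveat: estimating the mean and standard deviation from the same data makes the plain Kolmogorov-Smirnov test conservative.

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(7)
data = rng.normal(5.0, 0.2, 30)   # e.g., 30 replicate measurements

# standardize with the sample estimates, then test against the standard gaussian
z = (data - data.mean()) / data.std(ddof=1)
stat, p = kstest(z, 'norm')
print(stat, p)   # a small p would argue against the gaussian assumption
```

To test against a log-gaussian hypothesis instead, the same call is applied to the standardized logarithms of the data.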

This simple test is especially useful in deciding whether to use a transformation of the experimental data, since it can compare the data with a log-gaussian or other transformed gaussian distribution. The concentrations of trace constituents of a sample often follow a log-gaussian distribution, and the logarithms of the measurements can then be used in anova.

Other solutions for the situation of nongaussian data are use of a larger sample size, use of a more robust statistic than F (18), or use of a nonparametric technique of analysis.

Nonparametric Techniques

If the assumptions of homogeneity of variances and gaussian distribution of random errors cannot be met, then it may be best to use a nonparametric test. These tests are also known as rank-order and as distribution-free tests since they make use of the ranks of the items rather than their numerical values and therefore should be insensitive to the shape of the error distribution.

An additional advantage of many of these tests is that they are simple to carry out. The Wilcoxon paired comparison test described below can be completed in a few minutes by use of hand calculations. On the other hand, by only using the ranks of the data, information is lost and the analysis is less complete than with a parametric test. The latter is usually more powerful (smaller type II error) than its nonparametric counterpart. Hence, it generally is best to use the parametric anova if the assumptions can be met, while the nonparametric test should be used if there is any serious question about the assumptions.

Many useful rank-order tests have been devised (16, 19), and it is only because of limitations of space that just two such methods will be described in this article.

Paired comparisons (two-treatment randomized blocks) are widely applicable in analytical chemistry. They are used for comparing new and reference methods of analysis (as in the calcium-ion electrode example) and for checking the identity of two materials. Often the data will cover a wide numerical range, leading to a significant range of variances in many cases. The Wilcoxon Signed-Ranks test (16c) is applicable to these situations. An example involving two methods for determination of specific surface area (20) is given in Table VI. The pairs of data are tabulated, and the differences between methods determined for each pair. The differences are ranked from the lowest (rank 1) to highest (rank n) without regard to sign. The ranks are then given + or - signs according to the sign of the difference. The positive and negative ranks are summed separately, and the lower of the sums is compared with the tabulated statistic w. Rejection of the hypothesis of no difference between the two groups occurs if Ws is less than the tabulated w. Note that tied differences are given an average rank. In the example, one concludes that there is no significant difference between the two methods of measurement of surface area.
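The tabulation in Table VI can be reproduced in a few lines. This sketch handles the zero difference as the article does, by splitting its rank equally between the positive and negative sums:

```python
import numpy as np
from scipy.stats import rankdata

# surface areas from Table VI
porosimetry = np.array([74.1, 57.0, 17.2, 14.0, 8.0, 0.48, 0.13, 1.11,
                        1.31, 1.05, 26.4, 19.1, 4.82, 5.98, 9.79, 7.21])
gas_adsorp  = np.array([70.0, 54.1, 18.4, 14.4, 7.5, 0.52, 0.15, 1.10,
                        1.27, 1.28, 26.8, 19.1, 4.98, 5.70, 9.50, 5.62])

d = np.round(porosimetry - gas_adsorp, 2)   # rounding keeps tied |d| exactly tied
r = rankdata(np.abs(d))                     # ranks 1..n, ties given average ranks

# the zero difference (rank 1) contributes 0.5 to each signed sum
w_neg = r[d < 0].sum() + 0.5 * r[d == 0].sum()
w_pos = r[d > 0].sum() + 0.5 * r[d == 0].sum()
ws = min(w_neg, w_pos)
print(w_neg, w_pos, ws)   # 55, 81, Ws = 55
```

Since Ws = 55 exceeds the tabulated w at the 0.05 level for n = 16 (36), the difference between the methods is not significant. Note that scipy.stats.wilcoxon performs the same test, but its default treatment of zero differences is to drop them rather than split their rank.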

If there are several levels or kinds of treatments rather than just two, the randomized blocks method of Friedman (13d, 16d) may be used. An example from a study (3) of iron in atmospheric particulate matter as a function of wind velocity at various locations is given in Table VII. In this procedure the values are ranked within each block (location), and the ranks are then summed over all the blocks for each treatment. The squared rank sums are then inserted into the equation

X² = [12/ab(a + 1)][Σ(ΣR)²] - 3b(a + 1)

where a is the number of different treatments, and b is the number of blocks. In this example X² is 30. X² is then compared with χ² for a - 1 degrees of freedom. Since χ²0.005[4] = 14.9, the hypothesis of no treatment effect can be rejected with 99.5% confidence. The conclusion, then, is that there is a significant variation of the iron concentration with wind velocity.

Table VI. Wilcoxon Signed-Ranks Test for Paired Comparison Determination of Surface Areas (20)

          Methods
Porosimetry   Gas adsorption   Difference   Rank
74.1          70.0             +4.1         +16
57.0          54.1             +2.9         +15
17.2          18.4             -1.2         -13
14.0          14.4             -0.4         -10.5
8.0           7.5              +0.5         +12
0.48          0.52             -0.04        -4.5
0.13          0.15             -0.02        -3
1.11          1.10             +0.01        +2
1.31          1.27             +0.04        +4.5
1.05          1.28             -0.23        -7
26.4          26.8             -0.4         -10.5
19.1          19.1             0            +/-0.5
4.82          4.98             -0.16        -6
5.98          5.70             +0.28        +8
9.79          9.50             +0.29        +9
7.21          5.62             +1.59        +14

Absolute sum of negative ranks: 55    Sum of positive ranks: 81
Ws = 55 ns
W0.20[16] = 51   W0.05[16] = 36   W0.01[16] = 24 (23)

Table VII. Friedman Rank-Sum Test: Effect of Wind Velocity on Iron Concentration

Factor B:                        Factor A: sampling locations
wind velocity, knots    1     2     3     4     5     6     7     8     9     10    11
<5                     1.17  0.89  0.88  0.91  1.25  1.17  0.95  0.86  0.89  0.69  1.00
6                      0.99  0.68  0.98  0.67  1.24  0.99  0.81  0.63  0.65  0.92  0.91
7-8                    0.82  0.77  0.80  0.59  1.22  0.95  0.73  0.63  0.59  0.55  0.88
9-10                   1.03  0.80  0.97  0.68  1.65  0.92  0.74  0.65  0.45  0.45  0.93
>11                    0.87  0.74  0.60  0.50  1.00  0.87  0.43  0.32  0.26  0.29  0.72

Ranks within columns                                                         ΣR     (ΣR)²
<5                      5     5     3     5     4     5     5     5     5     4     5      51     2601
6                       3     1     5     3     3     4     4     2.5   4     5     3      37.5   1406
7-8                     1     3     2     2     2     3     2     2.5   3     3     2      25.5   650
9-10                    4     4     4     4     5     2     3     4     2     2     4      38     1444
>11                     2     2     1     1     1     1     1     1     1     1     1      13     169
                                                                             Total        6270

X² = [12/ab(a + 1)][Σ(ΣR)²] - 3b(a + 1) = 30
χ²0.005[4] = 14.9
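The rank sums and X² of Table VII can be verified directly. A sketch of the Friedman computation, ranking within each location (ties averaged, as in the table):

```python
import numpy as np
from scipy.stats import rankdata

# rows: wind-velocity classes (<5, 6, 7-8, 9-10, >11 knots); columns: 11 locations
iron = np.array([
    [1.17, 0.89, 0.88, 0.91, 1.25, 1.17, 0.95, 0.86, 0.89, 0.69, 1.00],
    [0.99, 0.68, 0.98, 0.67, 1.24, 0.99, 0.81, 0.63, 0.65, 0.92, 0.91],
    [0.82, 0.77, 0.80, 0.59, 1.22, 0.95, 0.73, 0.63, 0.59, 0.55, 0.88],
    [1.03, 0.80, 0.97, 0.68, 1.65, 0.92, 0.74, 0.65, 0.45, 0.45, 0.93],
    [0.87, 0.74, 0.60, 0.50, 1.00, 0.87, 0.43, 0.32, 0.26, 0.29, 0.72],
])

a, b = iron.shape                                # a = 5 treatments, b = 11 blocks
ranks = np.apply_along_axis(rankdata, 0, iron)   # rank within each block (column)
R = ranks.sum(axis=1)                            # rank sum for each treatment

x2 = 12.0 / (a * b * (a + 1)) * (R ** 2).sum() - 3 * b * (a + 1)
print(R, x2)   # rank sums 51, 37.5, 25.5, 38, 13; x2 is about 30
```

The computed X² of about 30 far exceeds χ²0.005 for 4 degrees of freedom (14.9), in agreement with the conclusion in the text.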

Conclusions

This article has described the methods, advantages, and limitations of analysis of variance. Much has had to be left out, for example, the related pointwise variance analysis (21, 22). Alternatives for the estimation of variation, such as the duplicate analysis procedures of Thompson and Howarth (23), likewise could not be included. It is nevertheless hoped that the reader will have gained an idea of how anova can be applied to problems in analytical chemistry. Presently, the only areas of regular application are in interlaboratory testing of methods (24) and in testing for significance in regression analysis (13e). The author is convinced that a wider awareness of the purpose and value of anova will result in much more frequent use of this statistical technique.

Acknowledgment

The author expresses his appreciation to the following persons who provided encouragement, experimental data, and comments on the manuscript: T. F. Christiansen, L. J. Cline Love, G. W. Ewing, L. N. Klatt, E. Krause, C. Orr, D. G. Pachuta, R. G. Smith, and especially W. J. Sullivan.

References

(1) W. J. Youden, "Statistical Methods for Chemists", Wiley, New York, N.Y., 1951.
(2) D. G. Pachuta, PhD dissertation, Seton Hall University, South Orange, N.J., 1976.
(3) W. J. Sullivan, ibid.
(4) T. F. Christiansen, private communication, 1976.
(5) T. F. Christiansen, J. E. Busch, and S. C. Krogh, Anal. Chem., 48, 1051 (1976).
(6) R. G. Smith, Jr., ibid., p 74.
(7) D. R. Senn, P. W. Carr, and L. N. Klatt, ibid., p 954.
(8) L. N. Klatt, private communication, 1976.
(9) F. H. Tingey, Treatise Anal. Chem., Part I, 10, 6405 (1972).
(10) V. L. Anderson and R. A. McLean, "Design of Experiments", Marcel Dekker, New York, N.Y., 1974.
(11) O. L. Davies, Ed., "Design and Analysis of Industrial Experiments", 2nd ed., Hafner, New York, N.Y., 1971.
(12) W. G. Cochran and G. M. Cox, "Experimental Designs", 2nd ed., Wiley, New York, N.Y., 1957.
(13) R. R. Sokal and F. J. Rohlf, "Biometry", Freeman, San Francisco, Calif., 1969: a) Chaps. 11 and 12; b) pp 624ff; c) pp 571ff; d) pp 398-9; e) Chap. 14.
(14) F. J. Anscombe and J. W. Tukey, Technometrics, 5, 141 (1963).
(15) P. I. Feder, ibid., 16, 287 (1974).
(16) W. J. Conover, "Practical Nonparametric Statistics", Wiley, New York, N.Y., 1971: a) pp 349ff; b) Chap. 6; c) pp 203-15, 383; d) pp 264ff.
(17) M. B. Brown and A. B. Forsythe, Technometrics, 16, 129 (1974).
(18) J. N. Arvesen and T. H. Schmitz, Biometrics, 26, 677 (1970).
(19) M. Hollander and D. A. Wolfe, "Nonparametric Statistical Methods", Wiley, New York, N.Y., 1973.
(20) J. T. Brock and C. Orr, Anal. Chem., 44, 1534 (1972).
(21) L. Meites, Anal. Chim. Acta, 74, 177 (1975).
(22) A. F. Isbell, Jr., R. L. Pecsok, R. H. Davies, and J. H. Purnell, Anal. Chem., 45, 2363 (1973).
(23) M. Thompson and R. J. Howarth, Analyst, 101, 690 (1976).
(24) ASTM Standard Recommended Practices E-177 and E-180, ASTM, Philadelphia, Pa., 1976.

Bibliography

R. R. Sokal and F. J. Rohlf, "Biometry", Freeman, San Francisco, Calif., 1969 (particularly recommended for the clarity of the instructions for carrying out the various models of anova).
G. W. Snedecor and W. G. Cochran, "Statistical Methods", 6th ed., Iowa State Univ. Press, Ames, Iowa, 1967.
R.G.D. Steel and J. H. Torrie, "Principles and Procedures of Statistics", McGraw-Hill, New York, N.Y., 1960.
H. Scheffe, "The Analysis of Variance", Wiley, New York, N.Y., 1959.
R. A. Fisher, "Statistical Methods for Research Workers", 14th ed., Hafner, New York, N.Y., 1970.
O. L. Davies and P. L. Goldsmith, Eds., "Statistical Methods in Research and Production", 4th ed., Longman, London, England, 1976.
H. R. Lindman, "Analysis of Variance in Complex Experimental Designs", Freeman, San Francisco, Calif., 1974.

Roland F. Hirsch is an associate professor of chemistry at Seton Hall University. He was educated at Oberlin College and the University of Michigan. His interests in research and teaching include chromatography, ion-sensitive electrodes, statistics, and x-ray spectrometry. This article was conceived while he was on sabbatical leave at the Inorganic Chemistry Laboratory, Oxford University, where he was a beneficiary of the resources of the Radcliffe Science Library and Blackwell's Bookstores.
