Roland F. Hirsch Chemistry Department Seton Hall University South Orange, N.J. 07079
Report
Analysis of Variance in Analytical Chemistry
Analytical chemists usually work with techniques and methods which contain many sources of error. If the total variability in a particular case is higher than desired, then the significant sources of error must be identified and controlled. Analysis of variance is a statistical technique for estimating the importance of one or more factors suspected of contributing significantly to the total uncertainty in a given situation.
Although analysis of variance (anova) is a well-established technique [nearly half of Youden's 1951 text (1) is devoted to it], it seems to be used quite rarely by chemists. This article is intended to encourage more regular application of anova by showing how it works using specific cases in analytical chemistry. The actual computations will not be described, since they are presented in many statistics texts, several of which are listed at the end of this article. Rather, the emphasis will be on choice of the proper model for the situation, interpretation of the anova, and the advantages and limitations of the technique.
In anova the primary purpose is to test the hypothesis that a factor does not contribute added variability to a set of data beyond that caused by all other factors (the residual error). If this hypothesis is found invalid, then the size of the contribution from this factor can be estimated, and appropriate steps can be taken in succeeding experiments to keep it under control.
The usual procedure for anova involves identifying the factor(s) to be studied, designing and carrying out experiments in which data are collected for at least two levels of each factor, apportioning the variance of the entire set of the data among the sources of variation, and interpreting the results of these computations.
Single-Factor Analysis of Variance

How anova works can best be understood through an example. In the development of a rapid, fully automated atomic absorption analysis system using the Delves Cup technique of sample introduction (2), it was realized that the potentially high precision of automated sampling and data acquisition could not be attained if one could not count on obtaining reproducible results from one sample cup to the next. A test was designed to estimate the contribution of between-cup variability to the overall precision. Ten cups were selected at random, and aliquots of a standard solution of lead were run until nine replications had been made with each cup. The signal observed consisted of an absorption peak caused by the lead in the sample being volatilized into the light beam. The raw data and anova table are shown in Table I.
Each of the items in the anova table will now be described. The sources of variation are the specific factor or factors being studied (here the cups) and all other sources, pooled and called the residual (or replication or measurement) error. The degrees of freedom (df) are, as customary, one less than the number of groups, here 10 − 1 = 9, and the sum of (one less than the number of data within each group), here 8 × 10 = 80. The total sum of squares (SS) is the total of the squared deviations of the individual values from the grand mean, Σ(Yij − Ȳ)². The SS among cups compares the mean value for each cup with the grand mean, Σ(Ȳj − Ȳ)², and the SS residual compares the individual readings for a cup with the mean value for that cup, Σ(Yij − Ȳj)². The mean squares (MS) are the ratios of sums of squares to degrees of freedom. The experimental F ratio (Fs) is MS among cups/MS residual. Finally, the expected values of the mean squares [E(MS)] are given in the last column, with σ²
Table I. Automated Delves Cup Determination of Lead by Use of Peak Areas (2)

Cup no.    1      2      3      4      5      6      7      8      9      10
(nine replicate peak-area readings per cup)
          3.104  3.126  3.084  3.060  3.196  3.120  2.886  2.982  3.252  3.099
          3.055  2.823  2.953  2.983  2.785  3.077  2.794  3.110  2.937  3.016
          2.908  2.758  2.896  2.940  2.902  2.926  2.719  2.933  2.933  3.020
          3.053  2.809  2.811  2.782  2.958  2.944  2.677  2.909  2.944  2.972
          2.893  2.667  2.915  2.844  2.935  3.031  2.752  2.984  2.781  3.036
          2.864  2.888  2.896  3.010  2.825  2.899  2.889  2.943  3.073  2.880
          2.919  2.831  2.823  2.857  2.863  2.928  3.034  2.736  2.944  3.013
          2.760  2.843  2.919  2.974  2.893  2.770  2.846  2.836  2.859  2.901
          2.992  3.019  3.031  2.952  2.890  2.880  2.850  2.855  2.975  2.971

Anova table

Source of variation    df    SS        MS         Fs        E(MS)
Among cups              9    0.192227  0.0213585  1.827 ns  σ² + 9σ_c²
Residual error         80    0.935060  0.0116882            σ²
Total                  89    1.127287
ANALYTICAL CHEMISTRY, VOL. 49, NO. 8, JULY 1977 · 691 A
representing in this example the variance of replication and σ_c² the variance among cups.
The Fs value is compared with critical values of F. If Fs is greater than a tabulated F, then the hypothesis of no added variance due to the specific factor [H0: σ_c² = 0] is rejected with the designated confidence level. By convention, if Fs < F0.05 (95% confidence), then the ratio is considered not significant, and the hypothesis is accepted; Fs is labeled "ns" in this case. If F0.05 ≤ Fs < F0.01, a single asterisk is placed next to the value of Fs, while F0.01 ≤ Fs < F0.001 calls for two asterisks, and F0.001 ≤ Fs calls for three asterisks, as a shorthand for denoting the significance of the Fs value.
In the Delves Cup analysis example, the relevant critical value of F is F0.05[9,80] = 1.99. In this case, Fs is not significant; therefore, one can say that the variation among cups is not an important source of error.
Whether Fs is significant or not, it is possible to use the MS within as an estimate, s² [Roman letters represent estimates of the parameters symbolized by the corresponding Greek letters (e.g., s is an estimate of σ)], of the residual error, σ². The value of s² is 0.01169, s is 0.108, and the relative standard deviation is 3.7%. No s_c is calculated, since the MS among cups is not significantly greater than that expected by chance in the absence of a contribution from the cups; σ_c undeniably is finite, but it cannot be detected in the present case because of its small size.
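As a sketch of how these computations can be carried out today, the Table I readings (cups as columns) can be run through a one-way anova in a few lines of Python. The manual sums of squares follow the formulas just described, and the result is cross-checked against scipy; this is an illustration added here, not part of the original study.

```python
import numpy as np
from scipy import stats

# Peak-area readings from Table I: rows are replicate runs, columns are the 10 cups
readings = np.array([
    [3.104, 3.126, 3.084, 3.060, 3.196, 3.120, 2.886, 2.982, 3.252, 3.099],
    [3.055, 2.823, 2.953, 2.983, 2.785, 3.077, 2.794, 3.110, 2.937, 3.016],
    [2.908, 2.758, 2.896, 2.940, 2.902, 2.926, 2.719, 2.933, 2.933, 3.020],
    [3.053, 2.809, 2.811, 2.782, 2.958, 2.944, 2.677, 2.909, 2.944, 2.972],
    [2.893, 2.667, 2.915, 2.844, 2.935, 3.031, 2.752, 2.984, 2.781, 3.036],
    [2.864, 2.888, 2.896, 3.010, 2.825, 2.899, 2.889, 2.943, 3.073, 2.880],
    [2.919, 2.831, 2.823, 2.857, 2.863, 2.928, 3.034, 2.736, 2.944, 3.013],
    [2.760, 2.843, 2.919, 2.974, 2.893, 2.770, 2.846, 2.836, 2.859, 2.901],
    [2.992, 3.019, 3.031, 2.952, 2.890, 2.880, 2.850, 2.855, 2.975, 2.971],
])
n_rep, n_cups = readings.shape            # 9 replicates, 10 cups
grand_mean = readings.mean()
cup_means = readings.mean(axis=0)

# Partition the sums of squares as in the anova table
ss_among = n_rep * np.sum((cup_means - grand_mean) ** 2)
ss_resid = np.sum((readings - cup_means) ** 2)
ms_among = ss_among / (n_cups - 1)             # df = 10 - 1 = 9
ms_resid = ss_resid / (n_cups * (n_rep - 1))   # df = 8 * 10 = 80
f_s = ms_among / ms_resid

# Cross-check against scipy's one-way anova (each column is one cup's group)
f_scipy, p = stats.f_oneway(*readings.T)
print(f"Fs = {f_s:.3f}; critical F0.05[9,80] = {stats.f.ppf(0.95, 9, 80):.2f}")
```

Because Fs falls below the critical value, the hypothesis of no added among-cups variance is retained, matching the "ns" entry in Table I.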
Random and Nonrandom Factors

The initial example illustrates the single-factor anova and will serve as the prototype for the computationally more complicated multifactor anovas. Before considering situations where more than just one factor is singled out for attention, however, we need to distinguish between two kinds of factors. One is the random error factor, such as in the example just described. The other is the treatment effect factor, where the values of the factor are not selected at random, but rather where the choice is controlled by the experimenter.
Suppose, for example, that we were interested in whether the material from which the cups were made was important. Instead of choosing a few cups at random from an almost infinite number, we would select from the limited number of materials suitable for this technique. If these cups differ significantly, it will be a determinate, not a random, source of error, as the differences which depend on the composition of the cups will be consistent from one experiment to another. In contrast, the fact that cup 9 usually gives a higher result than cup 7 in the experiment shown in Table I tells us nothing about their relationship in a later experiment with a new set of randomly selected cups.
In the random effects case, often called Model II or infinite population, the individual values Yij = μ + εij + Ai, where μ is the mean value for the population, εij is the residual error (standard deviation σ) contribution for this measurement j on cup i, and Ai is the among-cups random error (standard deviation σ_c) contribution for cup i.
In the treatment effects case, often called Model I or finite population, the individual readings Yij = μ + εij + αi, where αi is the effect of applying treatment i (such as a particular composition of cup). There is no point in calculating a standard deviation here, since it is not an estimate of a parameter σ_c, but rather the result of the choice of materials for this experiment. Another experiment with a different group of materials would yield a different "standard deviation" at this level; in the random model, however, a second group of cups would provide another estimate of the same quantity, σ_c. In the Model I anova, a significant Fs suggests instead that one test the mean values of each group (cup compositions determine the groupings) to see which are significantly different from the rest; an example will be given shortly.
Two-Factor Analysis of Variance

When the contributions to variability from two factors are to be tested simultaneously, it is necessary to determine the relationship of the second factor to the first. There are two possibilities: a hierarchical (commonly called nested) arrangement or an independent arrangement of the variables.
Nested Factors. Suppose a randomly distributed factor contributes different levels for each group or treatment value of the second factor; the values of the first factor in one group bear no relationship to the values in any other group. The first factor is nested within the second factor.
Consider the following example (3). Several samples of atmospheric particulate matter were collected on filter paper by use of the high-volume technique. The samples were analyzed by x-ray emission spectrometry for several trace metals. Normally, a single small portion (about 5 cm2) of each 500-cm2 filter is analyzed, with the assumption that the deposits are sufficiently homogeneous that one can take the analyzed portion as representative of the entire sheet. To test this assumption, 10 portions were taken at random from each of five samples, and triplicate analyses were run on each portion. The portions of one sample are unrelated to the portions of any other sample. Hence "portions" is a subsidiary factor nested within the main factor "samples". Both are random factors, since the portions and the samples were both chosen at random from among a large number of possibilities.
The anova table is shown in Table II (the data are too extensive to be reproduced here; copies are available from the author). The among-portions-within-samples MS is tested over the replication or residual error, and then the among-samples MS is tested over the level directly below, the among-portions-within-samples MS. Both Fs ratios are highly significant; hence, standard deviations are calculated at each level. They are s = ±0.07, s_PCS = ±0.06, and s_S = ±0.65. One can conclude that the variation among portions of a sample, while significant relative to the replication error of the x-ray method, is still much smaller than the variation among samples and that the single-portion technique is thus adequate for distinguishing
Table II. Nested Anova of X-ray Analyses of Atmospheric Particulate Samples (3)

Anova table

Source of variation                     df    SS       MS       Fs      E(MS)
Among samples (S)                        4   47.5815  11.8954   685***  σ² + 3σ_PCS² + 30σ_S²
Among portions within samples (P C S)   45    0.7813   0.0174   3.5***  σ² + 3σ_PCS²
Residual error                         100    0.4976   0.0050           σ²
Total                                  149   48.8604
among samples displaying this range of variability.
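A nested anova of this kind can be sketched as follows. Since the filter data themselves are not reproduced in the article, the sketch uses simulated data with effect sizes loosely modeled on the standard deviations quoted above; every number in it is illustrative, not from the study. The variance components are then recovered from the E(MS) column of Table II.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_portions, n_reps = 5, 10, 3   # design of the filter study

# Simulated effects: samples (sd 0.65) >> portions within samples (sd 0.06),
# replication error sd 0.07 -- roughly the magnitudes quoted in the text
sample_eff = rng.normal(0.0, 0.65, n_samples)
portion_eff = rng.normal(0.0, 0.06, (n_samples, n_portions))
data = (5.0 + sample_eff[:, None, None] + portion_eff[:, :, None]
        + rng.normal(0.0, 0.07, (n_samples, n_portions, n_reps)))

grand = data.mean()
sample_means = data.mean(axis=(1, 2))
portion_means = data.mean(axis=2)

# Nested partition of the total sum of squares
ss_s = n_portions * n_reps * np.sum((sample_means - grand) ** 2)
ss_pcs = n_reps * np.sum((portion_means - sample_means[:, None]) ** 2)
ss_res = np.sum((data - portion_means[:, :, None]) ** 2)

ms_s = ss_s / (n_samples - 1)                              # df = 4
ms_pcs = ss_pcs / (n_samples * (n_portions - 1))           # df = 45
ms_res = ss_res / (n_samples * n_portions * (n_reps - 1))  # df = 100

# Each mean square is tested over the level directly below it
f_samples = ms_s / ms_pcs
f_portions = ms_pcs / ms_res

# Variance components, from E(MS): MS_PCS estimates s2 + 3 s2_pcs,
# and MS_S adds 30 s2_s on top of MS_PCS
s2_pcs = (ms_pcs - ms_res) / n_reps
s2_s = (ms_s - ms_pcs) / (n_portions * n_reps)
```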
Factors of Equal Rank. In many cases, both of the factors contribute at the same level, that is, neither ranks below (within) the other as in the hierarchical model. The data in Table III illustrate the two-way anova. All values of each factor are applied to all values of the other factor; hence, a nested model would be inappropriate.
A new source of variation has appeared in this model, interaction between the main factors. Interaction occurs when the effect of factor A is dependent on the level of factor B (and vice versa). In this example, a significant interaction would mean that the difference in response between the new and the old electrodes would be affected by the sample concentration. In fact, the interaction mean square is not significant relative to the replication error. The variation among samples adds a highly significant component to the total variability. On the other hand, the variation between electrodes is not significant.
Since both factors are Model I (specific treatments controlled by the experimenter and not drawn at random), no added variance components are calculated. The significant factor, samples, clearly could give a mean square as large or as small as the experimenter wished, simply by choosing a large or a small range of concentrations. This points out an important consideration in designing experiments: the range of a treatment factor must be large enough to allow its effect, if any, to be observed. To have limited the concentrations to, say, 1.99, 2.00, and 2.01 would have made it impossible to observe the among-sample effect.
With a significant sample effect, one would then test for differences among the sample means. The form of the tests depends on whether they were planned before gathering the data or not. In a planned comparison the test is carried out regardless of whether a difference exists between two or more means. This provides a higher confidence level (for a given critical value of the test statistic) than for an unplanned comparison, where our tendency would be to test only those means which really appear to be different.
In the present example we might reasonably have planned to compare the first sample with the second and with the third and fourth samples combined. The totals for each of these groups are obtained (Σ in Table III), the squared group totals divided by the number of measurements in each group are added, and from this the square of the grand total divided by the total number of measurements is subtracted to obtain the comparison SS. The comparison has two degrees of freedom and is highly significant. The second comparison in Table III shows that the means of the measurements on samples 3 and 4 are not significantly different.
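That recipe for the comparison SS can be sketched directly from the Σ column of Table III; a second, algebraically equivalent form is included as a check. (The value obtained this way, about 10.13, is close to but not identical with the 10.0912 printed in Table III.)

```python
# Planned comparison: sample 1 vs. sample 2 vs. samples 3 and 4 combined
# Group totals and group sizes taken from the Σ column of Table III
totals = [3.010, 6.015, 12.010 + 12.018]
sizes = [6, 6, 12]

grand_total = sum(totals)
n_total = sum(sizes)

# Comparison SS = sum over groups of T_g^2 / n_g, minus G^2 / N
ss_comparison = (sum(t * t / n for t, n in zip(totals, sizes))
                 - grand_total ** 2 / n_total)
ms_comparison = ss_comparison / 2          # the comparison has 2 df

# Equivalent form: sum of n_g * (group mean - grand mean)^2
grand_mean = grand_total / n_total
ss_check = sum(n * (t / n - grand_mean) ** 2 for t, n in zip(totals, sizes))
```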
Many other techniques have been developed for planned and unplanned comparisons—see any of the texts in the bibliography for more information.
Anova Without Replication. If the data do not contain replications, it is impossible to obtain separate estimates of the residual and interaction mean squares. Table IV shows the anova of a study of the amount of dissolved organic matter in five molecular weight ranges (labeled A–E for simplicity) at five locations along a river (6). The residual error and interaction are combined, and the main factors are tested over their pooled (or joint) mean square. Both main factors contribute significantly to the overall variability.
Interaction would mean that the relative order of the amounts in each fraction would differ from station to station. While the present data are not conclusive, it is suggestive that fractions A and B are very high at stations 1 and 2 and quite low relative to fractions D and E at the remaining stations. Also, the standard deviation for the pooled error and interaction, (0.1418)^1/2 = ±0.377, represents a 35% rsd (compared to the grand mean of the data), considerably higher than the measurement rsd values reported
Table III. Two-Way Anova of Ion-Selective Electrode Measurement of Calcium (4, 5)

Factor B:                       Factor A: electrodes
sample,            New electrode     Old electrode                    Σ
mmol/L Ca
0.500              0.501, 0.497      0.502, 0.500, 0.506, 0.504       3.010
1.000              0.997, 1.004      1.013, 1.009, 0.991, 1.001       6.015
2.000              1.993, 2.001      2.001, 2.005, 1.997, 2.013      12.010
2.000              2.001, 1.999      2.001, 2.007, 2.001, 2.009      12.018

Anova table

Source of variation         df    SS            MS           Fs         E(MS)a
Between electrodes           1     0.000114     0.000114     3.26 ns    σ² + 10.7 Σα²
Among samples                3    10.1312578    3.37709      96000***   σ² + 6 Σβ² + 0.33 Σ(αβ)²
Interaction                  3     0.0000116    0.000004     <1 ns      σ² + 2.66 Σ(αβ)²
  (electrodes × samples)
Residual error              16     0.0005605    0.00003503              σ²
Total                       23    10.1319439

Comparisons of means

Samples 0.500 vs. 1.000 vs. 2.000 (combined): SS = 10.0912, MS = 5.0456, Fs = 5.0456/0.00003503 = 144000***
Samples 2.000 vs. 2.000: SS = 0.0000053, MS = 0.0000053, Fs = 0.0000053/0.00003503 < 1 ns

a Whenever the subclass sizes are not identical, the E(MS) must be weighted. See G. W. Snedecor and W. G. Cochran, "Statistical Methods", 6th ed., Chap. 16, Iowa State Univ. Press, Ames, Iowa, 1967. It is generally best to avoid the complications of unequal subclass sizes in this and other anova models. See also J. Mandel and R. C. Paule, Anal. Chem., 42, 1194 (1970).
independently by the author. He notes a relationship between the amounts of material in fraction A and the salinity of the river water being sampled, which would be a plausible chemical explanation for a significant statistical interaction.
Sometimes it is not necessary to have replications for all combinations of factors to test for the presence of interaction between factors. An example of this is shown in Table V, extracted from a study of the determination of sub-ppm quantities of nitrate (7). By use of the raw data, it is concluded that both the dithionite and nitrate concentrations contribute significantly to the variability in this set of data.
The absorbance data were then normalized by dividing 10 times the absorbance by the nitrate concentration (Table V, B). The dithionite concentration is still a significant factor, but now the nitrate concentration no longer contributes significantly to the variability, which means that the relative response of the method does not depend on the amount of nitrate being measured. Such transformations of experimental results can often allow information to be extracted which otherwise would be obscured.
What of the interaction of the factors? An independent estimate of the MS residual is available using the results labeled C in Table III of the article (7, 8). On the normalized basis, it amounts to 0.0077 (0.0044 if the data at 0.25 ppm are left out as much less precise than at the other concentrations). The anova MS error+interaction is not larger than this; therefore, we conclude that there is no significant interaction between the concentrations of NO3− and S2O4²−.
Multifactor Analysis of Variance

When several factors are of interest, then both the design and interpretation of anova become more complicated. With three factors A, B, and C, we will have the following MS to calculate and test: residual; interactions A × B × C, A × B, A × C, and B × C; and main factors A, B, and C. The experiment would require 54 measurements if two replications are to be made and each factor is to be represented at three levels. A comparable four-factor system has 11 interactions and would require 162 measurements.
It is very difficult in such cases to ensure that the measurements are carried out in a truly random sequence. Doing part of the experiment on another day or assigning portions to each of several chemists or laboratories may introduce an unwanted factor (days, chemists, laboratories) which could confound some of the tests to be made on the data.
It is essential here to use a suitable experimental design which will see to it that a less important effect (such as an A × B × C or A × B × C × D interaction) is confounded rather than a main effect. In fact, the proper fractional design may well yield the desired information with fewer measurements. The reader is referred to the chapter by Tingey (9), as well as to books on design of experiments (such as 10–12), for more information on this topic.
Assuming all the measurements can be carried out in a randomized fashion (which has become much more feasible for a multifactor design due to recent advances in automated control of instrumentation), there remains the problem of interpreting the anova. The Fs ratios are calculated differently depending on whether a factor is Model I or II, because the expected mean squares are different. For example, in a four-factor anova with A, B, and C fixed treatments and D a random factor, E(MS_C) = σ² + nab σ_CD² + nabd σ_C², but E(MS_D) = σ² + nabc σ_D² (where n is the number of replications, and a, b, c, and d are the numbers of levels of the respective factors). The proper test for main effect C would thus be Fs = MS_C/MS_CD, while for main effect D it would be Fs = MS_D/MS_residual. Further treatment of the complications of multifactor anova is impossible due to the limitations of space. A good discussion is found in the text by Sokal and Rohlf (13a).

Assumptions and Limitations of Anova
The validity of an anova will be compromised if certain underlying assumptions are not met. It is tempting to analyze a set of data without considering the appropriateness of the analysis; the numbers by themselves cannot resist the algebraic manipulations of anova and will inevitably yield mean squares, F ratios, and added variance components. The best protections against erroneous conclusions are to be sure that the statistical technique has been used correctly and to substantiate the results of the statistical analysis with a rationale based on chemistry and common sense.
Prior to the discussion of the basic assumptions of anova, reference will be made to the methods of Anscombe and Tukey (14) for examination of residuals in experimental data and to the graphical techniques described by Feder (15). Lack of space precludes a more detailed exposition of these valuable alternatives to anova.
Random Sampling. The data must be collected in a random sequence, unrelated to their positions in the final tabulation prepared for anova. If a treatment is applied at several levels, the order of application must be randomized, rather than obtaining all the replicates at one treatment level, then at the next level, and so on. If there are several factors, then the combinations of treatments must be used in a random sequence. Lack of random sampling may result in the confounding of the analysis by an unrecognized experimental factor, as well as in biased estimates of variance components.
Table IV. Two-Way Anova of Dissolved Organic Matter in River Water (6)

                          Factor A: sampling stations
Factor B: MW fraction    1      2      3      4      5
A                       1.65   1.18   0.41   0.30   0.25
B                       1.47   1.06   0.44   0.30   0.23
C                       1.35   0.76   0.59   0.26   0.33
D                       1.47   1.18   2.00   1.94   1.33
E                       2.64   1.35   1.82   1.21   1.35

Anova table

Source of variation              df    SS       MS      Fs       E(MS)
Among stations                    4    3.1451   0.786   5.54**   σ² + 1.25 Σα²
Among MW fractions                4    5.1644   1.291   9.11***  σ² + 1.25 Σβ²
Residual error & interaction     16    2.2687   0.1418           σ² + 0.0625 Σ(αβ)²
Total                            24   10.5782
The best way to check if this requirement has been met is to scrutinize the experimental procedure for factors which might not have been varied in a random fashion. The only solution to lack of randomness is to redesign the experiment. This may involve a new sequence for the measurements, or it may mean increasing the number of factors to be tested, converting an unsuspected nonrandom effect into a treatment effect factor.
Independence of Random Errors. The random errors must not, in fact, vary from measurement to measurement according to some pattern. If the data are arranged in the sequence in which they were collected, the residual error, ε, should vary randomly from one measurement to the next. A pattern of regular cycling or of drift from low to high or vice versa indicates the presence of a time-dependent factor (such as a temperature drift or change in line voltage) which was not recognized when the experiments were carried out. This may make the F tests in anova misleading, especially if the amplitude of the drift or shift is large. Lack of independence of random errors can be detected by run tests (13b, 16a). If the residual errors are not independent, then the alternatives are to improve the experiment to remove the source of dependence entirely (thermostat the apparatus, use a regulated power supply) or to redesign the experiment so that the treatments of interest are applied randomly with regard to this factor (randomized blocks design).
Homogeneous Sample Variances. In a one-way anova, the variance s_i² within any group i is an estimate of the residual error variance σ². These estimates are pooled in obtaining the MS within in the anova procedure. If, however, the variances are not all estimates of the same population variance, then the pooling is invalid. The significance of an F test performed using this MS within will then be different from the significance adopted in selecting the critical value of F. Lack of homogeneity of the sample variances is called heteroscedasticity.
The change in significance of the F test (change in the value of the type I error, α) can be substantial. Brown and Forsythe (17), for example, report percentages of rejections of true hypotheses as high as seven times the nominal significance levels with a ninefold range in within-group variances. The largest deviations from the desired significance levels occur when the groups vary in size and especially when the group sizes and variances are directly or inversely correlated. Clearly one should keep the group sizes constant if heteroscedasticity is a possibility in an experiment.
A simple test of the uniformity of the variances is to calculate the ratio F_max of the largest within-group variance to the smallest one. This is then compared to the tabulated critical value F_max[α; a, n−1]. Clearly, if F_max is not significant, then the within-group variances are not significantly different from each other.
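A sketch of that variance-ratio check (the three groups of replicate readings below are invented for illustration; the critical value must come from a table of Hartley's F_max statistic, not the ordinary F table):

```python
import numpy as np

# Hypothetical replicate measurements for three equal-sized groups
groups = [
    [2.91, 2.95, 2.88, 2.93, 2.90],
    [3.02, 2.99, 3.05, 3.00, 3.04],
    [2.85, 2.96, 2.90, 2.81, 2.93],
]

variances = [np.var(g, ddof=1) for g in groups]   # within-group variances
f_max = max(variances) / min(variances)

# Compare f_max with the tabulated Hartley F_max[alpha; a, n-1]
# (a = number of groups = 3, and n - 1 = 4 here)
print(f"F_max = {f_max:.2f}")
```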
In many cases, there is good reason to expect the variances to be different. In counting experiments the variance is equal to the number of counts. Data collected over a wide range of total counts will often be heteroscedastic. Fortunately, in this case taking the square roots of the original data makes the variances homogeneous, allowing anova to be carried out.
Arithmetic transformations are frequently a solution to this problem. The nature of the techniques or materials involved in the experiment should serve as a guide to the appropriate transformation. In some cases, the uniformity of the variances may be improved by altering the experimental method. If a wide range of concentrations must be measured, perhaps the samples could be diluted to different extents in the preparation for the measurement, so that the numerical values of the observations fall into a relatively narrow range.
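The square-root rule for counting data is easy to demonstrate with simulated Poisson counts (the count rates below are arbitrary): the raw variances grow with the mean count, while the variances of the square-rooted counts settle near the constant value 1/4.

```python
import numpy as np

rng = np.random.default_rng(42)
results = {}

for rate in (50, 500, 5000):              # widely different mean count levels
    counts = rng.poisson(rate, 20000)
    raw_var = counts.var()                # ~ rate (Poisson: variance = mean)
    sqrt_var = np.sqrt(counts).var()      # ~ 0.25, regardless of the rate
    results[rate] = (raw_var, sqrt_var)
    print(f"mean {rate}: raw variance {raw_var:.1f}, "
          f"sqrt-transformed variance {sqrt_var:.3f}")
```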
It may, in fact, be best to use a nonparametric analysis in place of anova, as will be described later in this article.
Gaussian Distribution of Random Errors. Many data follow the
Table V. Effect of Dithionite Concentration on Determination of Nitrate (7)

A. Raw absorbance data

Factor B:              Factor A: [NO3−], ppm
[S2O4²−], M      0.25    0.90    2.25    4.50
0.023            0.026   0.104   0.261   0.496
0.046            0.029   0.110   0.264   0.495
0.092            0.020   0.100   0.240   0.456

Anova table

Source of variation              df    SS        MS         Fs        E(MS)
[NO3−]                            3    0.36399   0.12133    1500***   σ² + Σα²
[S2O4²−]                          2    0.000990  0.000495   6.1*      σ² + 2 Σβ²
Residual error and interaction    6    0.000485  0.0000808            σ² + 0.125 Σ(αβ)²
Total                            11    0.36547

B. 10 × absorbance/[NO3−]

Factor B:              Factor A: [NO3−], ppm
[S2O4²−], M      0.25    0.90    2.25    4.50
0.023            1.16    1.22    1.17    1.10
0.046            1.04    1.16    1.16    1.10
0.092            0.80    1.11    1.07    1.01

Anova table

Source of variation              df    SS        MS        Fs        E(MS)
[NO3−]                            3    0.0473    0.01576   3.51 ns   σ² + Σα²
[S2O4²−]                          2    0.0578    0.0289    6.45*     σ² + 2 Σβ²
Residual error and interaction    6    0.0269    0.00448             σ² + 0.125 Σ(αβ)²
Total                            11    0.1320
gaussian distribution sufficiently closely for anova to be feasible. The residual error is normally gaussian since it is the combination of many small sources of error. Thus, even though the individual sources of residual error may be nongaussian, their combination is likely to be gaussian (Central Limit theorem).
If the distribution of the random errors is significantly different from the gaussian, then the significance (α) and/or the power (1 − β, where β is the type II error) of the F test may be altered. Usually this occurs when one or both tails of the experimental distribution are longer or shorter than the gaussian. This can be detected by calculating the skew (third moment) and kurtosis (fourth moment) of the data, provided that the set of data is large. The Kolmogorov-Smirnov test (13c, 16b) may be used to test sets of data containing as few as 15 values and allows testing the experimental distribution against any hypothetical distribution.
This simple test is especially useful in deciding whether to use a transformation of the experimental data, since it can compare the data with a log gaussian or other transformed gaussian distribution. The concentrations of trace constituents of a sample often
follow a log-gaussian distribution, and the logarithms of the measurements can then be used in anova.
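As a sketch (with simulated trace-level data; the sample size and distribution parameters are invented), the logarithms of log-gaussian data can be tested against a gaussian with scipy's Kolmogorov-Smirnov routine:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Simulated trace-constituent concentrations, log-gaussian as described above
conc = rng.lognormal(mean=0.0, sigma=0.5, size=30)

logs = np.log(conc)
# Kolmogorov-Smirnov test of the log-transformed data against a gaussian
# with the fitted mean and standard deviation
ks_stat, ks_p = stats.kstest(logs, 'norm', args=(logs.mean(), logs.std(ddof=1)))
print(f"KS statistic = {ks_stat:.3f}, p = {ks_p:.2f}")
```

(Estimating the mean and standard deviation from the same data makes the nominal KS critical values conservative; corrected tables such as Lilliefors' account for this.)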
Other solutions for the situation of nongaussian data are use of a larger sample size, use of a more robust statistic than F (18), or use of a nonparametric technique of analysis.
Nonparametric Techniques

If the assumptions of homogeneity of variances and gaussian distribution of random errors cannot be met, then it may be best to use a nonparametric test. These tests are also known as rank-order and as distribution-free tests, since they make use of the ranks of the items rather than their numerical values and therefore should be insensitive to the shape of the error distribution.
An additional advantage of many of these tests is that they are simple to carry out. The Wilcoxon paired comparison test described below can be completed in a few minutes by use of hand calculations. On the other hand, by only using the ranks of the data, information is lost and the analysis is less complete than with a parametric test. The latter is usually more
powerful (smaller type II error) than its nonparametric counterpart. Hence, it generally is best to use the parametric anova if the assumptions can be met, while the nonparametric test should be used if there is any serious question about the assumptions.
Many useful rank-order tests have been devised (16, 19), and it is only because of limitations of space that just two such methods will be described in this article.
Paired comparisons (two-treatment randomized blocks) are widely applicable in analytical chemistry. They are used for comparing new and reference methods of analysis (as in the calcium-ion electrode example) and for checking the identity of two materials. Often the data will cover a wide numerical range, leading to a significant range of variances in many cases. The Wilcoxon Signed-Ranks test (16c) is applicable to these situations. An example involving two methods for determination of specific surface area (20) is given in Table VI. The pairs of data are tabulated, and the differences between methods are determined for each pair. The differences are ranked from the lowest (rank 1) to the highest (rank n) without regard to sign. The ranks are then given + or − signs according to the sign of the difference. The positive and negative ranks are summed separately, and the lower of the sums, Ws, is compared with the tabulated critical value W; the hypothesis of no difference between the two groups is rejected if Ws is less than the tabulated W. Note that tied differences are given an average rank, and a zero difference has its rank split between the positive and negative sums. In the example, one concludes that there is no significant difference between the two methods of measurement of surface area.
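The signed-ranks bookkeeping can be sketched with the Table VI data. The differences are rounded to two decimals so that the tied values (0.04 and 0.4) rank exactly as in the table; scipy's wilcoxon with zero_method='zsplit' reproduces the split rank given to the zero difference.

```python
import numpy as np
from scipy import stats

# Specific surface areas by the two methods (Table VI)
porosimetry = np.array([74.1, 57.0, 17.2, 14.0, 8.0, 0.48, 0.13, 1.11,
                        1.31, 1.05, 26.4, 19.1, 4.82, 5.98, 9.79, 7.21])
gas_adsorption = np.array([70.0, 54.1, 18.4, 14.4, 7.5, 0.52, 0.15, 1.10,
                           1.27, 1.28, 26.8, 19.1, 4.98, 5.70, 9.50, 5.62])

diffs = np.round(porosimetry - gas_adsorption, 2)

# Rank the absolute differences (ties get average ranks), then attach signs;
# the zero difference contributes half its rank to each sign, as in Table VI
ranks = stats.rankdata(np.abs(diffs))
pos_sum = ranks[diffs > 0].sum() + 0.5 * ranks[diffs == 0].sum()
neg_sum = ranks[diffs < 0].sum() + 0.5 * ranks[diffs == 0].sum()

# Same test via scipy; the two-sided statistic is the smaller rank sum
res = stats.wilcoxon(diffs, zero_method='zsplit')
print(f"negative ranks {neg_sum}, positive ranks {pos_sum}, Ws = {res.statistic}")
```

Since Ws = 55 exceeds W0.20[16] = 51, the difference between the two methods is not significant, as stated in the table.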
If there are several levels or kinds of treatments rather than just two, the randomized blocks method of Friedman (13d, 16d) may be used. An example from a study (3) of iron in atmospheric particulate matter as a function of wind velocity at various locations is given in Table VII. In this procedure the values are ranked within each block (location), and the ranks are then summed over all the blocks for each treatment. The squared rank sums are then inserted into the equation
X² = [12/ab(a + 1)][Σ(ΣR)²] − 3b(a + 1)

where a is the number of different treatments, and b is the number of blocks. In this example X² is 30. X² is then compared with χ² for a − 1 degrees of freedom. Since χ²0.005[4] = 14.9, the hypothesis of no treatment effect can be rejected with 99.5% confidence. The conclusion, then, is that
ANALYTICAL CHEMISTRY, VOL. 49, NO. 8, JULY 1977 · 699 A
Table VI. Wilcoxon Signed-Ranks Test for Paired Comparison: Determination of Surface Areas (20)

Porosimetry   Gas adsorption   Difference   Rank
   74.1           70.0           +4.1        +16
   57.0           54.1           +2.9        +15
   17.2           18.4           −1.2        −13
   14.0           14.4           −0.4        −10.5
    8.0            7.5           +0.5        +12
    0.48           0.52          −0.04       −4.5
    0.13           0.15          −0.02       −3
    1.11           1.10          +0.01       +2
    1.31           1.27          +0.04       +4.5
    1.05           1.28          −0.23       −7
   26.4           26.8           −0.4        −10.5
   19.1           19.1            0          +0.5/−0.5
    4.82           4.98          −0.16       −6
    5.98           5.70          +0.28       +8
    9.79           9.50          +0.29       +9
    7.21           5.62          +1.59       +14

Absolute sum of negative ranks = 55; sum of positive ranks = 81
Ws = 55 (not significant)
W0.20[16] = 51, W0.05[16] = 36, W0.01[16] = 24 (23)
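The signed-rank procedure of Table VI can be sketched in a few lines of plain Python. This illustration is not part of the original article; it reproduces the tabulated rank sums, splitting the rank of the single zero difference equally between the two sums, as the table does.

```python
# Wilcoxon signed-ranks test applied to the paired surface-area data of Table VI.

def wilcoxon_signed_ranks(x, y):
    """Return (sum of positive ranks, absolute sum of negative ranks)."""
    # Round to suppress floating-point noise so tied |differences| are detected
    diffs = [round(a - b, 10) for a, b in zip(x, y)]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):  # assign average ranks to tied |differences|
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j + 2) / 2  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    # Zero differences: split the rank equally between the two sums
    zeros = sum(r for r, d in zip(ranks, diffs) if d == 0)
    return w_plus + zeros / 2, w_minus + zeros / 2

porosimetry = [74.1, 57.0, 17.2, 14.0, 8.0, 0.48, 0.13, 1.11,
               1.31, 1.05, 26.4, 19.1, 4.82, 5.98, 9.79, 7.21]
gas_adsorption = [70.0, 54.1, 18.4, 14.4, 7.5, 0.52, 0.15, 1.10,
                  1.27, 1.28, 26.8, 19.1, 4.98, 5.70, 9.50, 5.62]

w_plus, w_minus = wilcoxon_signed_ranks(porosimetry, gas_adsorption)
print(w_plus, w_minus)  # 81.0 55.0
# Ws = min(81, 55) = 55 exceeds the critical W0.05[16] = 36, so the hypothesis
# of no difference between the two methods is not rejected.
```

Since Ws must fall *below* the critical value for rejection, the value of 55 confirms the conclusion in the text that the two surface-area methods do not differ significantly.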
Table VII. Friedman Rank-Sum Test: Effect of Wind Velocity on Iron Concentration

Factor B:                Factor A: sampling locations
wind velocity, knots     1     2     3     4     5     6     7     8     9    10    11
<5                     1.17  0.89  0.88  0.91  1.25  1.17  0.95  0.86  0.89  0.69  1.00
6                      0.99  0.68  0.98  0.67  1.24  0.99  0.81  0.63  0.65  0.92  0.91
7-8                    0.82  0.77  0.80  0.59  1.22  0.95  0.73  0.63  0.59  0.55  0.88
9-10                   1.03  0.80  0.97  0.68  1.65  0.92  0.74  0.65  0.45  0.45  0.93
>11                    0.87  0.74  0.60  0.50  1.00  0.87  0.43  0.32  0.26  0.29  0.72

Ranks within columns                                                    ΣR     (ΣR)²
<5                       5     5     3     5     4     5     5     5     5     4     5      51     2601
6                        3     1     5     3     3     4     4     2.5   4     5     3      37.5   1406
7-8                      1     3     2     2     2     3     2     2.5   3     3     2      25.5    650
9-10                     4     4     4     4     5     2     3     4     2     2     4      38     1444
>11                      2     2     1     1     1     1     1     1     1     1     1      13      169
                                                                                     Total         6270

X² = [12/ab(a + 1)][Σ(ΣR)²] − 3b(a + 1) = 30
χ²0.005[4] = 14.9
there is a significant variation of the iron concentration with wind velocity.
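As an illustration added here (it is not part of the original article), the following Python sketch computes the Friedman statistic directly from the Table VII data: the values are ranked within each block (location) with average ranks for ties, the rank sums are squared and summed over treatments, and the formula given earlier is applied.

```python
# Friedman rank-sum test for the wind-velocity/iron-concentration data of Table VII.

def friedman_statistic(data):
    """data[t][b]: value for treatment t in block b. Returns the X^2 statistic."""
    a = len(data)      # number of treatments (wind-velocity classes)
    b = len(data[0])   # number of blocks (sampling locations)
    rank_sums = [0.0] * a
    for blk in range(b):
        column = [data[t][blk] for t in range(a)]
        for t in range(a):
            # Midrank: (# values smaller) + (# values equal + 1)/2,
            # which gives tied values their average rank
            v = column[t]
            less = sum(1 for u in column if u < v)
            equal = sum(1 for u in column if u == v)
            rank_sums[t] += less + (equal + 1) / 2
    s = sum(r * r for r in rank_sums)
    return 12.0 / (a * b * (a + 1)) * s - 3 * b * (a + 1)

iron = [  # rows: wind-velocity classes <5, 6, 7-8, 9-10, >11 knots; columns: locations 1-11
    [1.17, 0.89, 0.88, 0.91, 1.25, 1.17, 0.95, 0.86, 0.89, 0.69, 1.00],
    [0.99, 0.68, 0.98, 0.67, 1.24, 0.99, 0.81, 0.63, 0.65, 0.92, 0.91],
    [0.82, 0.77, 0.80, 0.59, 1.22, 0.95, 0.73, 0.63, 0.59, 0.55, 0.88],
    [1.03, 0.80, 0.97, 0.68, 1.65, 0.92, 0.74, 0.65, 0.45, 0.45, 0.93],
    [0.87, 0.74, 0.60, 0.50, 1.00, 0.87, 0.43, 0.32, 0.26, 0.29, 0.72],
]

x2 = friedman_statistic(iron)
print(round(x2, 2))  # ≈ 30.02; the article rounds to 30, well above chi-square(0.005, 4 df) = 14.9
```

The small discrepancy from the printed value arises only because Table VII rounds the squared rank sums (e.g., 37.5² = 1406.25 appears as 1406); the conclusion is unchanged.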
Conclusions
This article has described the methods, advantages, and limitations of analysis of variance. Much has had to be left out; for example, the related pointwise variance analysis (21, 22) and alternatives for the estimation of variation, such as the duplicate analysis procedures of Thompson and Howarth (23), likewise could not be included. It is nevertheless hoped that the reader will have gained an idea of how anova can be applied to problems in analytical chemistry. Presently, the only areas of regular application are in interlaboratory testing of methods (24) and in testing for significance in regression analysis (13e). The author is convinced that a wider awareness of the purpose and value of anova will result in much more frequent use of this statistical technique.
Acknowledgment
The author expresses his appreciation to the following persons who provided encouragement, experimental data, and comments on the manuscript: T. F. Christiansen, L. J. Cline Love, G. W. Ewing, L. N. Klatt, E. Krause, C. Orr, D. G. Pachuta, R. G. Smith, and especially W. J. Sullivan.
References
(1) W. J. Youden, "Statistical Methods for Chemists", Wiley, New York, N.Y., 1951.
(2) D. G. Pachuta, PhD dissertation, Seton Hall University, South Orange, N.J., 1976.
(3) W. J. Sullivan, ibid.
(4) T. F. Christiansen, private communication, 1976.
(5) T. F. Christiansen, J. E. Busch, and S. C. Krogh, Anal. Chem., 48, 1051 (1976).
(6) R. G. Smith, Jr., ibid., p 74.
(7) D. R. Senn, P. W. Carr, and L. N. Klatt, ibid., p 954.
(8) L. N. Klatt, private communication, 1976.
(9) F. H. Tingey, Treatise Anal. Chem., Part I, 10, 6405 (1972).
(10) V. L. Anderson and R. A. McLean, "Design of Experiments", Marcel Dekker, New York, N.Y., 1974.
(11) O. L. Davies, Ed., "Design and Analysis of Industrial Experiments", 2nd ed., Hafner, New York, N.Y., 1971.
(12) W. G. Cochran and G. M. Cox, "Experimental Designs", 2nd ed., Wiley, New York, N.Y., 1957.
(13) R. R. Sokal and F. J. Rohlf, "Biometry", Freeman, San Francisco, Calif., 1969: a) Chaps. 11 and 12; b) pp 624ff; c) pp 571ff; d) pp 398-9; e) Chap. 14.
(14) F. J. Anscombe and J. W. Tukey, Technometrics, 5, 141 (1963).
(15) P. I. Feder, ibid., 16, 287 (1974).
(16) W. J. Conover, "Practical Nonparametric Statistics", Wiley, New York, N.Y., 1971: a) pp 349ff; b) Chap. 6; c) pp 203-15, 383; d) pp 264ff.
(17) M. B. Brown and A. B. Forsythe, Technometrics, 16, 129 (1974).
(18) J. N. Arvesen and T. H. Schmitz, Biometrics, 26, 677 (1970).
(19) M. Hollander and D. A. Wolfe, "Nonparametric Statistical Methods", Wiley, New York, N.Y., 1973.
(20) J. T. Brock and C. Orr, Anal. Chem., 44, 1534 (1972).
(21) L. Meites, Anal. Chim. Acta, 74, 177 (1975).
(22) A. F. Isbell, Jr., R. L. Pecsok, R. H. Davies, and J. H. Purnell, Anal. Chem., 45, 2363 (1973).
(23) M. Thompson and R. J. Howarth, Analyst, 101, 690 (1976).
(24) ASTM Standard Recommended Practices E-177 and E-180, ASTM, Philadelphia, Pa., 1976.
Bibliography
R. R. Sokal and F. J. Rohlf, "Biometry", Freeman, San Francisco, Calif., 1969 (particularly recommended for the clarity of the instructions for carrying out the various models of anova).
G. W. Snedecor and W. G. Cochran, "Statistical Methods", 6th ed., Iowa State Univ. Press, Ames, Iowa, 1967.
R.G.D. Steel and J. H. Torrie, "Principles and Procedures of Statistics", McGraw-Hill, New York, N.Y., 1960.
H. Scheffe, "The Analysis of Variance", Wiley, New York, N.Y., 1959.
R. A. Fisher, "Statistical Methods for Research Workers", 14th ed., Hafner, New York, N.Y., 1970.
O. L. Davies and P. L. Goldsmith, Eds., "Statistical Methods in Research and Production", 4th ed., Longman, London, England, 1976.
H. R. Lindman, "Analysis of Variance in Complex Experimental Designs", Freeman, San Francisco, Calif., 1974.
Roland F. Hirsch is an associate professor of chemistry at Seton Hall University. He was educated at Oberlin College and the University of Michigan. His interests in research and teaching include chromatography, ion-sensitive electrodes, statistics, and x-ray spectrometry. This article was conceived while he was on sabbatical leave at the Inorganic Chemistry Laboratory, Oxford University, where he was a beneficiary of the resources of the Radcliffe Science Library and Blackwell's Bookstores.