tests of significance using nonnormal data

Research Notes and Comments

Tests of Significance Using Nonnormal Data, by J. K. Ord

1. INTRODUCTION

Welch (1937) and Pitman (1937 a, b, c) arrived independently at hypothesis tests whose frame of reference was not a particular distribution, but a set of random permutations of the sample values themselves. These permutations tests provide a useful alternative to standard normal theory, although doubts have been expressed as to whether any set of random permutations can be an appropriate reference frame for inference.

The main purpose of this article is to show that, for any distribution, we may interpret the expected value of a function based on the set of random permutations as an unbiased estimator for the expected value of that function taken over the whole population; provided, of course, that the expectation exists. This result frees various hypothesis testing procedures from their dependence on the set of random permutations and enables approximate sampling distributions to be developed for nonnormal populations.

In section 2, the main result is presented and proved. In section 3, the method is applied to tests for autocorrelation (in time and in space), for correlation, and to the analysis of randomized block experiments. In general, these tests are not most powerful in any sense, but they are robust in that the estimated moments of the statistic are always unbiased.

2. PERMUTATION-SYMMETRIC DISTRIBUTIONS

Let P be a set of permutations of the n subscripts (1, 2, ..., n), not necessarily containing all n! possibilities. We shall use p and q as subscripts to denote orderings corresponding to particular elements of P; that is, $p, q \in P$.

Let X denote an n-vector random variable with joint density function $f(x)$, and let $X_{(p)}$ be the vector after reordering with respect to permutation p, yielding density function $f(x_{(p)})$. If

$$f(x_{(p)}) = f(x_{(q)}) \tag{1}$$

for all values of x and for all $p, q \in P$, we say that X has a permutation-symmetric distribution with respect to the set of permutations P. In passing, we note that when the random variables are independent and identically distributed, they are permutation-symmetric with respect to any class P. However, independence is not a necessary assumption, while marginal distributions need only be identical within certain subclasses determined by P. Having introduced this notion we now state the main result.

It is apparent that the development is similar in spirit to that of k-statistics (cf. Kendall and Stuart 1977, chap. 12). However, the evaluation of the expectation of a function by k-statistics requires the function, rather than the distribution, to have certain symmetry properties.

J. K. Ord is reader of statistics, University of Warwick.

0016-7363/80/1080-0387$00.50/0 GEOGRAPHICAL ANALYSIS, vol. 12, no. 4 (October 1980)

© 1980 Ohio State University Press


THEOREM. Let X be an n-vector random variable whose distribution is permutation-symmetric with respect to P and let g(X) be any function of X whose expectation exists. Then

$$E[g(X)] = E\{E_R[g(X)]\},$$

where $E_R$ denotes the expectation of g(X) taken over all the permutations contained in P, treated as equally likely (the randomization distribution).

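Proof. (This is developed in terms of continuous variates, but the result may be proved for the discrete case in an analogous manner.) By definition,

$$E[g(X)] = \int g(x) f(x)\,dx.$$

If P contains M distinct permutations, taken as equally likely,

$$E_R[g(x)] = \frac{1}{M} \sum_{p \in P} g(x_{(p)}),$$

where $g(x_{(p)})$ is the value of the function for the pth permutation of the set of values x. Then

$$E\{E_R[g(X)]\} = \frac{1}{M} \sum_{p \in P} \int g(x_{(p)}) f(x)\,dx = \frac{1}{M} \sum_{p \in P} \int g(x_{(p)}) f(x_{(p)})\,dx,$$

since from equation (1), $f(x) = f(x_{(p)})$ for all $p \in P$. As each integral is equal to $E[g(X)]$, the result follows.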

COROLLARY. Let g(X) = h(X) s(X), where s(x) is invariant under the class of permutations P; that is,

$$s(x_{(p)}) = s(x) \quad \text{for all } p \in P.$$

Then

$$E[g(X)] = E\{s(X)\, E_R[h(X)]\}.$$

Proof. Follows directly by the argument given above.

Comment. From the theorem it is clear that $E_R[g(X)]$ is an unbiased estimator for $E[g(X)]$. Thus, statistical techniques that use randomization arguments may be justified as procedures that generate unbiased estimators for the unknown $E[g(X)]$. In particular, the randomization tests developed by Welch (1937) and by Pitman (1937a, b, c) may be reinterpreted as methods for nonnormal distributions where the unknown moments are replaced by unbiased estimates. The technical questions that remain before inferences can be made then reduce to finding suitable approximations for the sampling distributions, which, in essence, is what Welch and Pitman did.
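The randomization expectation in the theorem can be illustrated on a small sample. What follows is a minimal sketch, assuming P is the full set of n! orderings; the helper name `randomization_expectation`, the test function `g`, and the data are hypothetical choices made for illustration and do not come from the original note. The sketch averages g over every ordering of the observed values and compares the result with the closed form for the product of the first two coordinates.

```python
from itertools import permutations

def randomization_expectation(g, x):
    """E_R[g(x)]: mean of g over all n! orderings of x, taken as equally likely."""
    values = [g(p) for p in permutations(x)]
    return sum(values) / len(values)

def g(v):
    # Hypothetical test function: product of the first two coordinates.
    return v[0] * v[1]

x = [2.0, -1.0, 0.5, 3.0]  # illustrative data, not from the source
n = len(x)

er = randomization_expectation(g, x)
# Closed form for this g: [(sum x)^2 - sum x^2] / [n (n - 1)].
closed_form = (sum(x) ** 2 - sum(v * v for v in x)) / (n * (n - 1))
print(er, closed_form)
```

Because the average depends on the sample only through its unordered values, it is an ordinary statistic; the theorem states that its expectation over repeated samples equals E[g(X)].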

3. APPLICATIONS OF THE METHOD

In this section we examine several test statistics that have been considered using randomization arguments and summarize how they may be used to examine nonnormal data without recourse to a test based upon randomization.

A. Test for Autocorrelation in Time-Series

Under $H_0$: independence, $X_1, \ldots, X_n$ are independent and identically distributed. To test the alternative of first order autocorrelation we may use the statistic

$$I = \sum_{t=2}^{n} (X_t - \bar{X})(X_{t-1} - \bar{X}) \Big/ \sum_{t=1}^{n} (X_t - \bar{X})^2.$$

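If we let $z_t = x_t - \bar{x}$ and let P denote the set of all n! possible permutations it follows from the corollary that

$$E(I) = E\left\{ E_R\Big(\sum_{t=2}^{n} z_t z_{t-1}\Big) \Big/ \sum_{t=1}^{n} z_t^2 \right\}, \qquad E_R\Big(\sum_{t=2}^{n} z_t z_{t-1}\Big) = -\frac{1}{n} \sum_{t=1}^{n} z_t^2,$$

since $E_R(z_t z_{t-1}) = [(\sum z_t)^2 - \sum z_t^2] / [n(n-1)] = -\sum z_t^2 / [n(n-1)]$; see, for example, Cliff and Ord (1973, pp. 32-33). Hence $E(I) = -1/n$. In like manner, if $\sum z_t^2 = n m_2$ and $\sum z_t^4 = n m_4$, we find that

$$E_R(z_t^2 z_{t-1}^2) = \frac{n m_2^2 - m_4}{n-1}, \qquad E_R(z_{t+1} z_t^2 z_{t-1}) = \frac{2 m_4 - n m_2^2}{(n-1)(n-2)},$$

$$E_R(z_t z_{t-1} z_u z_{u-1}) = \frac{3(n m_2^2 - 2 m_4)}{(n-1)(n-2)(n-3)},$$

where $|t - u| > 1$. Using standard results (cf. Cliff and Ord 1973, p. 33) it follows that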

$$\mathrm{var}(I) = [n^2 - E(b_2)(n+1) - (n-1)] / [n^2(n-1)], \tag{2}$$

where $b_2 = m_4/m_2^2$, provided that $E(b_2)$ exists. I is asymptotically normally distributed even when the data are not normal (Moran 1967). Thus, for n not too small we may approximate the distribution of I by the normal with mean $-1/n$ and variance estimated by (2) with $b_2$ in place of $E(b_2)$. For the normal, $E(b_2) = 3(n-1)/(n+1)$ and substitution of this value yields $\mathrm{var}(I) = (n-2)^2 / [n^2(n-1)]$, as it should.
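The nonrandomized test described above can be sketched in a few lines. This is an illustrative sketch only: the function names (`serial_stat`, `sample_b2`, `variance_eq2`, `normal_approx_test`, `exact_randomization_moments`) and the data are assumptions introduced here, and the two-sided p-value is one possible way of using the normal approximation. The last helper enumerates all n! orderings of a small sample so that the mean -1/n and the variance from (2), evaluated at the sample b2, can be compared with the exact randomization moments.

```python
from itertools import permutations
from math import erfc, sqrt

def centered(x):
    mean = sum(x) / len(x)
    return [v - mean for v in x]

def serial_stat(x):
    """I = sum_{t=2..n} z_t z_{t-1} / sum_t z_t^2 with z_t = x_t - mean(x)."""
    z = centered(x)
    num = sum(z[t] * z[t - 1] for t in range(1, len(z)))
    return num / sum(v * v for v in z)

def sample_b2(x):
    """b2 = m4 / m2^2 with n * m_r = sum_t z_t^r."""
    z = centered(x)
    n = len(z)
    m2 = sum(v ** 2 for v in z) / n
    m4 = sum(v ** 4 for v in z) / n
    return m4 / m2 ** 2

def variance_eq2(n, b2):
    """Right-hand side of (2) with the sample b2 in place of E(b2)."""
    return (n * n - b2 * (n + 1) - (n - 1)) / (n * n * (n - 1))

def normal_approx_test(x):
    """Return (I, z-score, two-sided p-value) under the normal approximation."""
    n = len(x)
    i_stat = serial_stat(x)
    z_score = (i_stat + 1.0 / n) / sqrt(variance_eq2(n, sample_b2(x)))
    return i_stat, z_score, erfc(abs(z_score) / sqrt(2.0))

def exact_randomization_moments(x):
    """Mean and variance of I over all n! orderings (feasible for small n only)."""
    vals = [serial_stat(p) for p in permutations(x)]
    mean = sum(vals) / len(vals)
    return mean, sum((v - mean) ** 2 for v in vals) / len(vals)

x = [1.0, 4.0, 2.0, 8.0, 5.0]  # illustrative data, not from the source
print(normal_approx_test(x))
print(exact_randomization_moments(x), variance_eq2(len(x), sample_b2(x)))
```

With only five observations the normal approximation itself is not to be trusted; the small sample is used here solely so that the enumeration stays cheap.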

B. Test for Spatial Autocorrelation (Cliff and Ord 1973, pp. 7-11)

Under $H_0$: independence, $X_1, \ldots, X_n$ are independent and identically distributed. To test the alternative of spatial autocorrelation with covariance matrix (cf. Besag 1974)

$$\mathrm{var}(X) = \sigma^2 (I - \rho W)^{-1},$$

we use the statistic

$$I = (n/S_0)\, z^T W z / z^T z,$$

where $z_i = x_i - \bar{x}$ and $S_0 = \sum_i \sum_j w_{ij}$. Following a similar line of argument to (A), Cliff and Ord (1973, pp. 32-33) show that

$$E_R(I) = -\frac{1}{n-1}$$

and

$$E_R(I^2) = \frac{n[(n^2-3n+3)S_1 - nS_2 + 3S_0^2] - b_2[(n^2-n)S_1 - 2nS_2 + 6S_0^2]}{(n-1)(n-2)(n-3)S_0^2}, \tag{3}$$

where $S_1 = \frac{1}{2} \sum_i \sum_j (w_{ij} + w_{ji})^2$, $S_2 = \sum_i (w_{i\cdot} + w_{\cdot i})^2$, $w_{i\cdot} = \sum_j w_{ij}$, and P again contains all n! possible permutations.

It follows that $E(I) = -1/(n-1)$ while $E(I^2)$ is given by (3) on replacing $b_2$ by $E(b_2)$.¹ Sen (1976) has shown that the distribution of I is asymptotically normal when the X are nonnormal, given certain mild conditions on the structure of W. Thus, a nonrandomized test may be performed using the normal approximation, provided that n is not too small. The corrections to the approximation suggested by Cliff and Ord (1973, pp. 37-40) are still valid.
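The randomization moments of the spatial statistic can be checked numerically on a toy example. The sketch below is illustrative and rests on assumptions not in the original: the weight matrix `w` (a five-site chain with binary symmetric weights and a zero diagonal), the data `x`, and the function names are all hypothetical. It evaluates the closed forms for E_R(I) and E_R(I^2) and, because n is tiny, also enumerates all n! orderings for comparison.

```python
from itertools import permutations

def morans_i(x, w):
    """I = (n / S0) * z'Wz / z'z with z_i = x_i - mean(x)."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(v * v for v in z)

def randomization_moments(x, w):
    """Closed forms for E_R(I) and E_R(I^2); assumes n > 3 and w[i][i] == 0."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    b2 = m4 / m2 ** 2
    s0 = sum(sum(row) for row in w)
    s1 = 0.5 * sum((w[i][j] + w[j][i]) ** 2 for i in range(n) for j in range(n))
    row_sums = [sum(w[i]) for i in range(n)]
    col_sums = [sum(w[i][j] for i in range(n)) for j in range(n)]
    s2 = sum((row_sums[i] + col_sums[i]) ** 2 for i in range(n))
    num = (n * ((n * n - 3 * n + 3) * s1 - n * s2 + 3 * s0 ** 2)
           - b2 * ((n * n - n) * s1 - 2 * n * s2 + 6 * s0 ** 2))
    return -1.0 / (n - 1), num / ((n - 1) * (n - 2) * (n - 3) * s0 ** 2)

# Hypothetical weights: five sites on a line, neighbours share an edge.
w = [[0, 1, 0, 0, 0],
     [1, 0, 1, 0, 0],
     [0, 1, 0, 1, 0],
     [0, 0, 1, 0, 1],
     [0, 0, 0, 1, 0]]
x = [3.0, 1.0, 4.0, 1.5, 9.0]  # illustrative data, not from the source

e_i, e_i2 = randomization_moments(x, w)
vals = [morans_i(p, w) for p in permutations(x)]
print(e_i, sum(vals) / len(vals))
print(e_i2, sum(v * v for v in vals) / len(vals))
```

In practice the closed forms are what make the test usable, since enumeration is out of reach for realistic n; the variance needed for the normal approximation is E_R(I^2) minus the square of E_R(I).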

C. Test for Correlation

Given n pairs of variates $(X_1, Y_1), \ldots, (X_n, Y_n)$ that are identically distributed, we may test the null hypothesis of independence against the alternative of linear correlation using the usual sample correlation coefficient, which we denote by r.

If P contains the $(n!)^2$ permutations of all X and all Y taken separately we find, under $H_0$ (cf. Kendall and Stuart 1973, pp. 492-93)

$$E_R(r) = 0, \qquad E_R(r^2) = \frac{1}{n-1}, \qquad E_R(r^3) = \frac{b_{x1} b_{y1}}{(n-1)(n-2)},$$

and

$$E_R(r^4) = \frac{3}{n^2-1} \left\{ 1 + \frac{[(n+1)b_{x2} - 3(n-1)]\,[(n+1)b_{y2} - 3(n-1)]}{3n(n-2)(n-3)} \right\},$$

where $b_{x1} = m_{x3} / m_{x2}^{3/2}$, $b_{x2} = m_{x4} / m_{x2}^2$, $n m_{xr} = \sum_i (x_i - \bar{x})^r$, and the terms for y are defined similarly. For the normal, $E(b_{x1}) = 0$, $E(b_{x2}) = 3(n-1)/(n+1)$ so we reach

$$E(r^3) = 0 \quad \text{and} \quad E(r^4) = 3/(n^2-1)$$

as may be verified directly from the beta distribution for r. When the ratios $b_1$ and $b_2$ have finite expectations, it can be seen that $E(r^3)$ is $O(n^{-2})$ while $E(r^4) / [E(r^2)]^2 = 3 + O(n^{-1})$, confirming the closeness of the distribution to the beta for the nonnormal case. Hoeffding (1952) demonstrated the convergence of the permutations distribution and the normal case to a common limiting normal curve, and we now see that this applies for the distribution of r from any nonnormal distribution. (We note that $b_2 \le n$ for any distribution and that $E(r^j)$ must exist for all j.)

¹ $E(I^2)$ always exists since $1 \le b_2 \le n$ for any set of n observations.
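The fourth randomization moment of r can be checked against direct enumeration for a small sample. The sketch below is illustrative: the function names and the data are hypothetical, and it relies on the fact that permuting the y values alone while holding x fixed yields the same set of pairings as permuting both, so n! orderings suffice instead of (n!)^2.

```python
from itertools import permutations
from math import sqrt

def moment(v, r):
    """m_r with n * m_r = sum_i (v_i - mean)^r."""
    mean = sum(v) / len(v)
    return sum((a - mean) ** r for a in v) / len(v)

def b2(v):
    return moment(v, 4) / moment(v, 2) ** 2

def corr(x, y):
    """Usual sample correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (n * sqrt(moment(x, 2) * moment(y, 2)))

def er_r4_closed_form(x, y):
    """E_R(r^4) from the closed form in terms of b_x2 and b_y2 (needs n > 3)."""
    n = len(x)
    fx = (n + 1) * b2(x) - 3 * (n - 1)
    fy = (n + 1) * b2(y) - 3 * (n - 1)
    return 3.0 / (n * n - 1) * (1.0 + fx * fy / (3.0 * n * (n - 2) * (n - 3)))

def er_r4_enumerated(x, y):
    """Average of r^4 over all n! re-pairings of y with x (small n only)."""
    vals = [corr(x, p) ** 4 for p in permutations(y)]
    return sum(vals) / len(vals)

x = [1.0, 2.0, 4.0, 7.0, 11.0]  # illustrative data, not from the source
y = [3.0, 1.0, 4.0, 1.0, 5.0]
print(er_r4_closed_form(x, y), er_r4_enumerated(x, y))
```

When both kurtosis ratios equal the normal-theory value 3(n-1)/(n+1), the correction term vanishes and the closed form reduces to 3/(n^2-1).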

D. Analysis of Randomized Block Experiment

Consider an experiment with b blocks, t treatments, n = bt with variates $X_{ij}$, which are independent and have equal variances and observed values $x_{ij}$, i = 1, ..., b, j = 1, ..., t. If $z_{ij} = x_{ij} - \bar{x}_{i\cdot}$, we may use the ratio of the treatment sum of squares to the (total-blocks) sum of squares,

$$U = \frac{1}{b} \sum_j \Big( \sum_i z_{ij} \Big)^2 \Big/ \sum_i \sum_j z_{ij}^2,$$

to test the null hypothesis that the treatment effects are zero. Welch (1937) and Pitman (1937c) showed that, when the $(t!)^b$ possible permutations of the observations within blocks are considered equally likely under $H_0$,

$$E_R(U) = 1/b$$

and

$$\mathrm{var}_R(U) = 2(1-A) / [b^2(t-1)],$$

where $A = \sum_i (\sum_j z_{ij}^2)^2 / (\sum_i \sum_j z_{ij}^2)^2$. When the $X_{ij}$ are normally distributed, A is a sum of squares of related beta variables, each with parameters $\frac{1}{2}(t-1)$ and $\frac{1}{2}(b-1)(t-1)$, so that

$$E(A) = (t+1) / (bt-b+2),$$

whence

$$\mathrm{var}(U) = 2(b-1) / [b^2(bt-b+2)].$$

As expected, this agrees with the exact value for the normal case given by the beta $[\frac{1}{2}(t-1), \frac{1}{2}(b-1)(t-1)]$ distribution.

For nonnormal distributions with $E(X^4) / [E(X^2)]^2 = \beta_2 < \infty$, when b and t are large we obtain

$$\mathrm{var}(U) \simeq 2b / [b^2(bt - b + \beta_2)]. \tag{4}$$

From (4), it can be seen that the effect of nonnormality is slight, as would be expected from Box (1953), who showed the analysis of variance to be robust to departures from normality. Pitman (1937c) gives the third and fourth moments for U under random permutations and shows by numerical studies that the normal and randomized distributions are very close unless b and t are small. Welch (1937) extends the approach to Latin squares; in principle, more complex designs may be investigated by this approach.
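The randomization mean and variance of U can be confirmed by brute force on a very small layout. The following sketch is illustrative only: the 3 x 3 data table and all function names are hypothetical, U is computed as the treatment sum of squares divided by the (total-blocks) sum of squares, and the enumeration reorders the observations independently within each block.

```python
from itertools import permutations, product

def block_centered(x):
    """z_ij = x_ij - mean of block i, for x[i][j] with i = block, j = treatment."""
    return [[v - sum(row) / len(row) for v in row] for row in x]

def u_stat(z):
    """Treatment sum of squares over (total - blocks) sum of squares."""
    b, t = len(z), len(z[0])
    total = sum(v * v for row in z for v in row)
    treat = sum(sum(z[i][j] for i in range(b)) ** 2 for j in range(t)) / b
    return treat / total

def a_ratio(z):
    """A = sum_i (sum_j z_ij^2)^2 / (sum_i sum_j z_ij^2)^2."""
    total = sum(v * v for row in z for v in row)
    return sum(sum(v * v for v in row) ** 2 for row in z) / total ** 2

def var_r_u(z):
    """Randomization variance 2(1 - A) / [b^2 (t - 1)]."""
    b, t = len(z), len(z[0])
    return 2.0 * (1.0 - a_ratio(z)) / (b * b * (t - 1))

def exact_moments(z):
    """Mean and variance of U over all within-block reorderings (tiny designs only)."""
    vals = [u_stat(rows) for rows in product(*(permutations(row) for row in z))]
    mean = sum(vals) / len(vals)
    return mean, sum((v - mean) ** 2 for v in vals) / len(vals)

# Hypothetical data: 3 blocks (rows) by 3 treatments (columns).
x = [[5.0, 7.0, 6.0],
     [3.0, 8.0, 4.0],
     [6.0, 9.0, 5.5]]
z = block_centered(x)
print(u_stat(z), exact_moments(z), 1.0 / len(z), var_r_u(z))
```

The enumeration visits 6^3 = 216 arrangements here; for designs of realistic size the closed forms, or the beta approximation discussed above, would be used instead.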


4. CONCLUSIONS

The result given in section 2 shows that we may interpret randomized expectations as unbiased estimators of parametric functions for (possibly) nonnormal populations. Thus, permutations tests may be viewed as general nonnormal procedures without any need to justify them in terms of the randomization. Several standard permutations tests have been reformulated in light of this finding.

LITERATURE CITED

Besag, J. E. (1974). “Spatial Interaction and the Statistical Analysis of Lattice Systems.” J. Royal Statistical Society, series B, 36, 192-236.

Box, G. E. P. (1953). “Non-normality and Tests on Variances.” Biometrika, 40, 318-35.

Cliff, A. D., and J. K. Ord (1973). Spatial Autocorrelation. London: Pion.

Hoeffding, W. (1952). “The Large Sample Power of Tests Based Upon Permutations of the Observations.” Annals of Mathematical Statistics, 23, 169-92.

Kendall, M. G., and A. Stuart (1977, 1973). The Advanced Theory of Statistics, Vols. 1 & 2. London: Griffin.

Moran, P. A. P. (1967). “Testing for Serial Correlation with Exponentially Distributed Variates.” Biometrika, 54, 395-401.

Pitman, E. J. G. (1937a, b, c). “Significance Tests which may be Applied to Samples from any Population I, II, III.” J. Royal Statistical Society, Supplement, 4, 119-30, 225-32, and Biometrika, 29, 322-35.

Sen, A. (1976). “Large Sample Size Distribution of Statistics Used in Testing for Spatial Autocorrelation.” Geographical Analysis, 8, 175-84.

Welch, B. L. (1937). “On the z-test in Randomized Blocks and Latin Squares.” Biometrika, 29, 21-52.

Model Selection in Analyzing Spatial Groups in Regression Analysis, by Andrew J. Buck and Simon Hakim*

Introduction

In the statistical modeling of hypothesized spatial behavior or phenomenon, reliance is often placed on samples drawn from a population thought to be homogeneous for certain characteristics. However, the population and the sample may have spatial subgroups that exhibit different behavior. For example, the deterrent effect of the arrest rate on criminals committing burglaries might differ between urban, suburban, and rural communities. The analysis of such intrasample differences requires procedures for both model selection and hypothesis testing.

In selecting a model we wish to determine whether or not the sample subgroups are similar enough to warrant pooling the data, thus reducing the number of model parameters to be estimated from a given number of observations. If, on the basis of an appropriate decision rule, it is determined that the data should

*The authors wish to express their appreciation to two anonymous referees for their helpful comments and suggestions.

Andrew J. Buck and Simon Hakim are assistant professors of economics, Temple University.

0016-7363/80/1080-0392$00.50/0 GEOGRAPHICAL ANALYSIS, vol. 12, no. 4 (October 1980)

© 1980 Ohio State University Press
