
Computational Statistics & Data Analysis 51 (2007) 5061–5067
www.elsevier.com/locate/csda

Lepage type statistic based on the modified Baumgartner statistic

Hidetoshi Murakami∗

Department of Mathematics, Graduate School of Science and Engineering, Chuo University, 1-13-27 Kasuga, Bunkyo-ku, Tokyo 112-8551, Japan

Available online 12 May 2006

Abstract

The Lepage-type statistic $L_M$ has recently been proposed. It combines the Baumgartner statistic with the Ansari-Bradley statistic, and it has been found to be more powerful than the Lepage statistic. Here, a modification of the $L_M$ statistic is used for two-sample location and scale parameters: a modified Baumgartner statistic and the Mood statistic replace the Baumgartner and Ansari-Bradley statistics. Simulations are used to investigate the power of the Lepage-type statistics.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Baumgartner statistic; Lepage statistic; Nonparametric test

1. Introduction

We considered a two-sample problem, which is one of the most common types of statistical problems. Let $X = (X_1, \ldots, X_n)$ and $Y = (Y_1, \ldots, Y_m)$ be two random samples of $n$ and $m$ independent observations, with continuous distribution functions $F(x)$ and $G(y)$, respectively. Let $R_1 < \cdots < R_n$ and $H_1 < \cdots < H_m$ denote the combined-sample ranks of the X-values and Y-values in increasing order of magnitude, respectively. Baumgartner et al. (1998) defined a nonparametric two-sample rank statistic $B$ as follows:

\[
B = \frac{1}{2}\,(B_X + B_Y),
\]

where

\[
B_X = \frac{1}{n} \sum_{i=1}^{n} \frac{\left(R_i - ((n+m)/n)\,i\right)^2}{(i/(n+1))\,(1 - (i/(n+1)))\,(m(n+m))/n}
\]

and

\[
B_Y = \frac{1}{m} \sum_{j=1}^{m} \frac{\left(H_j - ((m+n)/m)\,j\right)^2}{(j/(m+1))\,(1 - (j/(m+1)))\,(n(m+n))/m}.
\]
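The definition of $B$ above can be sketched in a few lines of Python. This is only an illustration under the assumption of no ties (continuous distributions); the helper names `combined_ranks`, `one_sample_term` and `baumgartner_b` are hypothetical and not taken from the original paper.

```python
def combined_ranks(x, y):
    """Return the increasing combined-sample ranks of x and of y (no ties assumed)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rx = [k for k, (_, g) in enumerate(pooled, start=1) if g == 0]
    ry = [k for k, (_, g) in enumerate(pooled, start=1) if g == 1]
    return rx, ry


def one_sample_term(ranks, n, m):
    """B_X (or B_Y with n and m swapped): averaged weighted squared rank deviations."""
    N = n + m
    total = 0.0
    for i, r in enumerate(ranks, start=1):
        u = i / (n + 1)
        # numerator: squared deviation of the i-th rank from (N/n) * i
        # denominator: (i/(n+1)) * (1 - i/(n+1)) * m * N / n
        total += (r - N / n * i) ** 2 / (u * (1 - u) * m * N / n)
    return total / n


def baumgartner_b(x, y):
    """Baumgartner statistic B = (B_X + B_Y) / 2 for two samples without ties."""
    rx, ry = combined_ranks(x, y)
    n, m = len(rx), len(ry)
    return 0.5 * (one_sample_term(rx, n, m) + one_sample_term(ry, m, n))


if __name__ == "__main__":
    # Completely separated samples: X takes ranks 1..5, Y takes ranks 6..10.
    print(round(baumgartner_b([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]), 3))
```

Only the ranks enter the computation, so any strictly increasing transformation of the pooled data leaves the value unchanged.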

∗ Tel.: +81 3 3817 1745; fax: +81 3 3817 1746. E-mail address: [email protected].

0167-9473/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.csda.2006.04.026

The power of the Baumgartner statistic is almost equivalent to that of the better-known Wilcoxon test (Hollander and Wolfe, 1999), which is used for location parameters. The aforementioned authors asserted that the Baumgartner statistic could be applied for location and scale parameters. They showed that the Baumgartner statistic was more powerful than the Kolmogorov–Smirnov and Cramér–von Mises statistics for scale parameters. Recently, Neuhäuser (2003) investigated the Baumgartner statistic in the presence of ties. In addition, Neuhäuser (2001) investigated the behavior of a modified Baumgartner statistic in a one-sided test. Since the k-sample problem is also a common statistical problem, Murakami (2006) developed a k-sample Baumgartner statistic based on another modified Baumgartner statistic, namely $B^*$. The test statistic in Murakami (2006) was defined as follows:

\[
B^* = \frac{1}{2}\,(B^*_X + B^*_Y),
\]

where

\[
B^*_X = \frac{1}{n} \sum_{i=1}^{n} \frac{\left(R_i - ((n+m+1)/(n+1))\,i\right)^2}{(i/(n+1))\,(1 - (i/(n+1)))\,(m(n+m+1))/(n+2)}
\]

and

\[
B^*_Y = \frac{1}{m} \sum_{j=1}^{m} \frac{\left(H_j - ((m+n+1)/(m+1))\,j\right)^2}{(j/(m+1))\,(1 - (j/(m+1)))\,(n(m+n+1))/(m+2)}.
\]
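As an illustration, the following Python sketch evaluates $B^*$ from the two rank lists and checks by brute force that the centering term $((n+m+1)/(n+1))\,i$ and the scaling term in $B^*_X$ agree with the null mean and variance of the ordered rank $R_i$. The function names are hypothetical, and the enumeration over all rank assignments is an assumption made for the check; it is practical only for small $n$ and $m$.

```python
from itertools import combinations


def b_star_from_ranks(rx, ry):
    """Modified Baumgartner statistic B* from the increasing rank lists of X and Y."""
    def half(ranks, n, m):
        N = n + m
        total = 0.0
        for i, r in enumerate(ranks, start=1):
            u = i / (n + 1)
            mean_i = (N + 1) / (n + 1) * i               # centering term in B*
            var_i = u * (1 - u) * m * (N + 1) / (n + 2)  # scaling term in B*
            total += (r - mean_i) ** 2 / var_i
        return total / n

    n, m = len(rx), len(ry)
    return 0.5 * (half(rx, n, m) + half(ry, m, n))


def null_rank_moments(n, m, i):
    """Null mean and variance of R_i, enumerating all equally likely X rank sets."""
    N = n + m
    # combinations() yields increasing tuples, so c[i - 1] is the i-th smallest X rank
    values = [c[i - 1] for c in combinations(range(1, N + 1), n)]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


if __name__ == "__main__":
    n, m = 4, 3
    N = n + m
    for i in range(1, n + 1):
        u = i / (n + 1)
        print(i, null_rank_moments(n, m, i),
              ((N + 1) / (n + 1) * i, u * (1 - u) * m * (N + 1) / (n + 2)))
```

For each $i$, the two printed pairs should coincide up to floating-point rounding.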

The $B^*$ statistic is defined using the exact mean and variance of $R_i$ and $H_j$. The $B^*$ statistic is more efficient than the $B$ statistic for the location parameter when the sample sizes are unequal. In many cases, we have to test the location and scale parameters at the same time. If the scale parameters change, the test statistic for the location parameter is not useful. Similarly, if the location parameters change, the test statistic for the scale parameter is not useful. To resolve this dilemma, Lepage (1971) developed a test statistic that combines the Wilcoxon and the Ansari-Bradley statistics (Hollander and Wolfe, 1999). In addition, Neuhäuser (2000) proposed a modified Lepage statistic, namely $L_M$, which combines the Baumgartner and Ansari-Bradley statistics. In this paper, we propose a modification of the $L_M$ statistic for use in two-sample problems. We replaced the Baumgartner and Ansari-Bradley statistics with a novel test statistic that is a combination of the modified Baumgartner $B^*$ statistic and the Mood statistic (Gibbons, 1992). Pettitt (1976) first described the use of the Mood statistic in a Lepage-type test. Then, we computed the exact critical values of our novel Lepage-type statistic. Finally, we investigated the power of various Lepage-type statistics. To compare the powers of the various statistics, we carried out simulation studies for various population distributions. All the simulations were repeated $1 \times 10^7$ times. Since it is difficult to calculate critical values for large sample sizes, we used permutation to calculate exact critical values for small sample sizes.

2. A modification of Lepage type statistic

In this section, we suggest a modification of the Lepage-type statistic. The Lepage statistic is defined as

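\[
L = \left( \frac{W - E_0(W)}{\sqrt{\mathrm{var}_0(W)}} \right)^2 + \left( \frac{AB - E_0(AB)}{\sqrt{\mathrm{var}_0(AB)}} \right)^2,
\]

where

\[
W = \sum_{i=1}^{N} i V_i \quad \text{and} \quad AB = \frac{1}{2}\, n(N+1) - \sum_{i=1}^{N} \left| i - \frac{N+1}{2} \right| V_i.
\]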

Here $V_i = 1$ if the $i$th smallest of the $N = n + m$ observations is from X, and $V_i = 0$ otherwise; $E_0(\cdot)$ and $\mathrm{var}_0(\cdot)$ denote the expected value and variance under the null hypothesis. Neuhäuser (2000) developed the test statistic $L_M$ as the following modification of the $L$ statistic:

\[
L_M = \left( \frac{B - E_0(B)}{\sqrt{\mathrm{var}_0(B)}} \right)^2 + \left( \frac{AB - E_0(AB)}{\sqrt{\mathrm{var}_0(AB)}} \right)^2.
\]

The results of Neuhäuser’s (2000) analysis revealed that the $L_M$ statistic was more powerful than the Lepage statistic for location and scale parameters.
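The Lepage statistic $L$ defined above can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes the input is the list of combined-sample ranks of the X sample (so $V_i = 1$ exactly for those ranks), and it obtains $E_0$ and $\mathrm{var}_0$ by enumerating all equally likely rank assignments rather than from closed-form expressions. The function names are hypothetical.

```python
from itertools import combinations


def wilcoxon_w(rx):
    """W = sum of i * V_i, i.e. the sum of the X ranks."""
    return sum(rx)


def ansari_bradley(rx, N):
    """AB = n(N+1)/2 - sum over the X ranks of |i - (N+1)/2|."""
    n = len(rx)
    return 0.5 * n * (N + 1) - sum(abs(r - (N + 1) / 2) for r in rx)


def null_moments(stat, n, m):
    """Exact null mean and variance of stat over all C(N, n) sets of X ranks."""
    N = n + m
    vals = [stat(c) for c in combinations(range(1, N + 1), n)]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, var


def lepage(rx, m):
    """Lepage statistic L: squared standardised W plus squared standardised AB."""
    n = len(rx)
    N = n + m
    w_mean, w_var = null_moments(wilcoxon_w, n, m)
    ab_mean, ab_var = null_moments(lambda r: ansari_bradley(r, N), n, m)
    z_w2 = (wilcoxon_w(rx) - w_mean) ** 2 / w_var
    z_ab2 = (ansari_bradley(rx, N) - ab_mean) ** 2 / ab_var
    return z_w2 + z_ab2


if __name__ == "__main__":
    # X holds the four smallest ranks out of N = 7: a pure location pattern.
    print(lepage([1, 2, 3, 4], m=3))
```

Replacing `wilcoxon_w` by a function that evaluates $B$ gives the corresponding sketch for $L_M$, since both statistics share the same two-term structure.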


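We propose the following modification of the $L_M$ statistic:

\[
L_{B^*} = A' H^{-1} A,
\]

where

\[
A = \begin{pmatrix} B^* - E_0(B^*) \\ M - E_0(M) \end{pmatrix},
\qquad
H = \begin{pmatrix} \mathrm{var}_0(B^*) & \mathrm{cov}_0(B^*, M) \\ \mathrm{cov}_0(B^*, M) & \mathrm{var}_0(M) \end{pmatrix}
\]

and

\[
M = \sum_{i=1}^{N} \left( i - \frac{N+1}{2} \right)^2 V_i.
\]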
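A sketch of how the quadratic form $A' H^{-1} A$ might be evaluated for small samples is given below. It assumes that the null moments in $A$ and $H$ are obtained by enumerating every possible set of X ranks, and it inverts the $2 \times 2$ matrix $H$ in closed form. The function names (`b_star`, `mood`, `lb_star_null_table`) are illustrative only.

```python
from itertools import combinations


def b_star(rx, N):
    """B* from the increasing X ranks; the Y ranks are the remaining ranks in 1..N."""
    in_x = set(rx)
    ry = [r for r in range(1, N + 1) if r not in in_x]

    def half(ranks, n, m):
        total = 0.0
        for i, r in enumerate(ranks, start=1):
            u = i / (n + 1)
            total += ((r - (N + 1) / (n + 1) * i) ** 2
                      / (u * (1 - u) * m * (N + 1) / (n + 2)))
        return total / n

    return 0.5 * (half(rx, len(rx), len(ry)) + half(ry, len(ry), len(rx)))


def mood(rx, N):
    """Mood statistic M = sum over the X ranks of (i - (N+1)/2)^2."""
    return sum((r - (N + 1) / 2) ** 2 for r in rx)


def lb_star_null_table(n, m):
    """Map every possible tuple of X ranks to its L_{B*} value."""
    N = n + m
    combos = list(combinations(range(1, N + 1), n))
    b = [b_star(c, N) for c in combos]
    mo = [mood(c, N) for c in combos]
    k = len(combos)
    eb, em = sum(b) / k, sum(mo) / k                         # E0(B*), E0(M)
    vb = sum((v - eb) ** 2 for v in b) / k                   # var0(B*)
    vm = sum((v - em) ** 2 for v in mo) / k                  # var0(M)
    cbm = sum((p - eb) * (q - em) for p, q in zip(b, mo)) / k  # cov0(B*, M)
    det = vb * vm - cbm ** 2                                 # determinant of H
    table = {}
    for c, p, q in zip(combos, b, mo):
        a1, a2 = p - eb, q - em
        # A' H^{-1} A with H^{-1} = [[vm, -cbm], [-cbm, vb]] / det
        table[c] = (vm * a1 * a1 - 2 * cbm * a1 * a2 + vb * a2 * a2) / det
    return table


if __name__ == "__main__":
    table = lb_star_null_table(5, 5)
    print(len(table), sum(table.values()) / len(table))
```

Because $H$ is the exact null covariance matrix of $A$, the average of $L_{B^*}$ over all rank assignments equals 2, which gives a quick sanity check on an implementation.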

Here $\mathrm{cov}_0(\cdot, \cdot)$ denotes the covariance under the null hypothesis. The exact critical values of the $B^*$, $L_{B^*}$ and $L_M$ statistics are listed in Table 1. Neuhäuser (2000) calculated the critical values of the $L_M$ statistic for equal sample sizes, and Murakami (2006) computed some of the exact critical values of the $B^*$ statistic. We used permutation to calculate the exact critical values of the $L_{B^*}$ statistic for unequal sample sizes.
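The permutation calculation of an exact critical value can be sketched as below, here for the $B^*$ statistic. The convention assumed in this sketch is that the critical value for a nominal level $\alpha$ is the smallest attainable value $c$ with $\Pr(T \ge c) \le \alpha$ under the null hypothesis; the function names and the rounding to 10 decimals (used only to group values that differ by floating-point noise) are choices of this illustration.

```python
from collections import Counter
from itertools import combinations


def b_star(rx, N):
    """B* from the increasing X ranks; the Y ranks are the remaining ranks in 1..N."""
    in_x = set(rx)
    ry = [r for r in range(1, N + 1) if r not in in_x]

    def half(ranks, n, m):
        total = 0.0
        for i, r in enumerate(ranks, start=1):
            u = i / (n + 1)
            total += ((r - (N + 1) / (n + 1) * i) ** 2
                      / (u * (1 - u) * m * (N + 1) / (n + 2)))
        return total / n

    return 0.5 * (half(rx, len(rx), len(ry)) + half(ry, len(ry), len(rx)))


def exact_critical_value(values, alpha):
    """Return (c, Pr(T >= c)) for the smallest c with Pr(T >= c) <= alpha, or None."""
    counts = Counter(round(v, 10) for v in values)
    total = len(values)
    tail, best = 0, None
    for v in sorted(counts, reverse=True):  # walk down from the largest value
        tail += counts[v]
        if tail / total > alpha:
            break
        best = (v, tail / total)
    return best


def b_star_null_values(n, m):
    """B* for each of the C(n+m, n) equally likely sets of X ranks."""
    N = n + m
    return [b_star(c, N) for c in combinations(range(1, N + 1), n)]


if __name__ == "__main__":
    for alpha in (0.01, 0.025, 0.05, 0.10):
        print(alpha, exact_critical_value(b_star_null_values(5, 5), alpha))
```

The printed pairs can be compared with the $B^*$ columns of Table 1 for $n = m = 5$. The same function applies unchanged to a list of $L_{B^*}$ or $L_M$ values.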

3. Simulation study

Next, we investigated the behavior of the $L_{B^*}$ statistic. To compare the powers of the various Lepage-type statistics, we carried out a simulation study of different populations with various distributions. We tested the hypothesis $H_0: F(x) = G(y)$ against $H_1: F(x) = G((y - \mu)/\sigma)$, $\mu \neq 0$, $\sigma \neq 1$. We assumed that $F(x)$ and $G(y)$ described the following distributions.

(1) $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$: the Normal distribution,
(2) $\mathrm{Exp}(\lambda_1)$ and $\mathrm{Exp}(\lambda_2)$: the exponential distribution,
(3) $G(\mu_1, \sigma_1)$ and $G(\mu_2, \sigma_2)$: the Gumbel distribution.

Generally, the location and scale parameters of the X and Y samples are unequal. We examined the power when the location and scale parameters differed. The significance level was 5%. The following tables show the power of the $L$, $L_M$ and $L_{B^*}$ statistics for the cases $n = m = 10$ and $n = 10$, $m = 5$. Tables 2 and 3 list the results of the simulation for the Normal distribution.
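A power simulation of this kind could be sketched as follows for the $L_{B^*}$ test with Normal samples. This is an assumption-laden illustration rather than the procedure actually used for the tables: it uses far fewer replications than the study, takes the exact critical value as the smallest $c$ with $\Pr(L_{B^*} \ge c) \le \alpha$, and all function and parameter names (`estimate_power`, `mu2`, `sigma2`, `reps`, `seed`) are hypothetical.

```python
import random
from itertools import combinations


def b_star(rx, N):
    """B* from the increasing X ranks; the Y ranks are the remaining ranks in 1..N."""
    in_x = set(rx)
    ry = [r for r in range(1, N + 1) if r not in in_x]

    def half(ranks, n, m):
        total = 0.0
        for i, r in enumerate(ranks, start=1):
            u = i / (n + 1)
            total += ((r - (N + 1) / (n + 1) * i) ** 2
                      / (u * (1 - u) * m * (N + 1) / (n + 2)))
        return total / n

    return 0.5 * (half(rx, len(rx), len(ry)) + half(ry, len(ry), len(rx)))


def lb_star_table(n, m):
    """Map every possible tuple of X ranks to its L_{B*} value (exact null moments)."""
    N = n + m
    combos = list(combinations(range(1, N + 1), n))
    b = [b_star(c, N) for c in combos]
    mo = [sum((r - (N + 1) / 2) ** 2 for r in c) for c in combos]  # Mood statistic
    k = len(combos)
    eb, em = sum(b) / k, sum(mo) / k
    vb = sum((v - eb) ** 2 for v in b) / k
    vm = sum((v - em) ** 2 for v in mo) / k
    cbm = sum((p - eb) * (q - em) for p, q in zip(b, mo)) / k
    det = vb * vm - cbm ** 2
    return {c: (vm * (p - eb) ** 2 - 2 * cbm * (p - eb) * (q - em)
                + vb * (q - em) ** 2) / det
            for c, p, q in zip(combos, b, mo)}


def critical_value(values, alpha):
    """Smallest attainable c with Pr(T >= c) <= alpha under the null distribution."""
    ordered = sorted(values, reverse=True)
    crit = None
    for idx, v in enumerate(ordered):
        # count every value at least as large as v (tolerance guards float noise)
        tail = sum(1 for w in ordered[idx:] if w >= v - 1e-10) + idx
        if tail / len(ordered) > alpha:
            break
        crit = v
    if crit is None:
        raise ValueError("no attainable critical value at this level")
    return crit


def estimate_power(n, m, mu2, sigma2, alpha=0.05, reps=1000, seed=0):
    """Monte Carlo rejection rate for X ~ N(0, 1) against Y ~ N(mu2, sigma2^2)."""
    rng = random.Random(seed)
    table = lb_star_table(n, m)
    crit = critical_value(list(table.values()), alpha)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(mu2, sigma2) for _ in range(m)]
        pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
        rx = tuple(k for k, (_, g) in enumerate(pooled, start=1) if g == 0)
        if table[rx] >= crit - 1e-10:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    print(estimate_power(10, 5, mu2=0.0, sigma2=1.0))  # estimated size under H0
    print(estimate_power(10, 5, mu2=2.0, sigma2=1.0))  # estimated power, location shift
```

Precomputing the lookup table once per pair $(n, m)$ keeps each replication cheap, since every simulated data set only needs to be reduced to its set of X ranks. The estimates carry Monte Carlo error of order $1/\sqrt{\text{reps}}$, so they are not expected to reproduce published values to three decimals.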

The power of the $L_{B^*}$ statistic was similar to that of the $L$ and $L_M$ statistics when $n = m$. When the location (but not the scale) was shifted, the power of the $L_{B^*}$ statistic was similar to that of the $L_M$ statistic. When the scale (but not the location) was shifted, the power of the $L$ statistic was greater than that of the $L_{B^*}$ statistic, although the difference between these two statistics was small. When $n \neq m$, the $L_{B^*}$ statistic was more powerful than the $L_M$ statistic for the location, scale and location-scale parameters. The $L$ statistic was more effective than the $L_{B^*}$ statistic for the shifted scale parameter. Furthermore, the $L_{B^*}$ statistic was more efficient than the $L$ statistic for location and location-scale parameter shifts. Therefore, the $L_{B^*}$ statistic is more suitable than the $L$ and $L_M$ statistics for parameters associated with the Normal distribution.

We used an exponential distribution to simulate a typical asymmetrical distribution in Tables 4 and 5. Since the mean and variance of an exponential distribution depend on the value of lambda, the location and scale parameters differed between the two samples.

The results of our simulation revealed that when $n = m$, the $L_M$ and $L_{B^*}$ statistics were more powerful than the $L$ statistic. The $L_M$ and $L_{B^*}$ statistics were very similar when the sample sizes were equal, while the power of the $L_{B^*}$ statistic was greater than that of the $L$ and $L_M$ statistics when the sample sizes were unequal. These results indicate that the $L_{B^*}$ statistic is more suitable than the $L$ and $L_M$ statistics for parameters associated with an exponential distribution.

In Tables 6 and 7, we treat the Gumbel distribution, which is a long-tailed distribution on the right side. The simulations of the Gumbel distribution revealed that the $L_{B^*}$ statistic was more suitable than the $L$ and $L_M$ statistics when $n = m$. When the sample sizes were unequal, the power of the $L$ statistic was greater than that of the $L_{B^*}$ statistic for a shifted scale parameter, although this difference was small. The $L_{B^*}$ statistic was more powerful than the $L$ statistic for the location parameter, and the $L_{B^*}$ statistic was more powerful than the $L_M$ statistic for location, scale and location-scale parameters. Therefore, the $L_{B^*}$ statistic is more efficient than the $L$ and $L_M$ statistics for parameters associated with a Gumbel distribution.

Table 1
Exact critical value

n    m    Pr(B* ≥ b*)   b*      Pr(L_{B*} ≥ l_{B*})   l_{B*}    Pr(L_M ≥ l_M)   l_M
5    5    0.0079        5.536   0.0079                27.792    0.0079          27.365
          0.0159        4.192   0.0159                13.763    0.0238          8.030
          0.0476        2.870   0.0476                6.397     0.0476          6.774
          0.0952        2.145   0.0952                4.675     0.0952          4.664
6    6    0.0087        4.739   0.0086                18.587    0.0087          18.322
          0.0195        3.792   0.0238                10.752    0.0238          10.652
          0.0433        2.873   0.0497                6.339     0.0498          6.221
          0.0996        2.075   0.0995                4.213     0.0996          4.021
7    7    0.0082        4.721   0.0093                18.888    0.0099          18.047
          0.0245        3.379   0.0245                9.095     0.0245          9.289
          0.0478        2.804   0.0495                6.522     0.0495          6.839
          0.0997        2.078   0.0991                4.289     0.0997          4.257
8    8    0.0098        4.421   0.0098                16.217    0.0099          15.875
          0.0241        3.486   0.0249                9.274     0.0249          9.439
          0.0499        2.723   0.0499                6.261     0.0499          6.454
          0.0999        2.087   0.0998                4.211     0.0999          4.396
9    9    0.0100        4.406   0.0100                15.785    0.0100          16.207
          0.0249        3.419   0.0249                9.539     0.0250          9.855
          0.0499        2.723   0.0499                6.403     0.0500          6.388
          0.1000        2.039   0.0999                4.178     0.1000          4.157
10   10   0.0100        4.359   0.0100                16.152    0.0100          16.580
          0.0249        3.396   0.0250                9.586     0.0250          9.702
          0.0500        2.700   0.0500                6.438     0.0500          6.541
          0.0999        2.040   0.1000                4.159     0.0100          4.111
7    6    0.0093        4.439   0.0093                15.981    0.0099          16.076
          0.0245        3.496   0.0239                9.268     0.0245          9.393
          0.0490        2.683   0.0495                6.115     0.0495          6.354
          0.0991        2.094   0.0997                4.275     0.0997          4.404
8    5    0.0093        4.468   0.0093                16.055    0.0093          16.359
          0.0249        3.487   0.0249                9.593     0.0249          9.883
          0.0497        2.716   0.0490                6.116     0.0497          6.156
          0.0995        2.048   0.0987                4.332     0.0995          4.311
9    7    0.0100        4.419   0.0100                5.871     0.0100          15.495
          0.0250        3.418   0.0250                9.341     0.0250          9.856
          0.0500        2.717   0.0498                6.315     0.0500          6.403
          0.1000        2.044   0.1000                4.203     0.1000          4.140
10   5    0.0100        4.432   0.0100                15.549    0.0100          16.397
          0.0246        3.496   0.0246                9.432     0.0250          9.129
          0.0500        2.703   0.0496                6.179     0.0500          6.270
          0.0999        2.076   0.0999                4.185     0.0999          4.173

Table 2
Case of n = m = 10 for N(0, 1) and N(μ2, σ2²)

μ2    Statistic   σ2 = 1.0   2.0     3.0     4.0     5.0
0.0   L           0.049      0.243   0.514   0.696   0.804
      L_M         0.050      0.207   0.457   0.643   0.760
      L_{B*}      0.050      0.224   0.500   0.694   0.808
0.5   L           0.128      0.278   0.530   0.703   0.807
      L_M         0.147      0.258   0.483   0.656   0.767
      L_{B*}      0.147      0.267   0.518   0.701   0.811
1.0   L           0.404      0.388   0.576   0.723   0.816
      L_M         0.460      0.396   0.546   0.686   0.782
      L_{B*}      0.460      0.396   0.570   0.723   0.821
1.5   L           0.761      0.559   0.647   0.754   0.831
      L_M         0.809      0.593   0.637   0.729   0.804
      L_{B*}      0.809      0.587   0.650   0.757   0.836
2.0   L           0.952      0.743   0.732   0.793   0.850
      L_M         0.968      0.785   0.740   0.780   0.831
      L_{B*}      0.968      0.777   0.744   0.799   0.856

Table 3
Case of n = 10, m = 5 for N(0, 1) and N(μ2, σ2²)

μ2    Statistic   σ2 = 1.0   2.0     3.0     4.0     5.0
0.0   L           0.048      0.109   0.235   0.350   0.440
      L_M         0.050      0.081   0.170   0.261   0.341
      L_{B*}      0.049      0.085   0.182   0.287   0.381
0.5   L           0.096      0.109   0.228   0.342   0.434
      L_M         0.087      0.076   0.152   0.238   0.317
      L_{B*}      0.110      0.105   0.191   0.291   0.383
1.0   L           0.265      0.126   0.214   0.324   0.417
      L_M         0.259      0.113   0.148   0.221   0.294
      L_{B*}      0.311      0.175   0.222   0.307   0.392
1.5   L           0.543      0.191   0.207   0.302   0.394
      L_M         0.546      0.211   0.169   0.213   0.275
      L_{B*}      0.609      0.311   0.281   0.336   0.407
2.0   L           0.801      0.321   0.225   0.285   0.369
      L_M         0.808      0.376   0.225   0.221   0.264
      L_{B*}      0.851      0.498   0.373   0.382   0.431

Table 4
Case of n = m = 10 for exponential distributions with λ1 = 1 and λ2

Statistic   λ2 = 1.0   2.0     3.0     4.0     5.0
L           0.049      0.187   0.414   0.603   0.735
L_M         0.050      0.216   0.469   0.662   0.787
L_{B*}      0.050      0.212   0.460   0.652   0.777

Table 5
Case of n = 10, m = 5 for exponential distributions with λ1 = 1 and λ2

Statistic   λ2 = 1.0   2.0     3.0     4.0     5.0
L           0.048      0.157   0.324   0.470   0.582
L_M         0.050      0.139   0.296   0.436   0.547
L_{B*}      0.049      0.164   0.331   0.473   0.582

Table 6
Case of n = m = 10 for G(0, 1) and G(μ2, σ2)

μ2    Statistic   σ2 = 1.0   2.0     3.0     4.0     5.0
0.0   L           0.049      0.215   0.460   0.641   0.757
      L_M         0.050      0.191   0.421   0.603   0.725
      L_{B*}      0.050      0.202   0.451   0.642   0.764
0.5   L           0.117      0.233   0.464   0.642   0.758
      L_M         0.129      0.236   0.445   0.617   0.735
      L_{B*}      0.131      0.239   0.465   0.648   0.768
1.0   L           0.351      0.327   0.502   0.659   0.766
      L_M         0.386      0.363   0.507   0.649   0.752
      L_{B*}      0.393      0.360   0.517   0.672   0.779
1.5   L           0.674      0.496   0.575   0.692   0.783
      L_M         0.711      0.555   0.603   0.698   0.779
      L_{B*}      0.719      0.550   0.605   0.712   0.799
2.0   L           0.893      0.695   0.673   0.739   0.806
      L_M         0.913      0.752   0.715   0.757   0.811
      L_{B*}      0.917      0.749   0.713   0.764   0.825

Table 7
Case of n = 10, m = 5 for G(0, 1) and G(μ2, σ2)

μ2    Statistic   σ2 = 1.0   2.0     3.0     4.0     5.0
0.0   L           0.048      0.183   0.346   0.473   0.565
      L_M         0.050      0.165   0.316   0.440   0.533
      L_{B*}      0.049      0.174   0.335   0.463   0.558
0.5   L           0.074      0.202   0.359   0.483   0.573
      L_M         0.068      0.179   0.327   0.448   0.540
      L_{B*}      0.088      0.198   0.348   0.472   0.564
1.0   L           0.168      0.269   0.398   0.506   0.588
      L_M         0.176      0.244   0.363   0.471   0.554
      L_{B*}      0.233      0.273   0.386   0.494   0.578
1.5   L           0.341      0.381   0.461   0.544   0.612
      L_M         0.381      0.358   0.425   0.507   0.578
      L_{B*}      0.474      0.399   0.451   0.530   0.600
2.0   L           0.557      0.524   0.543   0.592   0.643
      L_M         0.623      0.509   0.508   0.555   0.608
      L_{B*}      0.723      0.559   0.536   0.577   0.629

4. Conclusion

In this paper, we presented a novel Lepage-type test statistic, $L_{B^*}$, that combines a modified Baumgartner statistic with the Mood statistic. We calculated the exact critical values of the $B^*$ and $L_{B^*}$ statistics under the condition $5 \le m \le n \le 10$. We also calculated the exact critical values of the $L_M$ statistic when the sample sizes were unequal. The results of our simulations of Normal, exponential, and Gumbel distributions revealed that the $L_{B^*}$ statistic is more powerful than the $L$ and $L_M$ statistics for location-scale parameters for these distributions when the sample sizes are unequal. In the future, it will be important to derive the asymptotic distribution of the $L_{B^*}$ statistic. Our preliminary simulations have revealed that the critical values for the asymptotic distribution of the $L_{B^*}$ statistic are $\Pr(L_{B^*} \ge 16.671) = 0.010$, $\Pr(L_{B^*} \ge 10.151) = 0.025$, $\Pr(L_{B^*} \ge 6.669) = 0.050$, and $\Pr(L_{B^*} \ge 4.175) = 0.100$.

References

Baumgartner, W., Weiß, P., Schindler, H., 1998. A nonparametric test for the general two-sample problem. Biometrics 54, 1129–1135.

Gibbons, J.D., 1992. Nonparametric Statistical Inference. third ed. Dekker, New York.

Hollander, M., Wolfe, D.A., 1999. Nonparametric Statistical Methods. second ed. Wiley, New York.

Lepage, Y., 1971. A combination of Wilcoxon’s and Ansari-Bradley’s statistics. Biometrika 58, 213–217.

Murakami, H., 2006. A k-sample rank test based on modified Baumgartner statistic and its power comparison. J. Japanese Soc. Comput. Statist. 19.

Neuhäuser, M., 2000. An exact two-sample test based on the Baumgartner–Weiss–Schindler statistic and a modification of Lepage’s test. Commun. Statist. Theory Meth. 29, 67–78.

Neuhäuser, M., 2001. One-sided two-sample and trend tests based on a modified Baumgartner–Weiss–Schindler statistic. J. Nonparametric Statist. 13, 729–739.

Neuhäuser, M., 2003. A note on the exact test based on the Baumgartner–Weiß–Schindler statistic in a presence of ties. Comput. Statist. Data Anal. 42, 561–568.

Pettitt, A.N., 1976. A two-sample Anderson–Darling rank statistic. Biometrika 63, 161–168.