
Statistics and Probability Letters 82 (2012) 1507–1514

Contents lists available at SciVerse ScienceDirect

Statistics and Probability Letters

journal homepage: www.elsevier.com/locate/stapro

A sharp upper bound for the expected number of false rejections

Alexander Y. Gordon ∗

University of North Carolina at Charlotte, Department of Mathematics and Statistics, 9201 University City Blvd, Charlotte, NC 28223, United States

Article info

Article history: Received 25 May 2011; Received in revised form 7 March 2012; Accepted 7 March 2012; Available online 14 March 2012

MSC: primary 62J15; secondary 62G10

Keywords: Multiple testing procedure; Monotone procedure; Per-family error rate; Step-down procedure; Step-up procedure

Abstract

We consider the class of monotone multiple testing procedures (monotone MTPs). It includes, among others, traditional step-down (Holm type) and step-up (Benjamini–Hochberg type) MTPs, as well as their generalization – step-up-down procedures (Tamhane et al., 1998). Our main result – the All-or-Nothing Theorem – allows us to explicitly calculate, for each MTP in those classes, its per-family error rate – the exact level at which the procedure controls the expected number of false rejections under general and unknown dependence structure of the individual tests. As an illustration, we show that, for any monotone step-down procedure (where the term ‘‘step-down’’ is understood in the most general sense), the ratio of its per-family error rate and its familywise error rate (the exact level at which the procedure controls the probability of one or more false rejections) does not exceed 4 if the denominator is less than 1.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

The traditional control of the familywise error rate (the probability of one or more false rejections, abbreviated as FWER) (Tukey, 1953) becomes impractical in those applications of multiple hypothesis testing where the number of hypotheses is large (e.g., in microarray data analysis). Tukey (1953) also introduced another measure of type I error occurrence — the per-family error rate (PFER), which equals the expected number of false rejections. In those testing situations, where thousands of hypotheses are tested simultaneously, the control of the FWER (that is, the requirement that with a probability close to 1 no true hypothesis be falsely rejected) is no longer desirable, because it severely reduces chances to detect false hypotheses. In these situations the PFER appears to be a natural ‘‘heir’’ of the traditional FWER.

The present work is motivated by the following question: given a multiple testing procedure (MTP) M, what is the exact level at which it controls the PFER? This number is an important characteristic of the procedure's safety against ‘‘false discoveries’’.

The MTPs considered below are of the most common type: such a procedure uses as input the observed p-values pi associated with the hypotheses Hi being tested, and its output is the list of rejected hypotheses (or equivalently, the list of indices i of the p-values declared M-significant). We assume, furthermore, that the procedure M is symmetric (which p-values will be declared M-significant does not depend on the order in which they are listed), cutting (the M-significant

Abbreviations: MTP, multiple testing procedure; TSD, threshold step-down; TSU, threshold step-up; TSUD, threshold step-up-down; PFER, per-family error rate; FWER, familywise error rate. Tel.: +1 704 687 4576; fax: +1 704 687 6415.

E-mail addresses: [email protected], [email protected].

0167-7152/$ – see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.spl.2012.03.008


p-values, if any, are smaller than the M-insignificant ones, if any), and monotone: reduction in some or all p-values can only increase the number of rejections.

The object of our study is the exact level at which a given procedure M controls the PFER under a general and unknowndependence structure of the p-values. Roughly speaking, this is the expected number of false rejections (falsely rejected truehypotheses) for the least favorable joint distribution of the p-values.

The main result of this work is the All-or-Nothing Theorem, which states that, under the additional assumption that all the hypotheses Hi are true, such a least favorable distribution can be found among those distributions that have the following property: given a random vector with such distribution, the procedure almost surely rejects either all hypotheses or none. This result allows us to explicitly calculate the exact level of control of the PFER (both with and without the above assumption) for the commonly used classes of stepwise procedures.

The remaining part of the paper is organized as follows. In the following Section 2, we present the necessary definitions and notation that will be used throughout the paper. In Section 3, we formulate the main result – the All-or-Nothing Theorem – and prove it. This result pertains to the weak control of the PFER (all hypotheses are supposed to be true). In Section 4, we derive a theorem pertaining to the strong control of the PFER (no restrictions on the joint distribution of the p-values). In Section 5, we apply the previously obtained results to stepwise procedures. In Section 6, we consider an illustrative example: a comparison of the exact levels at which a given MTP controls the PFER and the FWER. Obviously, their ratio is ≤ m, where m is the number of hypotheses being tested. The results of the present work and certain earlier results imply that for a monotone step-down procedure (where the term ‘‘step-down’’ is understood in the most general sense) this ratio, rather surprisingly, does not exceed 4 if the denominator is less than 1.

2. Basic notions

2.1. Uninformed multiple testing procedures

A multiple testing procedure (MTP) is a decision rule that, based on randomly generated data, selects for rejection a subset of the given set of hypotheses about the probability distribution from which the data are drawn.

We assume that there are in total m hypotheses H1, H2, . . . , Hm, and associated with them are p-values P1, P2, . . . , Pm. The p-value Pi is a random variable (determined by the data) such that:

(i) 0 ≤ Pi ≤ 1;
(ii) if the hypothesis Hi is true, then

pr{Pi ≤ x} ≤ x for all x (0 ≤ x ≤ 1). (1)

The p-value Pi measures the strength of the evidence against the hypothesis Hi provided by the data: the smaller Pi, the stronger the evidence. For a more detailed discussion of p-values, see, for example, Lehmann and Romano (2005, pp. 63–64).

From now on we assume that the hypothesis Hi is true if and only if (1) holds.

A marginal-p-value-only-based, or uninformed, multiple testing procedure (in the sequel — just multiple testing procedure, or MTP) is a Borel measurable mapping M: Im → 2^Nm from the unit cube Im = [0, 1]m to the set 2^Nm of all subsets of Nm = {1, 2, . . . , m}. Applying M to a vector p = (p1, . . . , pm) of observed p-values (p-vector), we obtain a subset M(p) of Nm; the inclusion i ∈ M(p) means that, given p-values p1, . . . , pm, M rejects the hypothesis Hi, or equivalently, the ith p-value is M-significant. Otherwise, the hypothesis Hi is retained (not rejected), and the ith p-value is M-insignificant.

A multiple testing procedure M is symmetric if for any p ∈ Im and any one-to-one mapping (permutation) σ: Nm → Nm, denoting by σ(p) such p′ ∈ Im that p′σ(i) = pi for all i, we have M(σ(p)) = σ(M(p)). This means that if we arbitrarily permute the hypotheses (and their observed p-values), then the procedure M will reject the same hypotheses as before. Every vector p ∈ Im is a permutation of a (unique) vector t ∈ Simpm, where Simpm is the m-dimensional simplex

Simpm = {t = (t1, . . . , tm) ∈ Rm: 0 ≤ t1 ≤ · · · ≤ tm ≤ 1};

therefore, a symmetric MTP is determined uniquely once it is defined on Simpm.

A procedure M is cutting if, whenever i ∈ M(p) and j ∉ M(p), we have pi < pj, that is, for any p-vector its M-significant components (if any) are smaller than its M-insignificant components (if any).

All MTPs considered below are assumed to be symmetric and cutting. We denote the set of all such procedures by Procm, where m is the number of hypotheses being tested.

Remark 1. Note that if M ∈ Procm, p ∈ Im and pi = pj for some i, j (i ≠ j), then either both pi and pj are M-significant or both are M-insignificant.

Comparison of procedures

Following Liu (1996), we say that a multiple testing procedure M′ dominates a procedure M if for all p ∈ Im we have M′(p) ⊇ M(p), i.e., M′ rejects all hypotheses Hi rejected by M (and maybe some others); in this case we write M′ ≽ M.


2.2. Monotonicity

For p, p′ ∈ Im we write p′ ≤ p if p′i ≤ pi for all i. If A is a finite set, we denote by |A| its cardinality.

Proposition 1. For M ∈ Procm, the following three statements are equivalent:

(i) if p, p′ ∈ Simpm and p′ ≤ p, then |M(p′)| ≥ |M(p)|;
(ii) if p, p′ ∈ Im and p′ ≤ p, then |M(p′)| ≥ |M(p)|;
(iii) if p, p′ ∈ Im and p′ ≤ p, then M(p′) ⊇ M(p).

We call a multiple testing procedure M monotone if it satisfies any (and hence all) of the above conditions (i)–(iii). The weakest condition (i) is convenient for the verification of monotonicity, while (iii) is useful for its application.

The proof of Proposition 1 is postponed to the Appendix.
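Property (iii) can be exercised numerically. The sketch below (Python; the Bonferroni procedure, the level ALPHA, and the sample sizes are hypothetical choices for illustration, not taken from the paper) checks on random pairs p′ ≤ p that lowering p-values can only enlarge the rejection set:

```python
import random

ALPHA = 0.05  # hypothetical overall level, not from the paper

def bonferroni(p):
    """Bonferroni procedure: reject H_i iff p_i <= ALPHA / m (0-based i).
    It is symmetric, cutting, and monotone."""
    m = len(p)
    return {i for i, pi in enumerate(p) if pi <= ALPHA / m}

random.seed(0)
for _ in range(1000):
    p = [random.random() for _ in range(5)]
    # lower each p-value by a random factor to obtain p' <= p
    p_lower = [pi * random.random() for pi in p]
    # property (iii): lowering p-values can only enlarge the rejection set
    assert bonferroni(p_lower) >= bonferroni(p)
print("property (iii) holds on 1000 random pairs p' <= p")
```

The superset comparison `>=` on Python sets is exactly the containment M(p′) ⊇ M(p) of condition (iii).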

2.3. Stepwise procedures

We present a generalized version of the concepts of step-down and step-up MTPs.

We call an MTP M a step-down procedure (Gordon, 2007a) if, whenever t, t′ ∈ Simpm, |M(t)| ≥ i and tj = t′j for 1 ≤ j ≤ i, we also have |M(t′)| ≥ i. In other words, for any p-vector t ∈ Simpm, the procedure's decision to reject or not reject hypothesis Hi depends only on the p-values tj, 1 ≤ j ≤ i.

For any u ∈ Simpm, we define the corresponding threshold step-down (TSD) procedure TSDu as follows. Given a p-vector t ∈ Simpm, its ith component ti is significant if and only if tj ≤ uj for all j, 1 ≤ j ≤ i. It is readily seen that any TSD procedure is a monotone step-down MTP.

We call an MTP M a step-up procedure (Gordon, 2007b) if, whenever t, t′ ∈ Simpm, |M(t)| ≥ i and tj = t′j for i ≤ j ≤ m, we also have |M(t′)| ≥ i. In other words, for any p-vector t ∈ Simpm, the procedure's decision to reject or not reject hypothesis Hi depends only on the p-values tj, i ≤ j ≤ m.

For any u ∈ Simpm, we define the corresponding threshold step-up (TSU) procedure TSUu as follows: Given a p-vector t ∈ Simpm, its ith component ti is significant if and only if tj ≤ uj for some j, i ≤ j ≤ m. Again, any TSU procedure is a monotone step-up MTP.

The TSD and TSU procedures are special cases of what we call threshold step-up-down (TSUD) procedures. This class ofMTPs, introduced by Tamhane et al. (1998) under the name of generalized step-up-down procedures, interpolates betweenthe TSD and TSU classes.

A threshold step-up-down procedure M, denoted below as TSUDru, is defined by a vector u ∈ Simpm and an integer r (1 ≤ r ≤ m). Given a p-vector t ∈ Simpm, it acts as follows. For i ≤ r, M declares ti significant if and only if tj ≤ uj for at least one j such that i ≤ j ≤ r. For i ≥ r, M declares ti significant if and only if tj ≤ uj for all j such that r ≤ j ≤ i.

Clearly, TSUD1u = TSDu and TSUDmu = TSUu. It is also obvious that a TSUD procedure is monotone.
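The three threshold procedures above translate directly into code. A minimal sketch (Python, 1-based indices as in the text; the concrete vectors t and u below are hypothetical illustrations):

```python
def tsud(t, u, r):
    """TSUD^r_u on a sorted p-vector t (indices 1-based as in the text).
    For i <= r: t_i is significant iff t_j <= u_j for SOME j, i <= j <= r.
    For i >= r: t_i is significant iff t_j <= u_j for ALL j, r <= j <= i.
    Returns the set of 1-based indices of the significant components."""
    m = len(t)
    sig = set()
    for i in range(1, m + 1):
        if i <= r:
            ok = any(t[j - 1] <= u[j - 1] for j in range(i, r + 1))
        else:
            ok = all(t[j - 1] <= u[j - 1] for j in range(r, i + 1))
        if ok:
            sig.add(i)
    return sig

def tsd(t, u):  # threshold step-down: TSUD with r = 1
    return tsud(t, u, 1)

def tsu(t, u):  # threshold step-up: TSUD with r = m
    return tsud(t, u, len(t))

# Hypothetical example: the same p-vector can fare very differently
t = [0.02, 0.02, 0.20, 0.90]
u = [0.0125, 0.025, 0.0375, 0.05]
print(tsd(t, u), tsud(t, u, 2), tsu(t, u))
```

On this t the step-down procedure rejects nothing (t1 > u1 blocks everything), while the step-up procedure rejects H1 and H2 (t2 ≤ u2 suffices), illustrating how r interpolates between the two extremes.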

2.4. The per-family error rate

The per-family error rate (PFER) (Tukey, 1953) is the expected number of false rejections.

Let M be an MTP, P a probability distribution on the unit cube Im, and P = (P1, . . . , Pm) a random vector with distribution P. We put

PFER(M, P) := E(|M(P) ∩ TP|),

where TP = {i ∈ Nm: Hi is true for the distribution P}. That is, PFER(M, P) is the expected number of true hypotheses falsely rejected by M given a random vector with distribution P. The procedure M is said to strongly control PFER at level γ if PFER(M, P) ≤ γ for any probability distribution P on the unit cube Im. The number

PFER(M) := sup_P PFER(M, P)

is, therefore, the exact level at which M strongly controls the PFER.

Furthermore, the procedure M weakly controls PFER at level γ if PFER(M, P) ≤ γ for all probability distributions P on the unit cube Im satisfying all the hypotheses Hi (i = 1, . . . , m). Hence, the number

wPFER(M) := sup_{P: TP = Nm} PFER(M, P)

is the exact level at which M weakly controls the PFER.

2.5. The generalized familywise error rate

The kth generalized familywise error rate (we denote it by FWERk) is the probability of k or more false rejections. This concept was introduced by Victor (1982).


Let M be an MTP, P a probability distribution on the unit cube Im, and 1 ≤ k ≤ m. Let P = (P1, . . . , Pm) be a random vector with distribution P. We put

FWERk(M, P) := pr{|M(P) ∩ TP| ≥ k},

which is the probability for the procedure M to falsely reject k or more true hypotheses given a random vector with distribution P. The number

FWERk(M) := sup_P FWERk(M, P)

is the exact level at which M strongly controls FWERk; the number

wFWERk(M) := sup_{P: TP = Nm} FWERk(M, P)

is the exact level at which M weakly controls FWERk.

If we put k = 1, then FWERk becomes the traditional familywise error rate (FWER) introduced by Tukey (1953). In this case we will write FWER(M) and wFWER(M) instead of FWER1(M) and wFWER1(M), respectively.

3. The weak per-family error rate of a monotone procedure

The main result of the present work is the following statement.

Theorem 1 (All-or-Nothing Theorem). If M is a monotone multiple testing procedure, then

wPFER(M) = m · wFWERm(M). (2)

Proof. (A) Let P = (P1, . . . , Pm) be an Im-valued random vector whose distribution P satisfies all hypotheses Hi, so that

pr{Pi ≤ x} ≤ x, 0 ≤ x ≤ 1, (3)

for all i (1 ≤ i ≤ m). We will construct a new random vector Y such that its distribution Y still satisfies all hypotheses Hi and that, given Y, M always rejects either all hypotheses Hi or none. At the same time, the expected number of false rejections will remain the same. It will follow that

m · FWERm(M, Y) = PFER(M, P).

Maximization of the right-hand side over the distributions P satisfying all hypotheses Hi will lead to the inequality

m · wFWERm(M) ≥ wPFER(M);

this will complete the proof, since the opposite inequality is obvious.

(B) Let r be a discrete random variable uniformly distributed in Nm and independent of P. Define a new random vector Q: if r ∈ M(P), we put

Qi = Pi if i ∈ M(P); Qi = Pr if i ∉ M(P).

Otherwise,

Qi = Pi if i ∉ M(P); Qi = Pr if i ∈ M(P).
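The construction of Q can be sketched in code. Below, a single-threshold (Bonferroni-type) rule with a hypothetical cutoff C stands in for the monotone procedure M, and indices are 0-based; the assertion in the loop illustrates the all-or-nothing behavior that Lemma 1 establishes in general:

```python
import random

C = 0.05  # hypothetical cutoff of a single-threshold (Bonferroni-type) rule

def mtp(p):
    """A stand-in monotone, symmetric, cutting procedure (0-based indices)."""
    return {i for i, pi in enumerate(p) if pi <= C}

def all_or_nothing_q(p, reject, r):
    """Part (B) of the proof: given the p-vector p, the rejection set
    reject = M(p), and r uniform on {0, ..., m-1}, build the vector Q."""
    if r in reject:
        # keep the significant components, overwrite the rest with p[r]
        return [p[i] if i in reject else p[r] for i in range(len(p))]
    # keep the insignificant components, overwrite the rest with p[r]
    return [p[i] if i not in reject else p[r] for i in range(len(p))]

random.seed(1)
for _ in range(1000):
    p = [random.random() for _ in range(6)]
    r = random.randrange(6)
    q = all_or_nothing_q(p, mtp(p), r)
    assert len(mtp(q)) in (0, 6)  # all-or-nothing, as in Lemma 1
print("M rejects everything or nothing on every constructed Q")
```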

Lemma 1. Assume M is a monotone MTP. Let p = (p1, . . . , pm) ∈ Im and J = M(p), L = Nm \ J.

(a) Suppose r ∈ J and q = (q1, . . . , qm) is obtained from p by replacing all components pi, i ∈ L, with pr. Then M(q) = Nm.
(b) Suppose r ∈ L and q = (q1, . . . , qm) is obtained from p by replacing all components pi, i ∈ J, with pr. Then M(q) = ∅ (the empty set).

Proof. Both statements of the lemma follow from the property (iii) of monotone MTPs (see Proposition 1) and Remark 1 (Section 2.1).

It follows from Lemma 1 and the construction of the random vector Q that |M(Q)| always equals 0 or m.

(C) Now we are going to show that

E(|M(Q)|) = E(|M(P)|). (4)

Define, for each J ⊂ Nm, an event

ΩJ := {M(P) = J}.


Let J := {J ⊂ Nm: pr(ΩJ) > 0}. By the law of total expectation,

E(|M(Q)|) = Σ_{J∈J} E(|M(Q)| | ΩJ) · pr(ΩJ)

and similarly

E(|M(P)|) = Σ_{J∈J} E(|M(P)| | ΩJ) · pr(ΩJ).

Hence, to prove (4) it suffices to show that

E(|M(Q)| | ΩJ) = E(|M(P)| | ΩJ) (5)

for all J ∈ J.

Let J ∈ J. Assume the event ΩJ has occurred. Denoting |J| by k, we have by Lemma 1: if r ∈ J (the probability of this event, given ΩJ, equals k/m), then |M(Q)| = m; otherwise |M(Q)| = 0. Hence

E(|M(Q)| | ΩJ) = (k/m) · m + ((m − k)/m) · 0 = k = E(|M(P)| | ΩJ),

which proves (5) and consequently (4). Since |M(Q)|, with probability 1, equals 0 or m, (4) implies that

m · pr(|M(Q)| = m) = E(|M(P)|). (6)

(D) The construction of the random vector Q implies one more property of its distribution that will be used below. Denote by µi the distribution of Pi and by νi the distribution of Qi.

Lemma 2. Σ_{i=1}^m νi(∆) = Σ_{i=1}^m µi(∆) for any interval ∆ ⊂ [0, 1]. (7)

Proof. Let J ∈ J. Denote by νJi the conditional distribution of Qi given ΩJ and by µJi the conditional distribution of Pi given ΩJ. Suppose i ∈ J. Given that ΩJ has occurred, Qi equals Pi if r ∈ J (the conditional probability of this event is k/m) or one of Pl's, l ∉ J, with conditional probability 1/m each. By the law of total probability, we have, for all i ∈ J and any interval ∆ ⊂ [0, 1],

νJi(∆) = (k/m) µJi(∆) + (1/m) Σ_{l∉J} µJl(∆).

Similarly, if i ∉ J,

νJi(∆) = ((m − k)/m) µJi(∆) + (1/m) Σ_{l∈J} µJl(∆).

Summation over all i ∈ Nm gives

Σ_{i=1}^m νJi(∆) = Σ_{i=1}^m µJi(∆). (8)

Since, by the law of total probability,

νi(∆) = Σ_{J∈J} νJi(∆) pr(ΩJ)

and

µi(∆) = Σ_{J∈J} µJi(∆) pr(ΩJ),

multiplying both sides of (8) by pr(ΩJ) and summing over J ∈ J, we obtain (7).

(E) Let Y be a random vector obtained by randomly permuting components of Q:

Yi := Qσ(i), i = 1, . . . ,m,

where the random permutation σ of the numbers 1, . . . , m is uniformly distributed among all such permutations and is independent of Q.


The random vector Y has exchangeable (permutation invariant) distribution. Since M is symmetric, we have pr(|M(Y)| = m) = pr(|M(Q)| = m), so (6) implies

m · pr{|M(Y)| = m} = E(|M(P)|). (9)

For any interval ∆ ⊂ [0, 1], by the definition of Y, we have

pr{Yi ∈ ∆} = (1/m) Σ_{j=1}^m pr{Qj ∈ ∆},

or equivalently, in view of (7),

pr{Yi ∈ ∆} = (1/m) Σ_{j=1}^m pr{Pj ∈ ∆}. (10)

This is true for any interval ∆ ⊂ [0, 1]; let ∆ = [0, x], where 0 ≤ x ≤ 1. Since the random vector P is assumed to satisfy all hypotheses Hi (see (3)), the right-hand side of (10) does not exceed x, so the random vector Y satisfies all hypotheses Hi as well:

pr{Yi ≤ x} ≤ x, 0 ≤ x ≤ 1.

We have proved that for any Im-valued random vector P satisfying all hypotheses Hi there exists a random vector Y also satisfying them and, in addition, satisfying the equality (9). The latter can be rewritten in the form

m · FWERm(M, Y) = PFER(M, P ).

Taking the supremum over all distributions P on Im satisfying all hypotheses Hi, we obtain

m · wFWERm(M) ≥ wPFER(M).

The opposite inequality is obvious, and Theorem 1 is proved.

4. The strong per-family error rate of a monotone procedure

4.1. Truncated procedures

Let M ∈ Procm and let l be an integer such that 0 ≤ l ≤ m − 1. We will now define a new procedure [M]l ∈ Procm−l — the l-truncation of M. Here is the definition:

The ith component pi (1 ≤ i ≤ m − l) of a vector p ∈ Im−l is [M]l-significant if and only if the (l + i)th component, equal to pi, of the vector (0, . . . , 0, p1, . . . , pm−l) (l zeros) is M-significant.

Clearly, [M]l is symmetric and cutting (as is M). It is also obvious that [M]l is monotone if M is monotone.

Lemma 3. (a) [TSDu]l = TSD(ul+1,...,um);
(b) [TSUu]l = TSU(ul+1,...,um);
(c) [TSUDru]l = TSUDr−l(ul+1,...,um) if l < r − 1, and [TSUDru]l = TSD(ul+1,...,um) if l ≥ r − 1.
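Lemma 3(a) is easy to confirm numerically. In the sketch below (Python), a TSD procedure is represented by its rejection count on a sorted p-vector, and the threshold vector u is a hypothetical example:

```python
import random

def tsd_count(t, u):
    """TSD_u on a sorted p-vector t: number of significant components,
    i.e. the largest k with t_j <= u_j for all j <= k."""
    k = 0
    for ti, ui in zip(t, u):
        if ti <= ui:
            k += 1
        else:
            break
    return k

def truncate(proc, u, l):
    """l-truncation [M]_l: run M on (0, ..., 0, p_1, ..., p_{m-l})
    (l leading zeros) and discard the l leading rejections."""
    def trunc(t):
        return proc([0.0] * l + list(t), u) - l
    return trunc

# Lemma 3(a): [TSD_u]_l = TSD_{(u_{l+1}, ..., u_m)}
random.seed(2)
u = [0.01, 0.02, 0.05, 0.10, 0.20]  # hypothetical thresholds
for l in range(len(u)):
    for _ in range(200):
        t = sorted(random.random() for _ in range(len(u) - l))
        assert truncate(tsd_count, u, l)(t) == tsd_count(t, u[l:])
print("Lemma 3(a) confirmed on random p-vectors")
```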

4.2. The strong per-family error rate through truncation

Lemma 4. For a monotone procedure M ∈ Procm,

PFER(M) = max_{0≤l≤m−1} wPFER([M]l).

Proof. It follows from the definition of PFER(M) that

PFER(M) = max_{0≤l≤m−1} sup_{P: |{j: Hj is false}| = l} PFER(M, P) ≡ max_{0≤l≤m−1} Dl. (11)

Fix the number l of false hypotheses. Since M is symmetric, it suffices to consider in (11) only those distributions P for which the hypotheses H1, . . . , Hl are false, while Hl+1, . . . , Hm are true. In view of the property (iii) of monotone MTPs (see Proposition 1), PFER(M, P) can only increase if the distribution P is changed by replacing P1, . . . , Pl with identical zeros. It follows that Dl equals wPFER([M]l).


Theorem 2. For a monotone procedure M,

PFER(M) = max_{0≤l≤m−1} [(m − l) · wFWERm−l([M]l)].

Proof. The statement follows from Lemma 4, monotonicity of [M]l, and Theorem 1.

5. The per-family error rate of a stepwise procedure

Let u ∈ Simpm. By Theorem 1,

wPFER(TSDu) = m · wFWERm(TSDu) = m · [(min_{1≤j≤m} m uj / j) ∧ 1] = m · min_{1≤j≤m} m uj / j,

the second equality being a special case of (Gordon, 2007a, Theorem 5.1).

Corollary 1. For any u ∈ Simpm,

(a) wPFER(TSDu) = m² · min_{1≤j≤m} uj / j; (12)

(b) PFER(TSDu) = max_{0≤l≤m−1} [(m − l)² · min_{1≤j≤m−l} ul+j / j]. (13)
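Formulas (12) and (13) translate directly into code. A numeric sketch (Python; the Holm-type thresholds used as input are a hypothetical example):

```python
def wpfer_tsd(u):
    """Formula (12): wPFER(TSD_u) = m^2 * min over j of u_j / j."""
    m = len(u)
    return m * m * min(u[j - 1] / j for j in range(1, m + 1))

def pfer_tsd(u):
    """Formula (13): PFER(TSD_u) = max over l of
    (m - l)^2 * min over j of u_{l+j} / j, with 1 <= j <= m - l."""
    m = len(u)
    return max(
        (m - l) ** 2 * min(u[l + j - 1] / j for j in range(1, m - l + 1))
        for l in range(m)
    )

# Hypothetical input: Holm-type thresholds u_j = alpha / (m - j + 1)
alpha, m = 0.05, 4
u = [alpha / (m - j + 1) for j in range(1, m + 1)]
print(wpfer_tsd(u), pfer_tsd(u))  # for this u both equal 8 * alpha / 3
```

For these thresholds the maximum in (13) is attained at l = 0, so the strong and weak levels coincide.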

Remark 2. The formulas (12) and (13) allow the following geometric interpretation. Let L be a line with a nonnegative slope in the Cartesian x, y plane, intersecting the x-axis at a point (l, 0) with integer l (0 ≤ l < m) and such that all the points (i, ui), i = 1, . . . , m, lie on or above L. This line, together with the lines y = 0 and x = m, makes a right triangle. The maximum of the double areas of all such triangles equals PFER(TSDu); if we fix l = 0, the restricted maximum equals wPFER(TSDu).

Remark 3. Here is an explicit construction of (some) least favorable distributions in the cases (a) and (b) of Corollary 1. Fix l (0 ≤ l ≤ m − 1) and put δ = min_{1≤j≤m−l} ul+j / j, a = (m − l)δ, b = 1 − a; also fix any ε ∈ (0, 1). Let X1, . . . , Xm be U[0, 1] random variables, σ a uniformly distributed permutation of {l + 1, . . . , m}, and ζ a Bernoulli random variable taking values 0 and 1 with probabilities a and b, respectively, such that all Xi, σ and ζ are mutually independent. Let P = (P1, . . . , Pm), where Pi = (1 − ε)uiXi for i = 1, . . . , l, while for i = l + 1, . . . , m we put Pi = σ(i)δ − Xiδ if ζ = 0, and Pi = a + bXi if ζ = 1. If we choose l = 0, the distribution P of the random vector P satisfies all hypotheses Hi and is such that PFER(TSDu, P) = wPFER(TSDu). If l is given the value for which the maximum in (13) is attained, then PFER(TSDu, P) = PFER(TSDu). Note that in both cases P has density, unless u1 = 0.

Corollary 2. wPFER(TSUu) = PFER(TSUu) = m · um.

Remark 4. Let ζ be a Bernoulli random variable taking values 0 and 1 with probabilities um and 1 − um, respectively; let P be a random vector uniformly distributed in the cube [0, um]^m if ζ = 0, and in the cube (um, 1]^m if ζ = 1. The distribution P of P satisfies all the hypotheses Hi, has density, and PFER(TSUu, P) = m · um.
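The least favorable distribution of Remark 4 is easy to simulate. The sketch below (Python; the threshold vector u and the sample size are hypothetical choices) estimates the expected number of TSUu-rejections under that distribution and compares it with m · um:

```python
import random

def tsu_count(t, u):
    """Number of TSU_u-significant components of a sorted p-vector t:
    the largest j with t_j <= u_j (0 if there is none)."""
    for j in range(len(t), 0, -1):
        if t[j - 1] <= u[j - 1]:
            return j
    return 0

random.seed(3)
u = [0.01, 0.02, 0.03, 0.04]  # hypothetical thresholds; u_m = 0.04
m, um = len(u), u[-1]
n, total = 100_000, 0
for _ in range(n):
    if random.random() <= um:  # zeta = 0: draw from [0, u_m]^m
        p = sorted(random.uniform(0, um) for _ in range(m))
    else:                      # zeta = 1: draw from (u_m, 1]^m
        p = sorted(random.uniform(um, 1.0) for _ in range(m))
    total += tsu_count(p, u)
print(total / n, "vs m * u_m =", m * um)  # empirical PFER close to 0.16
```

With ζ = 0 every sorted component satisfies tm ≤ um, so all m hypotheses are rejected; with ζ = 1 none are, so the expected count is exactly m · pr(ζ = 0) = m · um.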

Corollary 3. For any u ∈ Simpm and r ∈ {1, . . . , m},

(a) wPFER(TSUDru) = m² · min_{r≤i≤m} ui / i;

(b) PFER(TSUDru) = max_{0≤l≤m−1} [(m − l)² · min_{1∨(r−l)≤j≤m−l} ul+j / j].

Proof. Given t ∈ Simpm, we have |TSUDru(t)| = m if and only if |TSDur(t)| = m, where ur = (ur, . . . , ur, ur+1, . . . , um) (ur is repeated r times); therefore, part (a) follows from Corollary 1(a). Part (b) follows from Lemmas 4 and 3(c), Corollary 1(a), and part (a).

6. Example: PFER versus FWER for a step-down procedure

As a rather surprising illustration, we derive the following statement.

Corollary 4. Let M ∈ Procm be a monotone step-down procedure. If FWER(M) = α < 1, then PFER(M) ≤ (4 ∧ m) α.

Proof. It is obvious that PFER(M) ≤ m FWER(M). Furthermore, it follows from the conditions of the corollary, according to (Gordon and Salzman, 2008, Theorem 1), that M ≼ Holmαm, where Holmαm = TSDu with u = (α/m, α/(m − 1), . . . , α/1). Therefore, PFER(M) ≤ PFER(Holmαm). It remains to use Theorem 2, the obvious equality [Holmαm]l = Holmαm−l, and an equality obtained in (Gordon, 2007a, Example 5.1): FWERk(Holmαn) = Ckα (1 ≤ k ≤ n), where Ck < 4/k.
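The bound of Corollary 4 can be probed numerically through the Holm procedure that dominates M in the proof: computing PFER via formula (13) of Corollary 1(b) with the Holm thresholds uj = α/(m − j + 1), the ratio PFER(Holmαm)/α indeed stays below 4 ∧ m (the level α and the range of m below are hypothetical choices):

```python
def pfer_tsd(u):
    """PFER(TSD_u) via formula (13) of Corollary 1(b)."""
    m = len(u)
    return max(
        (m - l) ** 2 * min(u[l + j - 1] / j for j in range(1, m - l + 1))
        for l in range(m)
    )

alpha = 0.05  # hypothetical level
for m in range(1, 201):
    holm = [alpha / (m - j + 1) for j in range(1, m + 1)]
    # Corollary 4: PFER(Holm^alpha_m) <= (4 ∧ m) * alpha
    assert pfer_tsd(holm) <= min(4, m) * alpha + 1e-12
print("PFER(Holm) stays below (4 ∧ m) * alpha for m up to 200")
```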

Remark 5. Nothing similar holds for monotone step-up procedures. For them, the coefficient m in the inequality PFER(M) ≤ m FWER(M) cannot be improved.

Acknowledgment

I am grateful to the anonymous referee whose comments helped to improve the paper.

Appendix. Proof of Proposition 1

The implications (iii) ⇒ (ii) ⇒ (i) are obvious, while the implication (i) ⇒ (ii) is proved in (Gordon, 2007b). It remains to show that (ii) implies (iii). We, therefore, assume that (ii) holds, that is,

If p, p′ ∈ Im and p′ ≤ p, then |M(p′)| ≥ |M(p)|. (14)

We want to show that the inequality p′ ≤ p in fact implies the containment

M(p′) ⊇ M(p). (15)

For any p ∈ Im, let h(p) := max{pj: j ∈ M(p)} be the largest M-significant component of the vector p. If M(p) = ∅, we put h(p) = −∞. Then we have

M(p) = {j ∈ Nm: pj ≤ h(p)}

(see Remark 1 in Section 2.1).

It is sufficient to prove that the inequality p′ ≤ p implies (15) in the case where the vectors p′ and p differ in a single component, say, the ith one: p′i < pi, while p′j = pj (j ≠ i). We will show that (15) holds in each of the following three cases: (a) pi > p′i ≥ h(p); (b) h(p) ≥ pi > p′i; (c) pi > h(p) > p′i. We may assume that M(p) ≠ ∅, so the number ν(p) := |{i ∈ Nm: pi = h(p)}| is strictly positive.

Consider the case (a). Clearly, h(p′) ≥ h(p), since otherwise we would have |M(p′)| ≤ |M(p)| − ν(p) < |M(p)|, which would contradict (14). This proves (15) in the case (a).

In the case (b), we have either h(p′) ≥ h(p), which again implies (15), or h(p′) < h(p). If the latter occurs, then M(p′) ⊂ M(p) (we use the fact that i ∈ M(p)), and hence M(p′) = M(p) — otherwise we would have |M(p′)| < |M(p)|, contrary to (14).

In the case (c), we change the ith component of the p-vector in two steps: first reduce it from pi to h(p), then from h(p) to p′i. The first step is in the case (a); therefore, denoting the resulting vector by q, we have h(q) ≥ h(p) and M(q) ⊇ M(p). The former relation shows that the second step is in the case (b), so M(p′) ⊇ M(q).

References

Gordon, A.Y., 2007a. Explicit formulas for generalized family-wise error rates and unimprovable step-down multiple testing procedures. J. Statist. Plann. Inference 137, 3497–3512. http://dx.doi.org/10.1016/j.jspi.2007.03.027.

Gordon, A.Y., 2007b. Unimprovability of the Bonferroni procedure in the class of general step-up multiple testing procedures. Statist. Probab. Lett. 77, 117–122. http://dx.doi.org/10.1016/j.spl.2006.07.001.

Gordon, A.Y., Salzman, P., 2008. Optimality of the Holm procedure among general step-down multiple testing procedures. Statist. Probab. Lett. 78, 1878–1884. http://dx.doi.org/10.1016/j.spl.2008.01.055.

Lehmann, E.L., Romano, J.P., 2005. Testing Statistical Hypotheses, third ed. Springer, New York.

Liu, W., 1996. Multiple tests of a non-hierarchical family of hypotheses. J. R. Stat. Soc. Ser. B 58, 455–461.

Tamhane, A.C., Liu, W., Dunnett, C.W., 1998. A generalized step-up-down multiple test procedure. Canad. J. Statist. 26, 353–363.

Tukey, J.W., 1953. The problem of multiple comparison. In: The Collected Works of John W. Tukey VIII, Multiple Comparisons: 1948–1983. Chapman and Hall, New York, pp. 1–300. Unpublished manuscript.

Victor, N., 1982. Exploratory data analysis and clinical research. Methods Inf. Med. 21, 53–54.