# α-Investing: A Procedure for Sequential Control of Expected False Discoveries

Post on 31-Jan-2017

212 views

Embed Size (px)

TRANSCRIPT

<ul><li><p>-Investing: A Procedure for Sequential Control of Expected False DiscoveriesAuthor(s): Dean P. Foster and Robert A. StineSource: Journal of the Royal Statistical Society. Series B (Statistical Methodology), Vol. 70, No.2 (2008), pp. 429-444Published by: Wiley for the Royal Statistical SocietyStable URL: http://www.jstor.org/stable/20203833 .Accessed: 28/06/2014 19:22</p><p>Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at .http://www.jstor.org/page/info/about/policies/terms.jsp</p><p> .JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range ofcontent in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new formsof scholarship. For more information about JSTOR, please contact support@jstor.org.</p><p> .</p><p>Wiley and Royal Statistical Society are collaborating with JSTOR to digitize, preserve and extend access toJournal of the Royal Statistical Society. Series B (Statistical Methodology).</p><p>http://www.jstor.org </p><p>This content downloaded from 193.142.30.98 on Sat, 28 Jun 2014 19:22:32 PMAll use subject to JSTOR Terms and Conditions</p><p>http://www.jstor.org/action/showPublisher?publisherCode=blackhttp://www.jstor.org/action/showPublisher?publisherCode=rsshttp://www.jstor.org/stable/20203833?origin=JSTOR-pdfhttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp</p></li><li><p>J. R. Statist. Soc. B (2008) 70, Part 2, pp. 429-444 </p><p>a-investing: a procedure for sequential control </p><p>of expected false discoveries </p><p>Dean P. Foster and Robert A. Stine </p><p>University of Pennsylvania, Philadelphia, USA </p><p>[Received April 2005. Final revision September 2007] </p><p>Summary, a-investing is an adaptive sequential methodology that encompasses a large fam </p><p>ily of procedures for testing multiple hypotheses. All control mFDR, which is the ratio of the </p><p>expected number of false rejections to the expected number of rejections. mFDR is a weaker criterion than the false discovery rate, which is the expected value of the ratio. We compen sate for this weakness by showing that a-investing controls mFDR at every rejected hypothesis, a-investing resembles a-spending that is used in sequential trials, but it has a key difference. </p><p>When a test rejects a null hypothesis, a-investing earns additional probability towards sub sequent tests, a-investing hence allows us to incorporate domain knowledge into the testing </p><p>procedure and to improve the power of the tests. In this way, a-investing enables the statistician </p><p>to design a testing procedure for a specific problem while guaranteeing control of mFDR. </p><p>Keywords: a-spending; Bonferroni method; False discovery rate; Familywise error rate; </p><p>Multiple comparisons </p><p>1. Introduction </p><p>We propose an adaptive sequential methodology for testing multiple hypotheses. Our approach, called a-investing, works in the usual setting in which one has a batch of several hypotheses as well as in cases in which hypotheses arrive sequentially in a stream. Streams of hypoth eses arise naturally in contemporary modelling applications such as genomics and variable </p><p>selection for large models. In contrast with the comparatively small problems that spawned </p><p>multiple-comparison procedures, modern applications can involve thousands of tests. For exam </p><p>ple, microarrays lead to comparisons of a control group with a treatment group by using measured differences on over 6000 genes (Dudoit et al9 2003). If we consider the possibil </p><p>ity for interactions, then the number of tests is virtually infinite. In contrast, the example that was used by Tukey to motivate multiple comparisons compares the means of only six </p><p>groups (Tukey (1953), available in Braun (1994)). Because a-investing tests hypotheses sequen </p><p>tially, the choice of future hypotheses can depend on the results of previous tests. Thus, </p><p>having discovered differences in certain genes, an investigator could, for example, direct atten </p><p>tion towards genes that share common transcription factor binding sites (Gupta and Ibra </p><p>him, 2005). Genovese et al (2006) described other testing situations that offer domain </p><p>knowledge. Before we describe a-investing, we introduce a variation on the marginal false discovery rate </p><p>mFDR, which is an existing criterion for multiple testing. Let the observable random variable R </p><p>denote the total number of hypotheses that are rejected by a testing procedure, and let V denote </p><p>Address for correspondence: Robert A. Stine, Department of Statistics, Wharton School of the University of </p><p>Pennsylvania, Philadelphia, PA 19104-6340, USA. </p><p>E-mail: stine@wharton.upenn.edu </p><p>? 2008 Royal Statistical Society 1369-7412/08/70429 </p><p>This content downloaded from 193.142.30.98 on Sat, 28 Jun 2014 19:22:32 PMAll use subject to JSTOR Terms and Conditions</p><p>http://www.jstor.org/page/info/about/policies/terms.jsp</p></li><li><p>430 D. P. Foster and R. A. Stine </p><p>the unobserved number of falsely rejected hypotheses. A testing procedure controls mFDR at level a if </p><p>E(V) mFDRi =--?</p></li><li><p>Sequential Control of Expected False Discoveries 431 </p><p>We follow the standard notation for labelling correct and incorrect rejections (Benjamini and Hochberg, 1995). Assume that mo of the null hypotheses in H(m) are true. The observable statistic R(m) counts how many of these m hypotheses are rejected. A superscript 6 distinguishes unobservable random variables from statistics such as R(m). The random variable Ve (m) denotes the number of false positive results among the m tests, counting cases in which the testing pro cedure incorrectly rejects a true null hypothesis. S6(m) </p><p>= R(m) ? </p><p>Ve(m) counts the number of </p><p>correctly rejected null hypotheses. Under the complete null hypothesis, mo = m9 Ve(m) = R(m) zadSe(m) = 0. </p><p>The original intent of multiple testing was to control the chance for any false rejection. The </p><p>familywise error rate FWER is the probability of falsely rejecting any null hypothesis from H(m): </p><p>FWER(m) = sup[Pe{Vd(m)^l}]. (2) </p><p>An important special case is control of FWER under the complete null hypothesis: Po{Ve(m) ^ </p><p>1} a, where Po denotes the probability measure under the complete null hypothesis. We refer to controlling FWER under the complete null hypothesis as controlling FWER in the weak sense. </p><p>Bonferroni procedures control FWER. Let p\9...9pm denote the/?-values of tests ofH\9..., Hm. Given a chosen level 0 < a < 1, the simplest Bonferroni procedure rejects those Hj for which </p><p>Pj < a/m. Let the indicators Vj {0,1} track incorrect rejections; Vj </p><p>is 1 if Hj is incorrectly rejected and is 0 otherwise. Then Ve(m) </p><p>= ? Vj </p><p>and the inequality </p><p>m </p><p>P0{Ve(m)^l}^J2P?M = l)^a (3) </p><p>7=1 </p><p>shows that this procedure controls FWER(m) ^ a. We need not distribute a equally over H(m); inequality (3) requires only that the sum of the a-levels does not exceed a. This observa tion suggests an a-spending characterization of the Bonferroni procedure. As an a-spend ing rule, the Bonferroni procedure allocates a over a collection of hypotheses, devoting a </p><p>larger share to hypotheses of greater interest. In effect, the procedure has a budget of a to </p><p>spend. It can spend c?j 0 on testing each hypothesis Hj so long as E7-o/ < a. Although such a-spending rules control FWER, they are often criticized for having little power. Clearly, the power of the traditional Bonferroni procedure decreases as m increases because the threshold </p><p>a/m for detecting a significant effect decreases. The testing procedure that was introduced in Holm (1979) offers more power while controlling FWER, but the improvements are small. </p><p>To obtain substantially more power, Benjamini and Hochberg (1995) introduced a different criterion, the false discovery rate FDR. FDR is the expected proportion of false positive results among rejected hypotheses, </p><p>R(m)>o\p{R(m)>0}. (4) </p><p>For the complete null hypothesis, FDR(m) = Po{R(m) > 0}, which is FWER(m). Thus, test </p><p>procedures that control FDR(m) ^ a also control FWER(m) in the weak sense at level a. If the complete null hypothesis is rejected, FDR introduces a different type of control. Under the alternative, FDR(m) decreases as the number of false null hypotheses m-mo increases (Dudoit et al.9 2003). As a result, FDR(m) becomes more easy to control in the presence of non-zero </p><p>effects, allowing more powerful procedures. Variations on FDR include pFDR (which drops the </p><p>FDR(m) / V\m) = Ed\~m) </p><p>This content downloaded from 193.142.30.98 on Sat, 28 Jun 2014 19:22:32 PMAll use subject to JSTOR Terms and Conditions</p><p>http://www.jstor.org/page/info/about/policies/terms.jsp</p></li><li><p>432 D. P Foster and R. A. Stine </p><p>term P(R > 0); see Storey (2002,2003)) and the local false discovery rate fdr(z) (which estimates </p><p>the false discovery rate as a function of the size of the test statistic; see Efron (2007)). Closer to </p><p>our work, Meinshausen and B?hlmann (2004) and Meinshausen and Rice (2006) estimated rao, the total number of false null hypotheses in H(m)9 and Genovese et al. (2006) weighted/7-values on the basis of prior knowledge that identifies hypotheses that are likely to be false. Benjamini and Hochberg (1995) also considered mFDRo and mFDRi, which they considered artificial. </p><p>mFDR does not control a property of the realized sequence of tests; instead it controls a ratio </p><p>of expectations. </p><p>Benjamini and Hochberg (1995) also introduced a step-down testing procedure that controls </p><p>FDR. Order the collection of m hypotheses so that the/?-values of the associated tests are sorted </p><p>from smallest to largest (putting the most significant first), </p><p>P(l) 0 denote the a-wealth that is accumulated by an investing rule after k tests; W(0) is </p><p>the initial a-wealth. Conventionally, W(0) = 0.05 or W(0) =0.10. At step7, an a-investing rule </p><p>sets the level a7 for testing Hj from 0 up to the maximum a-level that it can afford. The rule </p><p>This content downloaded from 193.142.30.98 on Sat, 28 Jun 2014 19:22:32 PMAll use subject to JSTOR Terms and Conditions</p><p>http://www.jstor.org/page/info/about/policies/terms.jsp</p></li><li><p>Sequential Control of Expected False Discoveries 433 </p><p>must ensure that its wealth never goes negative. Let Rj e {0,1} denote the outcome of testing </p><p>hypothesis Hf </p><p>R = J 1 ' if H j is rejected (pj < ay), ,~ 7 </p><p>\ 0, otherwise. </p><p>Using this notation, the investing rule J is a function of W(0) and the prior outcomes, </p><p>ay-=2"w(0)({?i,?2,...,?;-_i}). (7) </p><p>The outcome of testing H\9H29...9Hj determines the a-wealth W(j) that is available for </p><p>testing Hj+\. If pj ^ oij, the test rejects Hj and the investing rule earns a contribution to its </p><p>a-wealth, which is called the pay-out and is denoted by uj < 1. We typically set a = ou = W(0). </p><p>If pj > aj9 its a-wealth decreases by a;-/(l - </p><p>o?j)9 which is slightly more than the cost that is </p><p>extracted in a-spending. The change in the a-wealth is thus </p><p>W(j)-WU-l) = lU ... , *Pi%ai> (8) yjJ yj ' \ -aj/(\-aj) iipj>aj. w </p><p>If the /?-value is uniformly distributed on [0,1], then the expected change in the a-wealth is </p><p>-(1 - </p><p>uj)aj < 0. This suggests that a-wealth decreases when testing a true null hypothesis. Other </p><p>payment systems are possible; see the discussion in Section 7. </p><p>The notion of compensation for rejecting a hypothesis allows us to build context-dependent information into the testing procedure. Suppose that substantive insights suggest that the first </p><p>few hypotheses are likely to be rejected and that subsequent false hypotheses come in clusters. In </p><p>this instance, we might consider an a-investing rule that invests heavily at the start and after each </p><p>rejection, as illustrated by the following rule. Assume that the most recently rejected hypothesis is Hk*. (Set &* </p><p>= 0 when testing H\.) If false hypotheses are clustered, an a-investing rule should </p><p>invest heavily in the test of Hk*+\. One rule that does this is, for j>k*9 </p><p>W(i-l) </p><p>IW(o)({RuR2,.--,Rj-i})=l^._^. (9) </p><p>This rule invests \ </p><p>of its current wealth in testing H\ or Hy*+\. The a-level falls off quadratically if subsequent hypotheses are tested and not rejected. If the substantive insight is correct and </p><p>the false hypotheses are clustered, then tests of H\ or Hk*+\ represent 'good investments'. An </p><p>example in Section 6 illustrates these ideas. </p><p>Although it is relatively straightforward to devise investing rules, it may be difficult a priori to order the hypotheses in such a way that those which are most likely to be rejected come first. </p><p>Such an ordering relies on the specific situation. Another complication is the construction of </p><p>tests for which one can obtain the ^-values needed. To show that a testing procedure controls </p><p>mFDR, we require that, conditionally on the prior j ?\ outcomes, the level of the test of Hj must not exceed a?\ </p><p>E0(y?\Rj-i,Rj-2,...,Ri)</p></li><li><p>434 D. P. Foster and R. A. Stine </p><p>4. mFDR </p><p>The following definition generalizes the definition of mFDR that is given in condition (1). </p><p>Definition 1. Consider a procedure that tests hypotheses H\9H2,..., Hm. Then we define </p><p>Eo{Ve(m)} mFDR^ra) </p><p>= sup 9eS Ee{R(m)} + r] </p><p>(11) </p><p>A multiple-testing procedure controls mEDR^m) at level a if mFDR^ra) < a. We typically set rj = 1 </p><p>? a. Values of rj that are near 0 produce a less satisfactory criterion. Under the com </p><p>plete null hypothesis, no procedure can reduce mFDRo below 1 since Ve(m) = </p><p>R(m). Control </p><p>of mFDR^ provides control of FWER in the weak sense. Under the complete null hypothesis, </p><p>mFDR^ra) ^ a implies that </p><p>E9{Ve(m)}^-^-. 1 ?a </p><p>If rj = 1 ? </p><p>a, then Eo{Ve(m)} ^ a. Hence, control of mFDRi_a implies weak control of FWER </p><p>at level a. </p><p>The following simulation illustrates the similarity of mFDR^ and FDR. The tested hypoth eses Hj :/j,j </p><p>= 0 specify means of m = 200 populations. We set \ij by sampling a spike-and-slab </p><p>mixture. The mixture puts 100(1 ? </p><p>tt\)% of its probability in a spike at zero; tt\ = 0 identifies the </p><p>complete null hypothesis. The slab of this mixture is a normal distribution, so </p><p>Mr o </p><p>N(09 a2) </p><p>with probability 1 ? </p><p>n\, with probability tt\ . </p><p>(12) </p><p>In the simulation, 7ri ranges from 0 (the complete null hypothesis) to 1 (in which case Ve (m) = 0). We set a2 = 2 log(ra) so that the standard deviation of the non-zero pj matches the bound that is </p><p>commonly used in hard thresholding. The test statistics are independent, normally distributed </p><p>random variables Z7~IIDA^(/i;-, 1) for which the two-sided/?-values are pj </p><p>= 2$(?\Zj\). </p><p>Given theses-values, we computed FDR and mFDRo.95 with 10000 trials of four test pro cedures. Two procedures fix the level: one rejects Hj if pj < a </p><p>= 0.05 and the second rejects if </p><p>? o </p><p>? ,_ </p><p>0.005 0.02 0.08 </p><p>Proportion False (7^) </p><p>0.32 </p><p>-)fortheBH Fig. 1. FDR and mFDR^ provide similar control: simulated FDR (-) and mFDR0 95 ( step-down procedure ( ), oracle-based WBH (*), the Bonferroni method (o) and a procedure that tests each </p><p>hypothesis at level a = 0.05 (+) </p><p>This content downloaded from 193.142.30.98 on Sat, 28 Jun 2014 19:22:32 PMAll use subject to JSTOR Terms and Conditions</p><p>http://www.jstor.org/page/info/abou...</p></li></ul>

Recommended

View more >