Controlling the Actual Number of False Discoveries
at a Given Confidence Level
Joe Maisog
BIST-530 Final Project
December 3, 2008
False Discovery Rate
• FDR (FPR) = proportion of positive tests that are actually false positives
• FDR methods control the FDR in the sense that
E{FDR} ≤ q
where q ∈ [0,1] is the desired level of control
Benjamini and Hochberg, 1995
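As an illustration of what "regular" FDR control involves, here is a minimal Python sketch of the Benjamini-Hochberg step-up rule, together with the realized false discovery proportion of a rejection set. The function names and the example p-values are hypothetical and are not taken from the project's R code.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the sorted indices rejected by the BH step-up rule at level q."""
    m = len(p_values)
    # Rank hypotheses by p-value, smallest first.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank r with P(r) <= r * q / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff = rank
    # Reject every hypothesis up to and including that rank.
    return sorted(order[:cutoff])


def false_discovery_proportion(rejected, true_nulls):
    """Realized proportion of rejections that are true nulls (0 if none rejected)."""
    if not rejected:
        return 0.0
    false_hits = sum(1 for i in rejected if i in true_nulls)
    return false_hits / len(rejected)


# Hypothetical p-values; indices 2 and 3 are assumed to be true nulls.
p = [0.001, 0.03, 0.035, 0.9]
rejected = benjamini_hochberg(p, q=0.05)
print(rejected, false_discovery_proportion(rejected, {2, 3}))
```

The rule bounds the expectation of the false discovery proportion over repeated experiments; the value returned by `false_discovery_proportion` for any single experiment can be well above q.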
Korn’s Variants
Korn E et al., J of Statistical Planning and Inference 124(2): 379-98 (2004).
Follow-Up Paper by Lusa et al.
• Lusa L, Korn EL, McShane LM, A class comparison method with filtering-enhanced variable selection for high-dimensional data sets, Stat Med. 2008 Dec 10;27(28):5834-49.
• C code (R package)
A Problem
“Procedures targeting control of the expected number or proportion of false discoveries rather than the actual number or proportion can give a false sense of security. … Even with no correlation the results here [using “regular” FDR with simulated data] are troubling: 10% of the time the false discovery proportion will be 0.29 or more.” [emphasis mine]
Analogy: Accuracy vs. Precision
High Accuracy, Low Precision
High Precision, Low Accuracy
FDR
http://en.wikipedia.org/wiki/Accuracy
Two Jokes: Controlling Expectation Without a Confidence Level
• Three statisticians went out hunting, and came across a large deer. The first statistician fired, but missed, by a meter to the left. The second statistician fired, but also missed, by a meter to the right. The third statistician didn't fire, but shouted in triumph, "On the average we got it!"
• With one foot in a bucket of ice water, and one foot in a bucket of boiling water, you are, on the average, comfortable.
http://www.workjoke.com/statisticians-jokes.html
Korn’s Solution
“[Procedures targeting control of the actual number or proportion of false discoveries] will allow statements such as ‘with 95% confidence, the number of false discoveries does not exceed 2’ or ‘with approximate 95% confidence, the proportion of false discoveries does not exceed 0.01.’” [emphasis mine]
Korn’s Variants
• Adjusted p-values: actual number of false discoveries (“A”) or actual proportion of false discoveries (“B”); full algorithm or computationally efficient algorithm
• Unadjusted p-values: actual number of false discoveries (“A”) or actual proportion of false discoveries (“B”); full algorithm or computationally efficient algorithm
Two Goals
1. Confirm Korn’s warning that, with “regular” FDR, the realized false positive rate exceeds the nominal rate q in a fairly large fraction of experiments.
2. Implement in R Korn’s method to control the actual number of false positives at a given confidence level, using the computationally efficient version.
Definition
• k variables (e.g., genes)
• P(1) < P(2) < . . . < P(k) are the ordered p-values from the univariate tests
• H(1), H(2), . . . , H(k) are the corresponding null hypotheses
• T = { t1, t2, . . . , tj } is any subset of K = { 1, 2, . . . , k }
• Pr00 is the multivariate permutation distribution of p-values
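To make the idea of a multivariate permutation distribution concrete, the following Python sketch permutes the group labels while keeping each subject's measurements together, and uses the permutation distribution of the (u+1)-th most extreme statistic to set a cutoff. This is a simplified single-step stand-in, not Korn's step-down procedure, and it rests on several assumptions: the absolute difference in group means is used directly in place of a p-value (a larger statistic plays the role of a smaller p-value), all variables are assumed to share a common scale, and every function name and parameter value is hypothetical.

```python
import random


def abs_mean_diff(row, labels):
    """Absolute difference between the two group means for one variable."""
    g1 = [x for x, lab in zip(row, labels) if lab == 1]
    g0 = [x for x, lab in zip(row, labels) if lab == 0]
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))


def single_step_cutoff(data, labels, u, alpha=0.05, n_perm=200, seed=0):
    """Cutoff c such that, over the sampled permutations, the (u+1)-th
    largest statistic exceeds c in at most a fraction alpha of them."""
    rng = random.Random(seed)
    shuffled = list(labels)
    draws = []
    for _ in range(n_perm):
        # Relabel subjects; each row keeps its values, so any correlation
        # between variables is preserved in the permutation distribution.
        rng.shuffle(shuffled)
        stats = sorted((abs_mean_diff(row, shuffled) for row in data), reverse=True)
        draws.append(stats[u])  # (u+1)-th largest statistic in this permutation
    draws.sort()
    allowed = int(alpha * n_perm)  # number of permutations allowed above the cutoff
    return draws[n_perm - 1 - allowed]


def discoveries(data, labels, u, alpha=0.05, n_perm=200, seed=0):
    """Indices of variables whose observed statistic exceeds the cutoff."""
    cutoff = single_step_cutoff(data, labels, u, alpha, n_perm, seed)
    return [i for i, row in enumerate(data) if abs_mean_diff(row, labels) > cutoff]


# Hypothetical toy data: 30 variables, 10 + 10 subjects, 4 strongly shifted variables.
rng = random.Random(42)
labels = [0] * 10 + [1] * 10
data = []
for gene in range(30):
    shift = 6.0 if gene < 4 else 0.0
    data.append([rng.gauss(shift * lab, 1.0) for lab in labels])

found = discoveries(data, labels, u=2)
print(found)
```

Because more than u false discoveries require the (u+1)-th largest null statistic to exceed the cutoff, this construction aims to keep the number of false discoveries at or below u with confidence about 1 - alpha. It is more conservative than a step-down procedure.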
Definition
Procedure To Control the Actual Number of False Discoveries
1000 Simulations in R
• 50 controls, 50 treatments, 1000 genes
• Noise ~ N(0,1), no cross-gene correlations
• 100 genes “activated” in treatments with increase = 0.3969 (p = 0.05)
• “Regular” FDR method to control E{FDR} at q = 0.05
• Korn’s method to control the number of actual FP’s at u = 50, with 95% confidence
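The first part of this setup can be sketched in Python as follows. This is a simplified stand-in for the R simulation, under stated assumptions: each gene is tested with a two-sided z-test with known unit variance, so the test statistic is drawn directly as N(shift / SE, 1) with SE = sqrt(2/50) = 0.2, giving a standardized shift of 0.3969 / 0.2 ≈ 1.98 for the activated genes. It also uses 200 simulations rather than 1000 to keep the run short. All names are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bh_reject(p_values, q):
    """Indices rejected by the Benjamini-Hochberg step-up rule at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff = rank
    return order[:cutoff]


def simulate_fdp(n_sims=200, k=1000, n_active=100, n_per_group=50,
                 shift=0.3969, q=0.05, seed=1):
    """Realized false discovery proportion of BH in each simulated experiment."""
    rng = random.Random(seed)
    se = (2.0 / n_per_group) ** 0.5  # SE of a difference of two means, unit variance
    fdps = []
    for _ in range(n_sims):
        p_values = []
        for gene in range(k):
            # Genes 0 .. n_active-1 are activated; the rest are true nulls.
            mean_z = shift / se if gene < n_active else 0.0
            z = rng.gauss(mean_z, 1.0)
            p_values.append(2.0 * (1.0 - STD_NORMAL.cdf(abs(z))))
        rejected = bh_reject(p_values, q)
        false_hits = sum(1 for i in rejected if i >= n_active)
        fdps.append(false_hits / len(rejected) if rejected else 0.0)
    return fdps


fdps = simulate_fdp()
mean_fdp = sum(fdps) / len(fdps)
frac_over = sum(f > 0.05 for f in fdps) / len(fdps)
print("mean FDP:", round(mean_fdp, 4), "fraction of runs with FDP > q:", round(frac_over, 3))
```

For independent tests, theory gives E{FDR} = q × 900/1000 = 0.045 for the Benjamini-Hochberg rule, so the mean should land near that value up to simulation noise. The quantity of interest is `frac_over`, the share of individual experiments whose realized proportion exceeds q.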
Simulated Data Matrix
[Figure: data matrix; N1 = 50, N2 = 50, Ntot = 100; G1 = 100, G2 = 900, k = 1000; p-values]
Results: “Regular” FDR
• Mean FPR = 0.0394 (so, controlled at q = 0.05)
• But 17.5% of the time, FPR > 0.05
Results: Korn’s Method
• 98.9% of the time, the actual number of false positives was ≤ 50
• Controlled at u = 50 with 95% confidence
Conclusions
• 17.5% of the time, FPR > q = 0.05 with “regular” FDR
• Korn’s method controlled actual number of false positives at u = 50 with 95% confidence (actually slightly conservative)
• Disadvantage: computationally intensive
• Examining someone else’s computer program can be difficult but very rewarding!
Future Directions
• Try different parameters (e.g., signal size; number of subjects, variables, or permutations), or with correlated variables
• Try the method on real data
• Try Korn’s “Procedure B”, which controls the actual FDR at a given confidence level
• Try Lusa’s R package for feature selection
References
• Benjamini, Y., and Hochberg, Y. 1995. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57: 289–300.
• Korn EL, Troendle JF, McShane LM and Simon R. Controlling the number of false discoveries: application to high-dimensional genomic data. Journal of Statistical Planning and Inference 124(2): 379-398 (2004).
• Lusa L, Korn EL, McShane LM, A class comparison method with filtering-enhanced variable selection for high-dimensional data sets, Stat Med. 2008 Dec 10;27(28):5834-49. R package available at: http://linus.nci.nih.gov/Data/LusaL/bioinfo/
• Westfall PH, Tobias RD, Rom D, Wolfinger RD, Hochberg Y, Multiple Comparisons and Multiple Tests, Cary, NC: SAS Institute, Inc, 1999.
• A copy of the R code developed for this project can be found here: http://bist.pbwiki.com/f/bist530FinalProject.r