1
Mixture models for classifying differentially expressed genes
Alex Lewin
Centre for Biostatistics
Imperial College, London
Joint work with Natalia Bochkina, Sylvia Richardson
BBSRC Exploiting Genomics grant
2
Modelling differential expression
• Many different methods/models for differential expression
– t-test
– t-test with stabilised variances (EB)
– Bayesian hierarchical models
– mixture models
• Choice whether to model alternative hypothesis or not
• Our model:
– Model the alternative hypothesis
– Fully Bayesian
3
Mixture model features
• Gene means and fold differences: linear model on the log scale
• Gene variances: borrow information across genes by assuming exchangeable variances
• Mixture prior on fold difference parameters
• Point mass prior for ‘null hypothesis’
4
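Fully Bayesian mixture model for differential expression
• 1st level
$y_{g1r} \mid \alpha_g, d_g, \sigma_{g1} \sim N(\alpha_g - \tfrac{1}{2} d_g,\ \sigma_{g1}^2)$,
$y_{g2r} \mid \alpha_g, d_g, \sigma_{g2} \sim N(\alpha_g + \tfrac{1}{2} d_g,\ \sigma_{g2}^2)$
• 2nd level
$\sigma_{gs}^2 \mid a_s, b_s \sim IG(a_s, b_s)$
$d_g \sim \pi_0 \delta_0 + \pi_1 G^{-}(1.5, \lambda_1) + \pi_2 G^{+}(1.5, \lambda_2)$
($\delta_0$: point mass for H0; $G^{-}$, $G^{+}$: explicit modelling of the alternative)
• 3rd level
Gamma hyperprior for $\lambda_1$, $\lambda_2$, $a_s$, $b_s$
Dirichlet distribution for $(\pi_0, \pi_1, \pi_2)$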
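To make the three levels concrete, here is a minimal sketch that draws data forward from the first two levels with the hyperparameters held fixed. It is only an illustration, not the authors' code. Everything numeric is an assumption: the mixture weights, the Gamma rates `lam`, the inverse-gamma parameters `a` and `b` (shared across both conditions here for brevity), and the distribution of the gene means. The sketch also assumes that the second Gamma argument is a rate and that the negative component is a Gamma mirrored onto the negative axis.

```python
import random


def simulate_two_condition_data(n_genes=500, n_reps=8, pi=(0.8, 0.1, 0.1),
                                lam=(2.0, 2.0), a=3.0, b=0.1, seed=1):
    """Draw allocations z, fold differences d, and data y1, y2 for two conditions."""
    rng = random.Random(seed)
    z, d, y1, y2 = [], [], [], []
    for _ in range(n_genes):
        # Allocation: 0 = null, 1 = negative component, 2 = positive component.
        zg = rng.choices([0, 1, 2], weights=pi)[0]
        if zg == 0:
            dg = 0.0  # point mass at zero (null hypothesis)
        elif zg == 1:
            # Assumed form of the negative component: mirrored Gamma(1.5, rate lam[0]).
            dg = -rng.gammavariate(1.5, 1.0 / lam[0])
        else:
            dg = rng.gammavariate(1.5, 1.0 / lam[1])
        alpha = rng.gauss(7.0, 1.0)  # hypothetical gene mean on the log scale
        # Inverse-gamma variance: reciprocal of a Gamma(shape a, rate b) draw.
        sd1 = (1.0 / rng.gammavariate(a, 1.0 / b)) ** 0.5
        sd2 = (1.0 / rng.gammavariate(a, 1.0 / b)) ** 0.5
        y1.append([rng.gauss(alpha - 0.5 * dg, sd1) for _ in range(n_reps)])
        y2.append([rng.gauss(alpha + 0.5 * dg, sd2) for _ in range(n_reps)])
        z.append(zg)
        d.append(dg)
    return z, d, y1, y2


z, d, y1, y2 = simulate_two_condition_data()
print("fraction of null genes:", z.count(0) / len(z))
```

Note that `random.gammavariate` takes a scale as its second argument, hence the `1.0 / rate` conversions.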
5
Decision Rules
• In full Bayesian framework, introduce latent allocation variable $z_g$: $z_g = 0$ if gene g is in the null, $z_g = 1$ if in the alternative
• For each gene, calculate posterior probability of belonging to unmodified component: $p_g = \Pr(z_g = 0 \mid \text{data})$
• Classify using cut-off on $p_g$ (Bayes rule corresponds to 0.5)
• For any given cut-off on $p_g$, can estimate FDR, FNR.
For gene-list S, est. (FDR | data) = $\sum_{g \in S} p_g / |S|$
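The decision rule above can be sketched in a few lines. The FDR estimate follows the formula on the slide. The FNR estimate (average of 1 − p_g over genes not in the list) is the natural counterpart and is an assumption here, as are the function names, the strict `<` at the cut-off, and the convention of returning 0 for an empty list. The probabilities are made-up illustrative values, not results.

```python
def classify(p_null, cutoff=0.5):
    """Indices of genes declared DE: posterior null probability below the cut-off."""
    return [g for g, p in enumerate(p_null) if p < cutoff]


def estimated_fdr(p_null, selected):
    """Average posterior null probability over the declared gene list S."""
    if not selected:
        return 0.0  # convention assumed for an empty list
    return sum(p_null[g] for g in selected) / len(selected)


def estimated_fnr(p_null, selected):
    """Average posterior non-null probability over genes not in S (assumed form)."""
    chosen = set(selected)
    rest = [g for g in range(len(p_null)) if g not in chosen]
    if not rest:
        return 0.0
    return sum(1.0 - p_null[g] for g in rest) / len(rest)


# Made-up posterior probabilities p_g for six genes.
p = [0.01, 0.10, 0.40, 0.70, 0.95, 0.99]
S = classify(p, cutoff=0.5)
print(S, estimated_fdr(p, S), estimated_fnr(p, S))
```

With these six values the list is the first three genes, giving an estimated FDR of about 0.17 and an estimated FNR of about 0.12. Lowering the cut-off shortens the list and trades a lower FDR for a higher FNR.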
6
Simulation Study
Explore performance of fully Bayesian mixture in different situations:
• Non-standard distribution of DE genes
• Small number of DE genes
• Small number of replicate arrays
• Asymmetric distributions of over- and under-expressed genes
50 simulated data sets for each of several different set-ups.
7
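Simulation Study
2500 genes, 8 replicates in each experimental condition
$d_g \sim \pi_0 \delta_0 + \pi_1 ( w\,\mathrm{Unif}() + (1 - w)\,N() ) + \pi_2 ( w\,\mathrm{Unif}() + (1 - w)\,N() )$, with $w$ the weight on the uniform part
$\sigma_{gs} \sim \mathrm{logNorm}(-1.8, 0.5)$ (logNorm based on data)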
8
Non-standard distributions of DE genes
True π0 = 0.8
[Figure: distributions of DE genes, Gamma distributions superimposed]
Weight on the Unif component: 0.3, 0.5, 0.8
Av. est. π0: 0.805 ± 0.010, 0.797 ± 0.010, 0.781 ± 0.010
9
Small number of DE genes / Small number of replicate arrays
True π0   Replicates   Av. FDR   Av. FNR   Av. est. π0
0.95      8            7.0 %     2.0 %     0.947 ± 0.007
0.95      3            17.9 %    3.6 %     0.956 ± 0.009
0.99      8            9.2 %     0.6 %     0.990 ± 0.003
0.99      3            17.6 %    0.9 %     0.995 ± 0.007
10
Asymmetric distributions of over/under-expressed genes
True π0 = 0.9, π1 = 0.09, π2 = 0.01
Av. est. π0 = 0.897 ± 0.007, π1 = 0.093 ± 0.003, π2 = 0.011 ± 0.006
$d_g \sim \pi_0 \delta_0 + \pi_1 ( 0.6\,\mathrm{Unif}(0.01, 1.7) + 0.4\,N(1.7, 0.8) ) + \pi_2 ( 0.6\,\mathrm{Unif}(-0.7, -0.01) + 0.4\,N(-0.7, 0.8) )$
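As an illustration of this asymmetric set-up, the sketch below draws fold differences from the stated mixture and reports the empirical component proportions. It is not the authors' simulation code. The slide does not say whether 0.8 is a standard deviation or a variance; the sketch treats it as a standard deviation, and the function name and seed are hypothetical.

```python
import random


def draw_fold_difference(rng, pi=(0.9, 0.09, 0.01)):
    """Return (component, d_g) for one gene under the asymmetric mixture."""
    u = rng.random()
    if u < pi[0]:
        return 0, 0.0  # null gene: point mass at zero
    if u < pi[0] + pi[1]:
        # Component weighted by pi_1: 0.6 Unif(0.01, 1.7) + 0.4 N(1.7, 0.8).
        if rng.random() < 0.6:
            return 1, rng.uniform(0.01, 1.7)
        return 1, rng.gauss(1.7, 0.8)  # 0.8 taken as a standard deviation (assumption)
    # Component weighted by pi_2: 0.6 Unif(-0.7, -0.01) + 0.4 N(-0.7, 0.8).
    if rng.random() < 0.6:
        return 2, rng.uniform(-0.7, -0.01)
    return 2, rng.gauss(-0.7, 0.8)


rng = random.Random(2024)
draws = [draw_fold_difference(rng) for _ in range(2500)]  # 2500 genes per data set
counts = [sum(1 for comp, _ in draws if comp == k) for k in range(3)]
print("empirical proportions:", [c / len(draws) for c in counts])
```

With π2 = 0.01 and 2500 genes, only about 25 genes fall in the smallest component, which is what makes this set-up a demanding test of the estimated proportions.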
11
Additional Checks
1) FDR / FNR can be estimated well
[Figure: true FDR vs. est. FDR, and true FNR vs. est. FNR]
2) Model works when there are no DE genes
50 simulations of same set-up: Av. est. π0 = 0.999. No genes are declared to be DE.
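The first check compares the true FDR with its estimate. A toy version of that comparison is sketched below. It does not fit the mixture model: it assumes perfectly calibrated posterior probabilities, drawn here from an arbitrary Beta(0.3, 0.3) distribution, and generates the true null status from them, so it only shows how the two quantities are computed and why they agree when the probabilities are calibrated. All names and numbers are hypothetical.

```python
import random


def true_and_estimated_fdr(p_null, is_null, cutoff):
    """True and estimated FDR for the list of genes with p_g below the cut-off."""
    selected = [g for g, p in enumerate(p_null) if p < cutoff]
    if not selected:
        return 0.0, 0.0
    true_fdr = sum(1 for g in selected if is_null[g]) / len(selected)
    est_fdr = sum(p_null[g] for g in selected) / len(selected)
    return true_fdr, est_fdr


rng = random.Random(0)
# Hypothetical calibrated probabilities: gene g is truly null with probability p_g.
p_null = [rng.betavariate(0.3, 0.3) for _ in range(20000)]
is_null = [rng.random() < p for p in p_null]

for cutoff in (0.1, 0.3, 0.5):
    t, e = true_and_estimated_fdr(p_null, is_null, cutoff)
    print(f"cut-off {cutoff}: true FDR {t:.3f}, est. FDR {e:.3f}")
```

In a real check the probabilities would come from the fitted model on simulated data, where the true allocations are known.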
12
Comparison with conjugate mixture prior
Replace
$d_g \sim \pi_0 \delta_0 + \pi_1 G^{-}(1.5, \lambda_1) + \pi_2 G^{+}(1.5, \lambda_2)$
with
$d_g \sim \pi_0 \delta_0 + \pi_1 N(0, c\,\sigma_g^2)$
NB: We estimate both $c$ and $\pi_0$ in fully Bayesian way.
True π0   Est. π0 with Gamma prior   Est. π0 with conjugate prior
0.8       0.781 ± 0.010              0.796 ± 0.010
0.95      0.947 ± 0.007              0.955 ± 0.006
0.99      0.990 ± 0.003              0.991 ± 0.003
1         0.999 ± 0.001              0.999 ± 0.001
13
Application to Mouse data
Mouse wildtype (WT) and knock-out (KO) data (Affymetrix)
~ 22700 genes, 8 replicates in each of WT and KO
Gamma prior: est. π0 = 0.996 ± 0.001, declares 59 genes DE
14
Summary
• Good performance of fully Bayesian mixture model
– can estimate proportion of DE genes in variety of situations
– accurate estimation of FDR / FNR
• Different mixture priors give similar classification results
• Gives reasonable results for real data