alex lewin centre for biostatistics imperial college, london

1

Alex Lewin, Centre for Biostatistics

Imperial College, London

Joint work with Natalia Bochkina, Sylvia Richardson

BBSRC Exploiting Genomics grant

Mixture models for classifying differentially expressed genes

2

Modelling differential expression

• Many different methods/models for differential expression
  – t-test
  – t-test with stabilised variances (EB)
  – Bayesian hierarchical models
  – mixture models

• Choice whether to model alternative hypothesis or not

• Our model:
  – Model the alternative hypothesis
  – Fully Bayesian

3

• Gene means and fold differences: linear model on the log scale

• Gene variances: borrow information across genes by assuming exchangeable variances

• Mixture prior on fold difference parameters

• Point mass prior for ‘null hypothesis’

Mixture model features

4

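• 1st level

$y_{g1r} \mid \alpha_g, d_g, \sigma_{g1} \sim N(\alpha_g - \tfrac{1}{2} d_g, \; \sigma_{g1}^2)$,

$y_{g2r} \mid \alpha_g, d_g, \sigma_{g2} \sim N(\alpha_g + \tfrac{1}{2} d_g, \; \sigma_{g2}^2)$,

• 2nd level

$\sigma_{gs}^2 \mid a_s, b_s \sim IG(a_s, b_s)$

$d_g \sim \pi_0 \delta_0 + \pi_1 G_{-}(1.5, \lambda_1) + \pi_2 G_{+}(1.5, \lambda_2)$

• 3rd level

Gamma hyperprior for $\lambda_1, \lambda_2, a_s, b_s$

Dirichlet distribution for $(\pi_0, \pi_1, \pi_2)$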

Fully Bayesian mixture model for differential expression

Explicit modelling of the alternative

H0

5

• In the fully Bayesian framework, introduce a latent allocation variable $z_g \in \{0, 1\}$: $z_g = 0$ if gene g is in the null component, $z_g = 1$ if it is in the alternative

• For each gene, calculate the posterior probability of belonging to the unmodified component: $p_g = \Pr(z_g = 0 \mid \text{data})$

• Classify using a cut-off on $p_g$ (the Bayes rule corresponds to 0.5)

• For any given cut-off on $p_g$, the FDR and FNR can be estimated.

Decision Rules

For gene list S: est. (FDR | data) = $\sum_{g \in S} p_g / |S|$
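The decision rule and the FDR estimate above can be sketched in a few lines of Python. This is only an illustration: the function names and the six posterior probabilities are hypothetical, and the FNR estimate (the average of 1 - p_g over genes not in S) is an assumed analogue of the FDR formula, since the slide does not spell it out.

```python
def classify(p_null, cutoff=0.5):
    """Return indices of genes declared DE: those with p_g below the cut-off."""
    return [g for g, p in enumerate(p_null) if p < cutoff]


def estimated_fdr(p_null, selected):
    """Average posterior null probability over the gene list S."""
    if not selected:
        return 0.0
    return sum(p_null[g] for g in selected) / len(selected)


def estimated_fnr(p_null, selected):
    """Average posterior probability of being DE over genes NOT in S.

    Assumed analogue of the FDR estimate; not stated on the slide.
    """
    chosen = set(selected)
    rest = [g for g in range(len(p_null)) if g not in chosen]
    if not rest:
        return 0.0
    return sum(1.0 - p_null[g] for g in rest) / len(rest)


# Hypothetical posterior probabilities p_g for six genes (illustrative only).
p_null = [0.01, 0.10, 0.40, 0.70, 0.95, 0.99]
S = classify(p_null)             # genes with p_g < 0.5
fdr = estimated_fdr(p_null, S)   # mean of p_g over S
fnr = estimated_fnr(p_null, S)   # mean of (1 - p_g) over the remaining genes
```

Lowering the cut-off shrinks S and reduces the estimated FDR at the cost of a higher estimated FNR.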

6

Simulation Study

Explore the performance of the fully Bayesian mixture model in different situations:

• Non-standard distribution of DE genes

• Small number of DE genes

• Small number of replicate arrays

• Asymmetric distributions of over- and under-expressed genes

Simulated data, 50 simulated data sets for each of several different set-ups.

7

2500 genes, 8 replicates in each experimental condition

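$d_g \sim \pi_0 \delta_0 + \pi_1 \big( w \, \mathrm{Unif}(\cdot) + (1 - w) \, N(\cdot) \big) + \pi_2 \big( w \, \mathrm{Unif}(\cdot) + (1 - w) \, N(\cdot) \big)$, where $w$ is the mixing weight on the uniform part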

$\sigma_{gs} \sim \mathrm{logNorm}(-1.8, 0.5)$ (log-normal parameters based on data)
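A minimal sketch of how replicate data for one gene might be generated under this set-up, using only the Python standard library. Several things here are assumptions rather than details from the slides: the function name `simulate_gene`, a gene mean of 0 on the log scale, a separate standard deviation per condition, and reading the second log-normal parameter (0.5) as a standard deviation on the log scale.

```python
import random


def simulate_gene(d_g, n_reps=8, alpha_g=0.0, rng=random):
    """Draw replicate log-expression values for one gene in two conditions.

    d_g     : log fold difference between the conditions
    alpha_g : overall gene mean (0 by default; an assumption)
    rng     : the random module or a random.Random instance
    """
    # One standard deviation per gene and condition, drawn from logNorm(-1.8, 0.5).
    sigma1 = rng.lognormvariate(-1.8, 0.5)
    sigma2 = rng.lognormvariate(-1.8, 0.5)
    # First-level model: condition means are alpha_g -/+ d_g / 2.
    y1 = [rng.gauss(alpha_g - 0.5 * d_g, sigma1) for _ in range(n_reps)]
    y2 = [rng.gauss(alpha_g + 0.5 * d_g, sigma2) for _ in range(n_reps)]
    return y1, y2


# Example: one gene with a log fold difference of 1 and 8 replicates per condition.
y1, y2 = simulate_gene(1.0, rng=random.Random(1))
```

Repeating the call for 2500 genes, with d_g drawn from the mixture, would give one simulated data set of the size described above.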

Simulation Study

8

Gamma distributions superimposed

Non-standard distributions of DE genes

Av. est. π0 = 0.805 ± 0.010

Av. est. π0 = 0.797 ± 0.010

Av. est. π0 = 0.781 ± 0.010

Mixing weight on the uniform part = 0.3, 0.5, 0.8

π0 = 0.8

9

Small number of DE genes / Small number of replicate arrays

True π0 = 0.95

True π0 = 0.99

8 replicates

Av. FDR = 7.0%
Av. FNR = 2.0%
Av. est. π0 = 0.947 ± 0.007

3 replicates


8 replicates


3 replicates


10

Asymmetric distributions of over/under-expressed genes

True π0 = 0.9
True π1 = 0.09
True π2 = 0.01

Av. est. π0 = 0.897 ± 0.007
Av. est. π1 = 0.093 ± 0.003
Av. est. π2 = 0.011 ± 0.006

$d_g \sim \pi_0 \delta_0 + \pi_1 \big( 0.6 \, \mathrm{Unif}(0.01, 1.7) + 0.4 \, N(1.7, 0.8) \big) + \pi_2 \big( 0.6 \, \mathrm{Unif}(-0.7, -0.01) + 0.4 \, N(-0.7, 0.8) \big)$
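As an illustration, fold differences could be drawn from this asymmetric mixture as follows. The function name `sample_d` and the component labels are hypothetical, and the second argument of the normal distributions (0.8) is treated as a standard deviation, which the slide does not specify.

```python
import random


def sample_d(rng, pi=(0.9, 0.09, 0.01)):
    """Draw (component, d_g): 0 = null, 1 = over-expressed, 2 = under-expressed."""
    u = rng.random()
    if u < pi[0]:
        return 0, 0.0                       # point mass at zero
    over = u < pi[0] + pi[1]
    lo, hi, mean = (0.01, 1.7, 1.7) if over else (-0.7, -0.01, -0.7)
    if rng.random() < 0.6:                  # uniform part, weight 0.6
        d = rng.uniform(lo, hi)
    else:                                   # normal part, weight 0.4
        d = rng.gauss(mean, 0.8)            # 0.8 read as a standard deviation (assumption)
    return (1 if over else 2), d


rng = random.Random(0)
draws = [sample_d(rng) for _ in range(2500)]          # one draw per gene
null_fraction = sum(1 for z, _ in draws if z == 0) / len(draws)
```

With 2500 genes, `null_fraction` should fall close to the true π0 of 0.9, which gives a quick sanity check on the generator before fitting the model.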

11

1) FDR / FNR can be estimated well

Additional Checks

50 simulations of same set-up:
Av. est. π0 = 0.999
No genes are declared to be DE.

2) Model works when there are no DE genes

True FDR, Est. FDR

True FNR, Est. FNR

12

Comparison with conjugate mixture prior

Replace
$d_g \sim \pi_0 \delta_0 + \pi_1 G_{-}(1.5, \lambda_1) + \pi_2 G_{+}(1.5, \lambda_2)$

with
$d_g \sim \pi_0 \delta_0 + \pi_1 N(0, c \sigma_g^2)$

NB: We estimate both $c$ and $\pi_0$ in a fully Bayesian way.

True π0   Est. π0 with Gamma prior   Est. π0 with conjugate prior
0.8       0.781 ± 0.010              0.796 ± 0.010
0.95      0.947 ± 0.007              0.955 ± 0.006
0.99      0.990 ± 0.003              0.991 ± 0.003
1         0.999 ± 0.001              0.999 ± 0.001

13

Application to Mouse data

Mouse wildtype (WT) and knock-out (KO) data (Affymetrix)

~ 22700 genes, 8 replicates in each WT and KO

Gamma prior: Est. π0 = 0.996 ± 0.001; declares 59 genes DE

14

Summary

• Good performance of fully Bayesian mixture model
  – can estimate proportion of DE genes in variety of situations
  – accurate estimation of FDR / FNR

• Different mixture priors give similar classification results

• Gives reasonable results for real data
