alex lewin centre for biostatistics imperial college, london


Post on 14-Jan-2016


Alex Lewin

Centre for Biostatistics

Imperial College, London

Joint work with Natalia Bochkina, Sylvia Richardson

BBSRC Exploiting Genomics grant

Mixture models for classifying differentially expressed genes


Modelling differential expression

• Many different methods/models for differential expression
  – t-test
  – t-test with stabilised variances (EB)
  – Bayesian hierarchical models
  – mixture models

• Choice whether to model alternative hypothesis or not

• Our model:
  – Model the alternative hypothesis
  – Fully Bayesian


Mixture model features

• Gene means and fold differences: linear model on the log scale

• Gene variances: borrow information across genes by assuming exchangeable variances

• Mixture prior on fold difference parameters

• Point mass prior for ‘null hypothesis’


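Fully Bayesian mixture model for differential expression

• 1st level

$y_{g1r} \mid \alpha_g, d_g, \sigma_{g1} \sim N(\alpha_g - \tfrac{1}{2} d_g, \; \sigma_{g1}^2)$

$y_{g2r} \mid \alpha_g, d_g, \sigma_{g2} \sim N(\alpha_g + \tfrac{1}{2} d_g, \; \sigma_{g2}^2)$

• 2nd level

$\sigma_{gs}^2 \mid a_s, b_s \sim IG(a_s, b_s)$

$d_g \sim \pi_0 \delta_0 + \pi_1 G_-(1.5, \eta_1) + \pi_2 G_+(1.5, \eta_2)$

  – $\pi_0 \delta_0$ (point mass at zero): H0
  – $\pi_1 G_- + \pi_2 G_+$: explicit modelling of the alternative

• 3rd level

Gamma hyperprior for $\eta_1, \eta_2, a_s, b_s$

Dirichlet distribution for $(\pi_0, \pi_1, \pi_2)$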
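The three levels can be read as a recipe for generating data. The following is a minimal sketch of forward sampling from the first two levels, not the authors' implementation. It assumes that the second argument of the Gamma and inverse-Gamma distributions is a rate, and that G_- is a Gamma distribution reflected onto the negative axis. The function names and the hyperparameter values in the usage lines are hypothetical.

```python
import numpy as np


def sample_fold_differences(n_genes, pi, eta1, eta2, rng):
    """Draw d_g from pi0*delta_0 + pi1*G_-(1.5, eta1) + pi2*G_+(1.5, eta2)."""
    # z = 0: null, z = 1: negative (G_-) component, z = 2: positive (G_+) component
    z = rng.choice(3, size=n_genes, p=pi)
    d = np.zeros(n_genes)
    under, over = (z == 1), (z == 2)
    # Assumption: eta is a rate, so the NumPy scale parameter is 1 / eta
    d[under] = -rng.gamma(shape=1.5, scale=1.0 / eta1, size=under.sum())
    d[over] = rng.gamma(shape=1.5, scale=1.0 / eta2, size=over.sum())
    return z, d


def sample_variances(n_genes, a, b, rng):
    """Draw exchangeable variances sigma^2 ~ IG(a, b), with b assumed to be a rate."""
    return 1.0 / rng.gamma(shape=a, scale=1.0 / b, size=n_genes)


def sample_replicates(d, alpha, var1, var2, n_rep, rng):
    """Draw y_g1r ~ N(alpha_g - d_g/2, var1_g) and y_g2r ~ N(alpha_g + d_g/2, var2_g)."""
    shape = (d.size, n_rep)
    y1 = rng.normal((alpha - 0.5 * d)[:, None], np.sqrt(var1)[:, None], size=shape)
    y2 = rng.normal((alpha + 0.5 * d)[:, None], np.sqrt(var2)[:, None], size=shape)
    return y1, y2


# Hypothetical values, chosen only to exercise the functions
rng = np.random.default_rng(0)
z, d = sample_fold_differences(1000, [0.8, 0.1, 0.1], eta1=1.0, eta2=1.0, rng=rng)
var1 = sample_variances(1000, a=3.0, b=1.0, rng=rng)
var2 = sample_variances(1000, a=3.0, b=1.0, rng=rng)
y1, y2 = sample_replicates(d, np.zeros(1000), var1, var2, n_rep=8, rng=rng)
```

In the full model the third-level hyperparameters would be given priors and estimated by MCMC; this sketch holds them fixed.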


Decision Rules

• In the full Bayesian framework, introduce a latent allocation variable $z_g$ for gene g: $z_g = 0$ if the gene is in the null component, $z_g = 1$ if it is in the alternative

• For each gene, calculate the posterior probability of belonging to the unmodified component: $p_g = \Pr(z_g = 0 \mid \text{data})$

• Classify using a cut-off on $p_g$ (Bayes rule corresponds to 0.5)

• For any given cut-off on $p_g$, can estimate FDR, FNR.

For gene-list S, est. (FDR | data) = $\sum_{g \in S} p_g / |S|$
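The decision rule and the FDR estimate above can be sketched in a few lines. The FDR estimate follows the formula on the slide. The FNR estimate shown here, the average of 1 − p_g over genes not in the list, is an assumed analogue, since the slide does not give its formula. The function names and the example probabilities are hypothetical.

```python
import numpy as np


def classify(p_null, cutoff=0.5):
    """Declare a gene DE when its posterior null probability is below the cut-off."""
    return np.asarray(p_null) < cutoff


def estimated_fdr(p_null, selected):
    """Average posterior null probability over the declared gene list S."""
    p_null = np.asarray(p_null)
    if not selected.any():
        return 0.0
    return float(p_null[selected].mean())


def estimated_fnr(p_null, selected):
    """Assumed analogue: average of (1 - p_g) over genes not declared DE."""
    p_null = np.asarray(p_null)
    rest = ~selected
    if not rest.any():
        return 0.0
    return float((1.0 - p_null[rest]).mean())


# Hypothetical posterior probabilities for five genes
p = np.array([0.01, 0.20, 0.60, 0.95, 0.99])
selected = classify(p)              # first two genes are declared DE
fdr = estimated_fdr(p, selected)    # (0.01 + 0.20) / 2
fnr = estimated_fnr(p, selected)    # (0.40 + 0.05 + 0.01) / 3
```

Lowering the cut-off shortens the gene list and reduces the estimated FDR at the cost of a higher estimated FNR.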


Simulation Study

Explore performance of the fully Bayesian mixture model in different situations:

• Non-standard distribution of DE genes

• Small number of DE genes

• Small number of replicate arrays

• Asymmetric distributions of over- and under-expressed genes

Simulated data, 50 simulated data sets for each of several different set-ups.


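Simulation Study

2500 genes, 8 replicates in each experimental condition

$d_g \sim \pi_0 \delta_0 + \pi_1 \left( w \, \mathrm{Unif}() + (1 - w) \, N() \right) + \pi_2 \left( w \, \mathrm{Unif}() + (1 - w) \, N() \right)$, where $w$ denotes the mixing weight on the Uniform part

$\sigma_{gs} \sim \mathrm{logNorm}(-1.8, 0.5)$ (logNorm based on data)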


Non-standard distributions of DE genes

π0 = 0.8

[Figure: three panels, one per mixing weight on the Uniform component (0.3, 0.5, 0.8), with Gamma distributions superimposed]

Av. est. π0 for the three panels, in slide order: 0.805 ± 0.010, 0.797 ± 0.010, 0.781 ± 0.010


Small number of DE genes / Small number of replicate arrays

True π0   Replicates   Av. FDR   Av. FNR   Av. est. π0
0.95      8            7.0 %     2.0 %     0.947 ± 0.007
0.95      3            17.9 %    3.6 %     0.956 ± 0.009
0.99      8            9.2 %     0.6 %     0.990 ± 0.003
0.99      3            17.6 %    0.9 %     0.995 ± 0.007


Asymmetric distributions of over/under-expressed genes

True π0 = 0.9, True π1 = 0.09, True π2 = 0.01

Av. est. π0 = 0.897 ± 0.007, Av. est. π1 = 0.093 ± 0.003, Av. est. π2 = 0.011 ± 0.006

$d_g \sim \pi_0 \delta_0 + \pi_1 \left( 0.6 \, \mathrm{Unif}(0.01, 1.7) + 0.4 \, N(1.7, 0.8) \right) + \pi_2 \left( 0.6 \, \mathrm{Unif}(-0.7, -0.01) + 0.4 \, N(-0.7, 0.8) \right)$
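One data set from this asymmetric set-up could be generated roughly as follows. This is a sketch under stated assumptions, not the authors' simulation code: the second argument of N(·, 0.8) is taken to be a variance, logNorm(-1.8, 0.5) is taken to give the mean and standard deviation of log σ, and all gene means are set to zero because the slides do not specify them. The function names are hypothetical.

```python
import numpy as np

# Component settings read off the slide: (Uniform low, Uniform high, Normal mean)
COMPONENTS = {1: (0.01, 1.7, 1.7), 2: (-0.7, -0.01, -0.7)}
PI = [0.9, 0.09, 0.01]   # true (pi0, pi1, pi2)
W_UNIF = 0.6             # weight on the Uniform part of each alternative component
NORMAL_VAR = 0.8         # assumption: 0.8 is a variance, not a standard deviation


def simulate_fold_differences(n_genes, rng):
    """Draw component labels z_g and log fold differences d_g."""
    z = rng.choice(3, size=n_genes, p=PI)
    d = np.zeros(n_genes)
    for k, (low, high, mean) in COMPONENTS.items():
        idx = np.flatnonzero(z == k)
        from_unif = rng.random(idx.size) < W_UNIF
        unif = rng.uniform(low, high, size=idx.size)
        norm = rng.normal(mean, np.sqrt(NORMAL_VAR), size=idx.size)
        d[idx] = np.where(from_unif, unif, norm)
    return z, d


def simulate_arrays(d, n_rep, rng):
    """Draw replicate log expressions for two conditions, with gene means fixed at 0."""
    n_genes = d.size
    # Assumption: -1.8 and 0.5 are the mean and sd of log(sigma_gs)
    sigma = rng.lognormal(mean=-1.8, sigma=0.5, size=(n_genes, 2))
    y1 = rng.normal((-0.5 * d)[:, None], sigma[:, [0]], size=(n_genes, n_rep))
    y2 = rng.normal((0.5 * d)[:, None], sigma[:, [1]], size=(n_genes, n_rep))
    return sigma, y1, y2


rng = np.random.default_rng(1)
z, d = simulate_fold_differences(2500, rng)
sigma, y1, y2 = simulate_arrays(d, n_rep=8, rng=rng)
```

Repeating this 50 times with different seeds would mirror the structure of the study, with the model then fitted to each simulated data set.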


Additional Checks

1) FDR / FNR can be estimated well

[Figure: true FDR vs. est. FDR, and true FNR vs. est. FNR]

2) Model works when there are no DE genes

50 simulations of same set-up: Av. est. π0 = 0.999. No genes are declared to be DE.


Comparison with conjugate mixture prior

Replace

$d_g \sim \pi_0 \delta_0 + \pi_1 G_-(1.5, \eta_1) + \pi_2 G_+(1.5, \eta_2)$

with

$d_g \sim \pi_0 \delta_0 + \pi_1 N(0, c \sigma_g^2)$

NB: We estimate both $c$ and $\pi_0$ in a fully Bayesian way.

True π0   Est. π0 with Gamma prior   Est. π0 with conjugate prior
0.8       0.781 ± 0.010              0.796 ± 0.010
0.95      0.947 ± 0.007              0.955 ± 0.006
0.99      0.990 ± 0.003              0.991 ± 0.003
1         0.999 ± 0.001              0.999 ± 0.001


Application to Mouse data

Mouse wildtype (WT) and knock-out (KO) data (Affymetrix)

~ 22700 genes, 8 replicates in each WT and KO

Gamma prior: Est. π0 = 0.996 ± 0.001; declares 59 genes DE


Summary

• Good performance of fully Bayesian mixture model
  – can estimate proportion of DE genes in variety of situations
  – accurate estimation of FDR / FNR

• Different mixture priors give similar classification results

• Gives reasonable results for real data