Bayesian Modelling of Differential Gene Expression

Lewin A (1), Richardson S (1), Marshall C (1), Glazier A (2) and Aitman T (2) (2006), Biometrics 62, 1-9.

1: Imperial College Dept. Epidemiology

2: Imperial College Microarray Centre


Page 2: Outline

Introduction to microarrays and differential expression

Bayesian hierarchical model for differential expression

Decision rules

Predictive model checks

Gene Ontology analysis for differentially expressed genes

Further work

Page 3: Microarrays measure gene expression (mRNA)

(1) Array contains thousands of spots. Millions of strands of DNA of known sequence are fixed to each spot.

(2) Sample (unknown sequences of cDNA) is labelled with fluorescent dye.

(3) Matching sequences of DNA and cDNA hybridize together (for example, DNA TGCT with cDNA ACGA).

(4) Array is washed, so only matching samples are left (the fluorescent spots show which).

Pictures courtesy of Affymetrix
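The pairing in step (3) can be illustrated with a minimal sketch. The helper names `complement` and `hybridizes` are hypothetical, and the sketch pairs bases position by position, as the DNA TGCT / cDNA ACGA example on the slide does, ignoring strand orientation.

```python
# Watson-Crick base pairs: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(seq: str) -> str:
    """Return the base-by-base complement of a DNA sequence (hypothetical helper)."""
    return "".join(PAIRS[base] for base in seq)

def hybridizes(probe: str, sample: str) -> bool:
    """True if the sample pairs with the probe at every position."""
    return len(probe) == len(sample) and complement(probe) == sample

print(complement("TGCT"))
print(hybridizes("TGCT", "ACGA"))
```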

Page 4: Microarray experiment to find genes associated with Cd36

Cd36: gene known to be important in insulin resistance (Aitman et al 1999, Nature Genet 21:76-83)

Microarray Data

3 SHR compared with 3 transgenic rats (with Cd36)

3 wildtype (normal) mice compared with 3 mice with Cd36 knocked out

12000 genes on each array

Biological Question

Find genes which are expressed differently between animals with and without Cd36.


Page 6: Bayesian hierarchical model for differential expression

1st level

$y_{g1r} \mid \alpha_g, \delta_g, \sigma_{g1} \sim N(\alpha_g - \tfrac{1}{2}\delta_g + \beta_{r1}(\alpha_g),\ \sigma_{g1}^2)$

$y_{g2r} \mid \alpha_g, \delta_g, \sigma_{g2} \sim N(\alpha_g + \tfrac{1}{2}\delta_g + \beta_{r2}(\alpha_g),\ \sigma_{g2}^2)$

$y_{gsr}$ is log gene expression (gene g, condition s, replicate r)

$\alpha_g$: overall gene expression (fixed effect)

$\delta_g$: differential effect for gene g between 2 conditions (fixed effect or mixture prior)

$\beta_{rs}(\alpha_g)$: array effect or normalisation (function of $\alpha_g$)

$\sigma_{gs}^2$: variance for each gene
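A minimal simulation sketch of the first level, assuming NumPy. Here `alpha` is the overall expression of each gene, `delta` the differential effect, `sigma` the gene/condition standard deviation and `beta` the array effect. All sizes and numeric values are invented for illustration, and the array effect is simply set to zero; none of this is taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_reps = 1000, 3                   # illustrative sizes, not the real data

alpha = rng.normal(7.0, 1.5, n_genes)       # overall expression (made-up values)
delta = np.zeros(n_genes)
delta[:50] = rng.normal(0.0, 1.0, 50)       # only the first 50 genes have an effect
sigma = np.full((n_genes, 2), 0.3)          # gene/condition s.d. (made-up constant)
beta = np.zeros((n_genes, 2, n_reps))       # array effect, assumed zero here

# Condition 1 gets -delta/2 and condition 2 gets +delta/2.
sign = np.array([-0.5, 0.5])
mean = alpha[:, None, None] + sign[None, :, None] * delta[:, None, None] + beta
y = rng.normal(mean, sigma[:, :, None])     # y[g, s, r]: log expression

# The difference of condition means is a crude estimate of delta_g.
d_hat = y[:, 1, :].mean(axis=1) - y[:, 0, :].mean(axis=1)
print(np.corrcoef(d_hat[:50], delta[:50])[0, 1])
```

With only 3 replicates per condition the crude estimate `d_hat` is noisy, which is the motivation for modelling all genes jointly.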

Page 7: Prior for gene variances

2nd level

$\sigma_{gs}^2 \mid \mu_s, \tau_s \sim \text{logNormal}(\mu_s, \tau_s)$

Hyper-parameters $\mu_s$ and $\tau_s$ can be influential, so these are estimated in the model.

3rd level

$\mu_s \sim N(c, d)$

$\tau_s \sim \text{Gamma}(e, f)$

Variances estimated using information from all measurements (~12000 x 3) rather than just 3

Figure: 3 wildtype mice
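To see why borrowing information across genes matters, the sketch below (NumPy assumed) draws gene variances from a log-normal distribution and compares them with sample variances computed from only 3 replicates. The hyper-parameter values are invented, and tau is treated here as a standard deviation on the log scale, which is an assumption: the slide does not say whether it is a variance or a precision.

```python
import numpy as np

rng = np.random.default_rng(1)
n_genes, n_reps = 12000, 3
mu, tau = -2.0, 0.5                      # invented hyper-parameters (log scale)

# "True" gene variances from the second-level prior.
true_var = rng.lognormal(mean=mu, sigma=tau, size=n_genes)

# Three replicate measurements per gene, centred at zero for simplicity.
y = rng.normal(0.0, np.sqrt(true_var)[:, None], size=(n_genes, n_reps))
sample_var = y.var(axis=1, ddof=1)       # per-gene estimate from 3 values only

# Spread on the log scale: per-gene estimates versus the true variances.
print(np.log(true_var).std(), np.log(sample_var).std())
```

In this toy setting the per-gene sample variances scatter far more widely than the true variances, so an estimate that pools all genes through the shared prior is much more stable.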

Page 8: Prior for array effects (Normalization)

Spline curve

$\beta_{rs}(\alpha_g)$ = quadratic in $\alpha_g$ for $a_{rs(k-1)} \le \alpha_g \le a_{rs(k)}$, with coefficients $(b_{rsk}^{(1)}, b_{rsk}^{(2)})$, $k = 1, \dots,$ #breakpoints

Locations of break points not fixed

Must do sensitivity checks on # break points

Figure: spline curve with break points a0, a1, a2, a3
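A piecewise-quadratic curve of this kind can be sketched as follows, assuming NumPy. The parametrisation is an assumption made for illustration: each segment has a linear and a quadratic coefficient measured from its left break point, and the curve is forced to be continuous. The slide does not give the exact parametrisation, and `piecewise_quadratic` is a hypothetical name.

```python
import numpy as np

def piecewise_quadratic(alpha, knots, b1, b2, offset=0.0):
    """Continuous piecewise-quadratic curve (illustrative parametrisation).

    knots: break points a_0 < a_1 < ... < a_K
    b1, b2: length-K linear and quadratic coefficients, one pair per segment
    Values outside [a_0, a_K] are extrapolated from the end segments.
    """
    alpha = np.asarray(alpha, dtype=float)
    knots = np.asarray(knots, dtype=float)
    b1 = np.asarray(b1, dtype=float)
    b2 = np.asarray(b2, dtype=float)
    widths = np.diff(knots)
    # Rise of the curve across each full segment; cumulative sums give the
    # value at each segment's left end, which keeps the curve continuous.
    rise = b1 * widths + b2 * widths ** 2
    start = offset + np.concatenate([[0.0], np.cumsum(rise)[:-1]])
    # Index of the segment containing each alpha.
    k = np.clip(np.searchsorted(knots, alpha, side="right") - 1, 0, len(widths) - 1)
    d = alpha - knots[k]
    return start[k] + b1[k] * d + b2[k] * d ** 2

print(piecewise_quadratic([0.5, 1.0, 1.5, 2.0], knots=[0, 1, 2], b1=[1, 0], b2=[0, 1]))
```

In the model the break point locations are themselves unknown, so in a full analysis `knots` would be sampled along with the coefficients rather than fixed as here.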

Page 9: Array effect as function of gene effect

Figure legend: loess; Bayesian posterior mean

Page 10: Decision Rules for Inference: Fixed Effects Model

Inference on $\delta$

(1) $d_g = E(\delta_g \mid \text{data})$: posterior mean

Like point estimate of log fold change.

Decision Rule: gene g is DE if $|d_g| > \delta_{cut}$ ($\delta_{cut}$ set by biological interest)

(2) $p_g = P(|\delta_g| > \delta_{cut} \mid \text{data})$: posterior probability (incorporates uncertainty)

Decision Rule: gene g is DE if $p_g > p_{cut}$ ($\delta_{cut}$ set by biological interest, $p_{cut}$ by statistical confidence)

This allows the biologist to specify what size of effect is interesting (not just statistical significance)
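Both decision rules are simple summaries of posterior draws. The sketch below assumes NumPy and an array of posterior samples of the differential effect, one column per gene; the function name `decision_rules` and the fake draws are illustrative stand-ins for real MCMC output.

```python
import numpy as np

def decision_rules(delta_samples, delta_cut, p_cut):
    """delta_samples: array (n_iterations, n_genes) of posterior draws of delta_g."""
    d = delta_samples.mean(axis=0)                         # rule (1): posterior mean d_g
    p = (np.abs(delta_samples) > delta_cut).mean(axis=0)   # rule (2): p_g
    return d, p, np.abs(d) > delta_cut, p > p_cut

# Fake posterior draws for 3 genes (stand-ins for MCMC output).
rng = np.random.default_rng(2)
draws = rng.normal(loc=[0.0, 1.2, 1.2], scale=[0.1, 0.1, 1.0], size=(5000, 3))

d, p, de_by_mean, de_by_prob = decision_rules(draws, delta_cut=np.log(2), p_cut=0.8)
print(de_by_mean, de_by_prob)
```

In this toy setup the third gene has a large posterior mean but a wide posterior, so it passes rule (1) but not rule (2): the posterior probability takes the uncertainty into account.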

Page 11: Illustration of decision rule

$p_g = P(|\delta_g| > \log(2) \text{ and } \alpha_g > 4 \mid \text{data})$

Figure: 3 wildtype v. 3 knock-out mice. Plot symbols: x marks genes with $p_g > 0.8$; Δ marks genes with t-statistic > 2.78 (95% CI)


Page 13: Predictive Model Checks

Key Points

Predict new data from the model (using the posterior distribution)

Get Bayesian p-value for each gene

Use all genes together (1000's) to assess model fit (p-value distribution close to Uniform if model is good)

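Page 14: Mixed Predictive Checks

Diagram (graphical model): node labels recoverable from the slide include $\bar{y}_g$, $S_g$, $\sigma_g$, $\sigma_g^{pred}$ and $\mu, \tau$, with predicted $S_g$ shown for both posterior prediction (post. pred.) and mixed prediction (mixed pred.)

Mixed prediction is less conservative than posterior prediction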
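The difference between the two kinds of prediction can be sketched on a simplified model, assuming NumPy. This is not the model of the paper: to keep the posterior available in closed form, the gene variances get a conjugate inverse-gamma prior instead of a log-normal one, the gene means are fixed at zero, and the hyper-parameters `a` and `b` are treated as known rather than estimated. The test statistic is the sum of squares for each gene.

```python
import numpy as np

rng = np.random.default_rng(3)
n_genes, n_reps, n_draws = 2000, 3, 500
a, b = 3.0, 1.0                               # invented hyper-parameters

def inv_gamma(shape, rate, size):
    """Draws from an inverse-gamma distribution via 1 / Gamma."""
    return 1.0 / rng.gamma(shape, 1.0 / rate, size)

sigma2 = inv_gamma(a, b, n_genes)             # "true" gene variances
y = rng.normal(0.0, np.sqrt(sigma2)[:, None], (n_genes, n_reps))
S_obs = (y ** 2).sum(axis=1)                  # observed statistic per gene

# Posterior prediction: sigma2 drawn from each gene's own posterior.
post = inv_gamma(a + n_reps / 2, (b + S_obs / 2)[:, None], (n_genes, n_draws))
S_post = post * rng.chisquare(n_reps, (n_genes, n_draws))
p_post = (S_post > S_obs[:, None]).mean(axis=1)

# Mixed prediction: a fresh sigma2 drawn from the hyper-distribution.
fresh = inv_gamma(a, b, (n_genes, n_draws))
S_mixed = fresh * rng.chisquare(n_reps, (n_genes, n_draws))
p_mixed = (S_mixed > S_obs[:, None]).mean(axis=1)

# Fraction of genes with a small p-value under each scheme.
print((p_post < 0.05).mean(), (p_mixed < 0.05).mean())
```

Under these assumptions the data really do come from the model, so the mixed p-values should look close to Uniform(0, 1). The posterior predictive p-values rarely fall near 0, because each gene's own data have already been used to fit its variance.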

Page 15: Bayesian predictive p-values


Page 17: Gene Ontology: network of terms

Directed Acyclic Graph

Links connect more general to more specific terms

~16,000 terms

Picture from Gene Ontology website

Page 18: Annotations of genes to a node

Each term may have 1000s of genes annotated (or none)

A gene may be annotated to several GO terms

A gene annotated to term A is also annotated to all ancestors of A

Picture from Gene Ontology website
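The rule that an annotation carries over to all ancestor terms can be sketched with a toy graph. The term and gene names below are invented, and `ancestors` and `propagate` are hypothetical helpers, not part of any Gene Ontology tool.

```python
from collections import defaultdict

# Toy DAG: each term maps to its parents (the more general terms).
PARENTS = {
    "A": ["B", "C"],
    "B": ["root"],
    "C": ["root"],
    "root": [],
}

def ancestors(term, parents):
    """All terms reachable from `term` by following parent links."""
    seen, stack = set(), list(parents[term])
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents[t])
    return seen

def propagate(direct, parents):
    """direct: gene -> set of directly annotated terms. Returns term -> set of genes."""
    genes_at = defaultdict(set)
    for gene, terms in direct.items():
        for term in terms:
            for t in {term} | ancestors(term, parents):
                genes_at[t].add(gene)
    return dict(genes_at)

annotations = propagate({"gene1": {"A"}, "gene2": {"B"}}, PARENTS)
print(annotations)
```

Because a term can have several parents, the traversal tracks visited terms so that shared ancestors such as the root are counted once.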

Page 19: GO annotations of genes associated with the insulin-resistance gene Cd36

Compare GO annotations of genes most and least differentially expressed

Most differentially expressed ↔ $p_g$ > 0.5 (280 genes)

Least differentially expressed ↔ $p_g$ < 0.2 (11171 genes)

Page 20: GO annotations of genes associated with the insulin-resistance gene Cd36

Use Fisher's test to compare GO annotations of genes most and least differentially expressed (one test for each GO term)

None significant with simple multiple testing adjustment, but there are many dependencies

Inflammatory response recently found to be important in insulin resistance
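For a single GO term, the comparison amounts to a test on a 2x2 table (most versus least differentially expressed, annotated versus not). A minimal one-sided version, written with the standard library only, is sketched below. The group sizes 280 and 11171 come from the slides; the annotation counts (200 genes on the term, 15 of them in the most-DE group) are invented, and the talk does not say whether a one-sided or two-sided test was used.

```python
from math import comb

def fisher_upper_tail(k, n_most, K, N):
    """One-sided Fisher exact p-value for over-representation.

    N: genes in total; K: genes annotated to the term;
    n_most: genes in the most-DE list; k: annotated genes in that list.
    Returns P(X >= k) under the hypergeometric null of no association.
    """
    top = min(K, n_most)
    tail = sum(comb(K, i) * comb(N - K, n_most - i) for i in range(k, top + 1))
    return tail / comb(N, n_most)

N = 280 + 11171          # most-DE plus least-DE genes (from the slides)
p_value = fisher_upper_tail(k=15, n_most=280, K=200, N=N)   # invented counts
print(p_value)
```

One such p-value is produced per GO term, and since terms share genes with their ancestors the tests are strongly dependent.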

Page 21: Summary of work in Biometrics paper

Bayesian hierarchical model is flexible and estimates variances robustly

Predictive model checks show exchangeable prior is good for gene variances

Useful to find GO terms over-represented in the most differentially-expressed genes


Page 23: BGmix: mixture model for differential expression

Change the prior on the differential expression parameters $\delta_g$

Group genes into 3 classes: non-DE, over-expressed, under-expressed

Estimation and classification are simultaneous

Page 24: BGmix: mixture model for differential expression

Choice of Null Distribution

True log fold changes = 0

'Nugget' null: true log fold changes = small but not necessarily zero

Choice of DE genes distributions

Gammas

Uniforms

Normal
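One of these combinations, a point-mass null at zero with Gamma distributions for the differentially expressed genes, can be sketched by sampling from the prior (NumPy assumed). The weights and Gamma parameters are invented for illustration and are not values used in BGmix.

```python
import numpy as np

rng = np.random.default_rng(4)
n_genes = 10000
w = [0.9, 0.05, 0.05]       # invented weights: non-DE, over-, under-expressed

# Class label per gene: 0 = non-DE, +1 = over-expressed, -1 = under-expressed.
z = rng.choice([0, 1, -1], size=n_genes, p=w)

# Magnitude of the log fold change for DE genes (invented Gamma parameters).
magnitude = rng.gamma(shape=2.0, scale=0.5, size=n_genes)

# Exactly 0 for non-DE genes, positive for over-, negative for under-expressed.
delta = z * magnitude

print({label: int((z == label).sum()) for label in (0, 1, -1)})
```

A 'nugget' null would replace the exact zeros with draws from a narrow distribution centred at zero.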

Page 25: BGmix: mixture model for differential expression

Outputs

Point estimates (and s.d.) of log fold changes (stabilised and smoothed)

Posterior probability for gene to be in each group

Estimate of proportion of differentially expressed genes based on grouping (parameter of model)

Page 26: BGmix: mixture model for differential expression

Obtaining gene lists

Threshold on posterior probabilities (posterior probability of classification in the null < threshold → gene is DE)

Estimate of False Discovery Rate for any gene list (estimate = average, over the genes in the list, of the posterior probabilities of classification in the null)

Very simple estimate!

Choice of decision rule

Bayes Rule

Fix False Discovery Rate

More complex rules for mixture of 3 components
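The gene-list threshold and the False Discovery Rate estimate can be sketched in a few lines, assuming NumPy and a vector `p_null` holding each gene's posterior probability of belonging to the null component. The function names and the example probabilities are invented for illustration.

```python
import numpy as np

def gene_list(p_null, threshold):
    """Genes whose posterior probability of being in the null is below threshold."""
    return np.flatnonzero(p_null < threshold)

def estimated_fdr(p_null, selected):
    """Average posterior null probability over the selected genes."""
    return float(np.mean(p_null[selected])) if len(selected) else 0.0

def largest_list_with_fdr(p_null, target):
    """Largest list (genes ranked by p_null) whose estimated FDR stays <= target."""
    order = np.argsort(p_null)
    running = np.cumsum(p_null[order]) / np.arange(1, len(order) + 1)
    n = int(np.sum(running <= target))   # running mean of sorted values never decreases
    return order[:n]

p_null = np.array([0.01, 0.03, 0.20, 0.60, 0.95])   # invented posterior probabilities
selected = gene_list(p_null, 0.5)
fdr = estimated_fdr(p_null, selected)
fixed = largest_list_with_fdr(p_null, 0.05)
print(selected, fdr, fixed)
```

The second helper corresponds to the "fix False Discovery Rate" option: instead of choosing a probability threshold, the list is grown until its estimated FDR would exceed the target.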

Page 27: Predictive Checks for Mixture Model

Model checks for differential expression parameters $\delta_g$

More complex for mixture model

Important point: we check each mixture component separately

Diagram (graphical model): node labels recoverable from the slide include $z_g$, $\bar{y}_g$, $S_g$, $\sigma_g$, $\sigma_g^{pred}$, $\mu, \tau$, $\eta$ and $w$, with mixed predictions (mixed pred.) of $\bar{y}_g$ and $S_g$

Page 28: Bayesian p-values for Mixture Model

Figure panels: simulated data from incorrect model; simulated data from correct model

Page 29: Acknowledgements

Co-authors

Sylvia Richardson, Clare Marshall (IC Epidemiology)

Tim Aitman, Anne-Marie Glazier (IC Microarray Centre)

Collaborators on BGX Grant

Anne-Mette Hein, Natalia Bochkina (IC Epidemiology)

Helen Causton (IC Microarray Centre)

Peter Green (Bristol)

BBSRC Exploiting Genomics Grant

Page 30: Papers and Software

Software:

WinBUGS code for model in Biometrics paper

BGmix (R package) includes mixture model

Papers:

BGmix paper, submitted

Paper on predictive checks for mixture prior, in preparation

http://www.bgx.org.uk/