

Page 1: Tom Kepler Santa Fe Institute Normalization and Analysis of DNA Microarray Data by Self-Consistency and Local Regression kepler@santafe.edu

Tom Kepler
Santa Fe Institute

Normalization and Analysis of DNA Microarray Data by Self-Consistency and Local Regression

kepler@santafe.edu

Page 2:
Page 3:

Rat mesothelioma cells, control

Rat mesothelioma cells, treated with KBrO2

Page 4:

Normalization
Method to be improved:

1. Assume that some genes will not change under the treatment under investigation.

2. Identify these core genes in advance of the experiment.

3. Normalize all genes against these core genes, assuming they do not change.

 

Page 5:

Normalization
New Method:

1. Assume that some genes will not change under the treatment under investigation.

2. Choose these core genes arbitrarily.

3. Normalize (provisionally) all genes against these core genes, assuming they do not change.

4. Determine which genes do not change under this normalization.

5. Make this set the new core. If this core differs from the previous core, go to 3. Otherwise, done.
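
The five steps of the new method can be sketched in a few lines of Python. This is a minimal illustration, not the talk's implementation: it assumes just two arrays of log intensities (one control, one treated), a single additive normalization offset, a random initial core, and a fixed threshold for "unchanged". The function name and all parameter values are hypothetical.

```python
import numpy as np

def self_consistent_core(y_control, y_treated, threshold=0.5, max_iter=100, seed=0):
    """Iteratively normalize two arrays of log intensities against a core gene set."""
    rng = np.random.default_rng(seed)
    n = y_control.size
    # Step 2: start from an arbitrary core (here, a random half of the genes).
    core = np.zeros(n, dtype=bool)
    core[rng.choice(n, size=n // 2, replace=False)] = True
    offset = 0.0
    for _ in range(max_iter):
        # Step 3: provisional normalization offset estimated from the current core.
        offset = np.mean(y_treated[core] - y_control[core])
        diff = y_treated - y_control - offset
        # Step 4: genes whose normalized difference is small count as unchanged.
        new_core = np.abs(diff) < threshold
        if not new_core.any():
            raise ValueError("core set became empty; threshold is too small")
        # Step 5: stop once the core reproduces itself, otherwise repeat from step 3.
        if np.array_equal(new_core, core):
            break
        core = new_core
    return offset, core
```

The `max_iter` guard is there because nothing in this sketch guarantees that the core stops changing; a hard threshold can in principle cycle between two sets.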

 

Page 6:

Error Model

$$I = c\,[\mathrm{mRNA}]$$

I = spot intensity
[mRNA] = concentration of specific mRNA
c = normalization constant

Page 7:

Error Model

$$I = c\,[\mathrm{mRNA}]\,\varepsilon$$

I = spot intensity
[mRNA] = concentration of specific mRNA
c = normalization constant
$\varepsilon$ = lognormal multiplicative error

Page 8:

Error Model

$$I_{ijk} = c_{ij}\,[\mathrm{mRNA}]_{ik}\,\varepsilon_{ijk}$$

I = spot intensity
[mRNA] = concentration of specific mRNA
c = normalization constant
$\varepsilon$ = lognormal multiplicative error

index 1, i: treatment group
index 2, j: replicate within treatment
index 3, k: spot (gene)

Page 9:

Y = log spot intensity
$\xi$ = mean log concentration of specific mRNA
$\delta$ = treatment effect (conc. specific mRNA)
$\alpha$ = normalization constant
$\varepsilon$ = normal additive error

index 1, i: treatment group
index 2, j: replicate within treatment
index 3, k: spot (gene)
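
Taking logarithms of the multiplicative error model turns it into an additive one, which is presumably how the quantities listed on this slide fit together. The following is a sketch of that step. The symbol names $\xi_k$, $\alpha_{ij}$ and $\delta_{ik}$ are assumed, as is the split of the log concentration into a per-gene mean plus a treatment effect.

```latex
\begin{align*}
I_{ijk} &= c_{ij}\,[\mathrm{mRNA}]_{ik}\,\varepsilon_{ijk} \\
\log I_{ijk} &= \log c_{ij} + \log [\mathrm{mRNA}]_{ik} + \log \varepsilon_{ijk} \\
Y_{ijk} &= \alpha_{ij} + \left(\xi_k + \delta_{ik}\right) + \log \varepsilon_{ijk}
\end{align*}
```

Here $Y_{ijk} = \log I_{ijk}$, $\alpha_{ij} = \log c_{ij}$ and $\xi_k + \delta_{ik} = \log [\mathrm{mRNA}]_{ik}$. The last term is normally distributed exactly when the multiplicative error is lognormal, which gives the normal additive error.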

Page 10:

Identifiability constraints:

Model:

Estimate by ordinary least squares:

$$\hat{\xi}_k = \bar{Y}_{\cdot\cdot k} - \bar{Y}_{\cdot\cdot\cdot}$$

$$\hat{\alpha}_{ij} = \bar{Y}_{ij\cdot}$$

$$\hat{\delta}_{ik} = \bar{Y}_{i\cdot k} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot k} + \bar{Y}_{\cdot\cdot\cdot}$$
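
For a balanced design, least-squares estimates of an additive model of this kind reduce to differences of averages. The sketch below assumes the model $Y_{ijk} = \xi_k + \alpha_{ij} + \delta_{ik} + \text{error}$ with sum-to-zero constraints ($\sum_k \xi_k = 0$, $\sum_k \delta_{ik} = 0$, $\sum_i \delta_{ik} = 0$). These constraints are one common choice and are not necessarily the ones used in the talk; the function name is hypothetical.

```python
import numpy as np

def ols_estimates(Y):
    """Least-squares estimates for a balanced array Y of shape (treatments, replicates, genes).

    Returns (xi, alpha, delta) under the assumed sum-to-zero constraints.
    """
    grand = Y.mean()                       # overall average
    gene_mean = Y.mean(axis=(0, 1))        # average over i and j for each gene k
    treat_gene = Y.mean(axis=1)            # average over replicates j, shape (I, K)
    treat_mean = Y.mean(axis=(1, 2))       # average over j and k for each treatment i
    xi = gene_mean - grand                 # gene effect, centered over genes
    alpha = Y.mean(axis=2)                 # one normalization constant per array (i, j)
    delta = treat_gene - treat_mean[:, None] - gene_mean[None, :] + grand
    return xi, alpha, delta
```

With noise-free data built from parameters that satisfy the constraints, these averages recover the parameters exactly, which is a convenient sanity check before adding noise.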

Page 11:

Identifiability constraints:

Model:

But note: cannot distinguish between $\alpha$ and $\delta$

Page 12:

Self-consistency:

The weight $w_k(\delta)$ is small if the $k$th gene is judged to be changed, and close to one if it is judged to be unchanged.

The procedure is iterative.
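
A weighted version of the iteration might look like the sketch below. It works on per-gene log differences between a treated and a control array and repeatedly re-estimates the normalization offset as a weighted mean, with weights that shrink for genes that look changed. The Gaussian-shaped weight function and the `scale` parameter are assumptions chosen for illustration; the weight function used in the talk is not given in this transcript.

```python
import numpy as np

def weighted_offset(diff, scale=0.5, tol=1e-8, max_iter=200):
    """Self-consistent normalization offset for per-gene log differences `diff`."""
    offset = np.mean(diff)                 # start as if no gene had changed
    weights = np.ones_like(diff)
    for _ in range(max_iter):
        delta = diff - offset              # provisional treatment effects
        # Hypothetical weight: near 1 for small |delta|, near 0 for large |delta|.
        weights = np.exp(-0.5 * (delta / scale) ** 2)
        new_offset = np.sum(weights * diff) / np.sum(weights)
        converged = abs(new_offset - offset) < tol
        offset = new_offset
        if converged:
            break
    return offset, weights
```

At convergence the weighted treatment effects sum to approximately zero, so the genes judged unchanged are the ones that pin down the normalization.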

Page 13:
Page 14:

[Figure: log intensity, array 2 (vertical axis) versus log intensity, array 1 (horizontal axis); both axes run from -2 to 6]

Page 15:

[Figure: log intensity, array 2 (vertical axis) versus log intensity, array 1 (horizontal axis); both axes run from -2 to 6]

Page 16:

Failure of Model

Page 17:
Page 18:

Generalized Model

The normalization $\alpha_{ij}(\xi_k)$ and the heteroscedasticity function $\sigma_{ij}(\xi_k)$ are slowly varying functions of the intensity, $\xi$.

Estimate by Local Regression
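
One way to write a generalized model of this kind is sketched below: the additive log-intensity model, with the normalization term and the error scale promoted from constants to smooth functions of the gene's mean log intensity. The symbol names and the exact form are assumptions, since the slide's own equation is not preserved in this transcript.

```latex
Y_{ijk} = \xi_k + \alpha_{ij}(\xi_k) + \delta_{ik} + \sigma_{ij}(\xi_k)\,\varepsilon_{ijk},
\qquad \varepsilon_{ijk} \sim \mathcal{N}(0, 1)
```

When $\alpha_{ij}$ and $\sigma_{ij}$ do not depend on $\xi_k$, this reduces to a model with a single normalization constant per array and a constant error variance.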

Page 19:

Local Regression

[Figure: data]

Page 20:

Predict value at x = 50: weight the nearby points, then fit a linear regression

Page 21:

Predict whole function similarly
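
The two steps, a weighted linear fit at one point and then the same fit repeated across the range, can be sketched as follows. The tricube kernel and the fixed `bandwidth` are assumptions chosen for illustration; the talk does not say which weight function or span was used, and the function names are hypothetical.

```python
import numpy as np

def local_linear_predict(x, y, x0, bandwidth):
    """Predict y at x0 from a weighted straight-line fit to the points near x0."""
    u = np.abs(x - x0) / bandwidth
    # Tricube weights (an assumed kernel): 1 at x0, falling to 0 at distance `bandwidth`.
    w = np.where(u < 1.0, (1.0 - u ** 3) ** 3, 0.0)
    # Weighted least squares for y ~ b0 + b1 * (x - x0); b0 is the prediction at x0.
    X = np.column_stack([np.ones_like(x), x - x0])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef[0]

def local_linear_curve(x, y, grid, bandwidth):
    """Repeat the local fit at every grid point to trace out the whole function."""
    return np.array([local_linear_predict(x, y, g, bandwidth) for g in grid])
```

For example, `local_linear_predict(x, y, 50.0, 20.0)` uses only the points with `30 < x < 70`, and the points closest to 50 count the most.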

Page 22:
Page 23:

Compare to known true function

Page 24:
Page 25:
Page 26:
Page 27:

Simulation-based Validation
1. Reproduce observed bias.

Page 28:

Simulation-based Validation
2. Reproduce observed heteroscedasticity.

Page 29:

Test based on z statistic:

$$z_k = \frac{\hat{\delta}_{2k} - \hat{\delta}_{1k}}{s_k \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
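
A two-sample z statistic of this kind can be computed per gene as in the sketch below. It assumes the usual form: the difference between the two treatment-effect estimates for gene k, divided by that gene's standard deviation estimate times the square root of 1/n1 + 1/n2. The argument names and the two-sided p-value helper are illustrative additions.

```python
import math
import numpy as np

def z_statistics(d1, d2, s, n1, n2):
    """Per-gene z statistic for the difference between two treatment effects.

    d1, d2: estimated effects in groups 1 and 2; s: per-gene standard deviation;
    n1, n2: number of replicates in each group.
    """
    return (d2 - d1) / (s * np.sqrt(1.0 / n1 + 1.0 / n2))

def two_sided_p(z):
    """P(|Z| > |z|) for a standard normal Z, via the complementary error function."""
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in np.atleast_1d(z)])
```

A gene would then be called changed when its p-value falls below the chosen significance level.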

Page 30:

Choice of significance level $\alpha$:
expected number of false positives:

$$E(\text{false positives}) = N\alpha$$

But the minimum detectable difference increases as $\alpha$ gets smaller.

Page 31:

$\alpha$    E(fp)    min diff    min ratio
0.05        250      0.916       2.5
0.01        50       1.09        3
0.001       5        1.29        3.6
0.0001      0.5      1.61        5
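
Two columns of this table can be checked with simple arithmetic. The sketch below assumes N = 5000 tests, which is inferred from 250 expected false positives at a significance level of 0.05, and assumes that the minimum ratio is the exponential of the minimum difference, that is, that differences are on the natural-log scale. Neither assumption is stated on the slide.

```python
import math

N_SPOTS = 5000  # assumed: inferred from 250 expected false positives at 0.05

# (significance level, minimum detectable difference) as listed in the table
ROWS = [(0.05, 0.916), (0.01, 1.09), (0.001, 1.29), (0.0001, 1.61)]

def expected_false_positives(alpha, n_tests=N_SPOTS):
    """Expected number of false positives when n_tests true nulls are tested at level alpha."""
    return n_tests * alpha

def min_ratio(min_diff):
    """Fold change corresponding to a difference on the natural-log scale (an assumption)."""
    return math.exp(min_diff)

for alpha, min_diff in ROWS:
    print(alpha, expected_false_positives(alpha), round(min_ratio(min_diff), 1))
```

Rounded to one decimal place, the computed ratios agree with the table's last column, which supports the natural-log reading.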

Page 32:

Validation of method against simulated data
3. Hypothesis testing: Simulated from stated model

[Figure: panels labeled “proportion changed spots” and “-fold change” bias]

“rate false pos.” = mean observed / expected

Page 33:

Simulated data: mis-specified model (multiplicative + additive noise)
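
Data from such a mis-specified model could be simulated along the following lines: each intensity is the true signal times a lognormal factor, plus a normal additive term. The function name and the noise parameters are hypothetical and chosen only to illustrate the idea; they are not the values used in the talk.

```python
import numpy as np

def simulate_intensities(log_conc, sigma_mult=0.2, sigma_add=1.0, seed=0):
    """Simulate spot intensities with multiplicative lognormal plus additive normal noise."""
    rng = np.random.default_rng(seed)
    # Multiplicative error: lognormal with median 1.
    mult = rng.lognormal(mean=0.0, sigma=sigma_mult, size=log_conc.shape)
    # Additive error: normal with mean 0, the same size at every intensity.
    add = rng.normal(0.0, sigma_add, size=log_conc.shape)
    return np.exp(log_conc) * mult + add
```

Because the additive term does not scale with the signal, the relative noise is larger for weak spots than for strong ones. Note that weak spots can come out negative, so their logarithm is undefined.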

Page 34:

Validation of method against simulated data
4. Hypothesis testing: Simulated from “wrong” model: additive + multiplicative noise.

[Figure: panels labeled “proportion changed spots” and “-fold change” bias]

Page 35:

Acknowledgments

Lynn Crosby
North Carolina State University

Kevin Morgan
Strategic Toxicological Sciences
GlaxoWellcome

Page 36:

Santa Fe Institute

www.santafe.edu

postdoctoral fellowships available (apply before the end of the year)

kepler@santafe.edu