tom kepler santa fe institute normalization and analysis of dna microarray data by self-consistency...

Post on 17-Jan-2016


Tom Kepler, Santa Fe Institute

Normalization and Analysis of DNA Microarray Data by Self-Consistency and Local Regression

kepler@santafe.edu

Rat mesothelioma cells: control

Rat mesothelioma cells: treated with KBrO2

Normalization: method to be improved

1. Assume that some genes will not change under the treatment under investigation.

2. Identify these core genes in advance of the experiment.

3. Normalize all genes against these genes, assuming they do not change.

 

Normalization: new method

1. Assume that some genes will not change under the treatment under investigation.

2. Choose these core genes arbitrarily.

3. Normalize (provisionally) all genes against these genes, assuming they do not change.

4. Determine which genes do not change under this normalization.

5. Make this set the new core. If this core differs from the previous core, go to 3. Else, done.
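The loop in steps 2 to 5 can be sketched for the simplest case of two arrays. This is a minimal illustration, not the talk's implementation: the function name, the hard `threshold` rule for "does not change", and the toy data are all assumptions made for the example.

```python
import numpy as np

def self_consistent_offset(y_control, y_treated, threshold=0.5, max_iter=100):
    """Iteratively normalize two log-intensity arrays against a core gene set.

    threshold is an assumed cutoff: a gene counts as unchanged when its
    normalized log difference is smaller than this in absolute value.
    """
    diff = np.asarray(y_treated, float) - np.asarray(y_control, float)
    core = np.ones(diff.shape, dtype=bool)        # step 2: arbitrary start (all genes)
    offset = 0.0
    for _ in range(max_iter):
        offset = diff[core].mean()                # step 3: provisional normalization
        new_core = np.abs(diff - offset) < threshold  # step 4: unchanged genes
        if not new_core.any():                    # guard: no gene left in the core
            break
        if np.array_equal(new_core, core):        # step 5: core is stable, done
            break
        core = new_core                           # step 5: otherwise repeat from 3
    return offset, core

# Toy data (hypothetical): array offset 0.3, first 20 genes truly changed by +2.
genes = np.arange(200)
y_control = 3.0 + np.cos(genes)
true_change = np.where(genes < 20, 2.0, 0.0)
y_treated = y_control + 0.3 + true_change + 0.05 * np.sin(genes)

offset, core = self_consistent_offset(y_control, y_treated)
```

Starting from all genes, the first pass overestimates the offset because the changed genes pull the mean up; the second pass drops them from the core and the offset settles near the true array difference.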

 

$$I = c\,[\mathrm{mRNA}]$$

I = spot intensity

[mRNA] = concentration of specific mRNA

c = normalization constant

Error Model

$$I = c\,[\mathrm{mRNA}]\,\varepsilon$$

I = spot intensity

[mRNA] = concentration of specific mRNA

c = normalization constant

$\varepsilon$ = lognormal multiplicative error

Error Model

$$I_{ijk} = c_{ij}\,[\mathrm{mRNA}]_{ik}\,\varepsilon_{ijk}$$

I = spot intensity

[mRNA] = concentration of specific mRNA

c = normalization constant

$\varepsilon$ = lognormal multiplicative error

index 1, i: treatment group

index 2, j: replicate within treatment

index 3, k: spot (gene)

Error Model

$$Y_{ijk} = a_{ij} + x_k + d_{ik} + \varepsilon_{ijk}$$

$Y_{ijk}$ = log spot intensity

$x_k$ = mean log concentration of specific mRNA

$d_{ik}$ = treatment effect (conc. specific mRNA)

$a_{ij}$ = normalization constant

$\varepsilon_{ijk}$ = normal additive error

index 1, i: treatment group

index 2, j: replicate within treatment

index 3, k: spot (gene)

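Model:

$$Y_{ijk} = a_{ij} + x_k + d_{ik} + \varepsilon_{ijk}$$

Identifiability constraints:

$$\sum_k x_k = 0, \qquad \sum_k d_{ik} = 0, \qquad \sum_i d_{ik} = 0$$

Estimate by ordinary least squares:

$$\hat{x}_k = \bar{Y}_{\cdot\cdot k} - \bar{Y}_{\cdot\cdot\cdot}$$

$$\hat{a}_{ij} = \bar{Y}_{ij\cdot}$$

$$\hat{d}_{ik} = \bar{Y}_{i\cdot k} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot k} + \bar{Y}_{\cdot\cdot\cdot}$$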
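A minimal sketch of the ordinary-least-squares step, assuming a balanced design stored as an array `Y[i, j, k]` (treatment, replicate, gene) and the usual marginal-mean estimators for an additive model with terms x, a, and d. The function name and array layout are assumptions made for illustration.

```python
import numpy as np

def ols_estimates(Y):
    """Marginal-mean (OLS) estimates for Y[i, j, k] = a[i, j] + x[k] + d[i, k] + noise.

    Assumes a balanced design: every treatment has the same number of
    replicates and every array carries the same genes.
    """
    Y = np.asarray(Y, dtype=float)
    grand = Y.mean()                        # overall mean
    gene_mean = Y.mean(axis=(0, 1))         # mean over treatments and replicates, per gene
    treat_mean = Y.mean(axis=(1, 2))        # mean over replicates and genes, per treatment
    treat_gene = Y.mean(axis=1)             # mean over replicates, per treatment and gene

    x = gene_mean - grand                   # gene effect, sums to zero over k
    a = Y.mean(axis=2)                      # normalization constant per array
    d = treat_gene - treat_mean[:, None] - gene_mean[None, :] + grand
    return x, a, d

# Noise-free check with hypothetical parameters that satisfy the constraints.
x_true = np.array([1.0, -1.0, 0.5, -0.5])
d_true = np.array([[0.2, -0.2, 0.1, -0.1],
                   [-0.2, 0.2, -0.1, 0.1]])
a_true = np.array([[5.0, 5.3, 4.8],
                   [5.6, 5.1, 5.4]])
Y = a_true[:, :, None] + x_true[None, None, :] + d_true[:, None, :]
x_hat, a_hat, d_hat = ols_estimates(Y)
```

On noise-free data built this way the estimates reproduce the parameters exactly, which is a quick way to confirm the centering conventions.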

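But note: cannot identify between the normalization constant $a_{ij}$ and the treatment effect $d_{ik}$. A shift common to all genes in a treatment group can be assigned to either one.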

Self-consistency:

The weight $w_k(\cdot)$ is small if the $k$th gene is judged to be changed; close to one if it is judged to be unchanged.

Procedure is iterative.

[Figure: two scatter plots of log intensity, array 2, against log intensity, array 1; both axes run from -2 to 6.]

Failure of Model

Generalized Model

The normalization $a_{ij}(x_k)$ and the heteroscedasticity function $\sigma_{ij}(x_k)$ are slowly varying functions of the intensity, $x_k$.

Estimate by Local Regression

data

Local Regression

Predict value at x = 50: weight the data by distance from x = 50, then fit a linear regression

Predict whole function similarly

Compare to known true function
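The local-regression steps above can be sketched as follows. This is an illustrative version only: the tricube kernel, the `bandwidth` parameter, and the function names are assumptions, since the slides do not say which weighting was used.

```python
import numpy as np

def local_linear(x, y, x0, bandwidth):
    """Predict y at x0 from a weighted straight-line fit to nearby points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    u = np.abs(x - x0) / bandwidth
    # Tricube weights (an assumed kernel): 1 at x0, falling to 0 at distance bandwidth.
    w = np.where(u < 1.0, (1.0 - u**3) ** 3, 0.0)
    # Weighted least squares for y ~ b0 + b1 * (x - x0); b0 is the prediction at x0.
    design = np.column_stack([np.ones_like(x), x - x0])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return beta[0]

def local_regression(x, y, grid, bandwidth):
    """Predict the whole function by repeating the local fit at each grid point."""
    return np.array([local_linear(x, y, g, bandwidth) for g in grid])

# Compare to a known true function (hypothetical test case).
x = np.linspace(0.0, 100.0, 201)
y_line = 2.0 * x + 1.0
pred_at_50 = local_linear(x, y_line, 50.0, bandwidth=20.0)

y_smooth = np.sin(x / 15.0)
grid = np.linspace(20.0, 80.0, 7)
fitted = local_regression(x, y_smooth, grid, bandwidth=10.0)
```

A straight line is recovered exactly, and a slowly varying curve is tracked closely as long as the bandwidth is small relative to the scale on which the curve bends.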

Simulation-based Validation

1. Reproduce observed bias.

Simulation-based Validation

2. Reproduce observed heteroscedasticity.

Test based on z statistic:

$$z_k = \frac{d_{2k} - d_{1k}}{s_k \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
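A small sketch of a z test of this kind, assuming the statistic is the difference of the two treatment effects for gene k divided by s_k times the square root of 1/n_1 + 1/n_2, with n_1 and n_2 taken to be the replicate counts. The function names and the two-sided normal p-value are assumptions for illustration.

```python
import math

def z_statistic(d1, d2, s, n1, n2):
    """z for one gene: effect difference scaled by its standard error."""
    return (d2 - d1) / (s * math.sqrt(1.0 / n1 + 1.0 / n2))

def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical gene: effects 0.0 and 1.0, s = 0.5, two replicates per group.
z = z_statistic(0.0, 1.0, 0.5, 2, 2)
p = two_sided_p(z)
```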

Choice of significance level: expected number of false positives

$E(\text{false positives}) = \alpha N$

But minimum detectable difference increases as $\alpha$ gets smaller

$\alpha$    E(fp)   min diff   min ratio

0.05     250     0.916      2.5

0.01     50      1.09       3

0.001    5       1.29       3.6

0.0001   0.5     1.61       5
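The table's arithmetic can be checked with a few lines. The spot count N = 5000 is not stated on the slide; it is inferred here from 250 expected false positives at a significance level of 0.05. The min ratio column matches the exponential of the min diff column, which suggests the differences are on a natural-log scale; that reading is also an inference.

```python
import math

N_SPOTS = 5000  # inferred from the table (250 / 0.05), not stated in the source

def expected_false_positives(alpha, n_spots=N_SPOTS):
    """Expected number of false positives when every spot is tested at level alpha."""
    return alpha * n_spots

def min_ratio(min_diff):
    """Convert a minimum detectable log difference (natural log) to a fold ratio."""
    return math.exp(min_diff)

# Rows of the table: (alpha, E(fp), min diff, min ratio)
table = [
    (0.05, 250, 0.916, 2.5),
    (0.01, 50, 1.09, 3.0),
    (0.001, 5, 1.29, 3.6),
    (0.0001, 0.5, 1.61, 5.0),
]
computed = [(expected_false_positives(a), min_ratio(d)) for a, _, d, _ in table]
```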

Validation of method against simulated data

3. Hypothesis testing: simulated from stated model

[Figure: axes labeled "Proportion changed spots", "-fold change", and "bias".]

“rate false pos.” = mean observed / expected

Simulated data: mis-specified model — multiplicative + additive noise

Validation of method against simulated data

4. Hypothesis testing: simulated from "wrong" model: additive + multiplicative noise.

[Figure: axes labeled "Proportion changed spots", "-fold change", and "bias".]

Acknowledgments

Lynn Crosby, North Carolina State University

Kevin Morgan, Strategic Toxicological Sciences, GlaxoWellcome

Santa Fe Institute

www.santafe.edu 

postdoctoral fellowships available (apply before the end of the year)

kepler@santafe.edu
