Tom Kepler, Santa Fe Institute

Normalization and Analysis of DNA Microarray Data by Self-Consistency and Local Regression

kepler@santafe.edu

Rat mesothelioma cells, control

Rat mesothelioma cells, treated with KBrO2

Normalization: Method to be improved

1. Assume that some genes will not change under the treatment under investigation.

2. Identify these core genes in advance of the experiment.

3. Normalize all genes against these genes, assuming they do not change.

Normalization: New Method

1. Assume that some genes will not change under the treatment under investigation.

2. Choose these core genes arbitrarily.

3. Normalize (provisionally) all genes against these genes, assuming they do not change.

4. Determine which genes do not change under this normalization.

5. Make this set the new core. If this core differs from the previous core, go to 3. Else, done.
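The five steps above can be sketched as a short loop. This is a minimal illustration, not the author's implementation: it assumes just two arrays of log intensities (one control, one treated), a single additive offset as the normalization, and a hypothetical cutoff `threshold` for deciding that a gene "does not change". The function name and its parameters are invented for the example.

```python
from statistics import mean

def self_consistent_core(control, treated, threshold=1.0, max_iter=100):
    """Alternate normalization and core selection until the core stops changing.

    control, treated: log intensities per gene (lists of equal length).
    threshold: hypothetical cutoff on |normalized log ratio| for "unchanged".
    """
    n = len(control)
    core = set(range(n))  # step 2: arbitrary starting core (here, all genes)
    for _ in range(max_iter):
        # Step 3: provisional normalization against the current core.
        offset = mean(treated[k] - control[k] for k in core)
        diffs = [treated[k] - control[k] - offset for k in range(n)]
        # Step 4: genes that do not change under this normalization.
        new_core = {k for k in range(n) if abs(diffs[k]) < threshold}
        if not new_core:
            raise ValueError("no unchanged genes found; threshold too small")
        # Step 5: stop when the core reproduces itself, otherwise repeat.
        if new_core == core:
            return core, offset, diffs
        core = new_core
    raise RuntimeError("core did not converge")

# Toy data: every gene is shifted by 0.3 (array effect); gene 5 also rises by 3.0.
control = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
treated = [1.3, 2.3, 3.3, 4.3, 5.3, 9.3]
core, offset, diffs = self_consistent_core(control, treated)
```

In this toy case the first pass uses all six genes and overestimates the offset; the second pass drops gene 5 from the core and the offset settles near 0.3.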

$I = c\,[\mathrm{mRNA}]$

I = spot intensity
[mRNA] = concentration of specific mRNA
c = normalization constant

Error Model

$I = c\,[\mathrm{mRNA}]\,\varepsilon$

I = spot intensity
[mRNA] = concentration of specific mRNA
c = normalization constant
$\varepsilon$ = lognormal multiplicative error

Error Model

$I_{ijk} = c_{ij}\,[\mathrm{mRNA}]_{ik}\,\varepsilon_{ijk}$

I = spot intensity
[mRNA] = concentration of specific mRNA
c = normalization constant
$\varepsilon$ = lognormal multiplicative error

index 1, i: treatment group
index 2, j: replicate within treatment
index 3, k: spot (gene)

Error Model

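$Y_{ijk} = \mu_k + \delta_{ik} + \nu_{ij} + \varepsilon_{ijk}$

$Y_{ijk}$ = log spot intensity
$\mu_k$ = mean log concentration of specific mRNA
$\delta_{ik}$ = treatment effect (conc. specific mRNA)
$\nu_{ij}$ = normalization constant
$\varepsilon_{ijk}$ = normal additive error

index 1, i: treatment group
index 2, j: replicate within treatment
index 3, k: spot (gene)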
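A small simulation can make the link between the multiplicative and the additive form of the error model concrete. The sketch below assumes particular values for the normalization constant, the concentration, and the error spread `sigma`; these numbers and the function name are illustrative only. It draws intensities from the multiplicative lognormal model and checks that, after taking logs, the residual error is additive with mean near zero and standard deviation near `sigma`.

```python
import math
import random
from statistics import mean, stdev

def simulate_array(c_ij, conc_ik, sigma, rng):
    """Spot intensities for one array: I_ijk = c_ij * [mRNA]_ik * eps_ijk,
    with eps_ijk lognormal (log eps ~ Normal(0, sigma^2))."""
    return [c_ij * conc * rng.lognormvariate(0.0, sigma) for conc in conc_ik]

rng = random.Random(1)          # fixed seed so the example is repeatable
c_ij = 1.5                      # assumed normalization constant for this array
conc_ik = [2.0] * 20000         # assumed concentration, repeated to get many spots
sigma = 0.3                     # assumed spread of the log error

intensities = simulate_array(c_ij, conc_ik, sigma, rng)

# On the log scale the model is additive: log I = log c + log [mRNA] + log eps.
residuals = [math.log(i_obs) - math.log(c_ij) - math.log(conc)
             for i_obs, conc in zip(intensities, conc_ik)]
resid_mean = mean(residuals)
resid_sd = stdev(residuals)
```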

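Model:

$Y_{ijk} = \mu_k + \delta_{ik} + \nu_{ij} + \varepsilon_{ijk}$

(log spot intensity = mean log concentration + treatment effect + normalization constant + normal additive error)

Identifiability constraints: [equations not preserved]

Estimate by ordinary least squares: [equations not preserved]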

But note: the data cannot distinguish between the normalization constant ($\nu_{ij}$) and a treatment effect ($\delta_{ik}$) that is shared by all genes.
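The identifiability problem can be illustrated numerically. The slide's own constraints and estimators did not survive, so everything in the sketch below is an assumption: the symbols (`nu` for the array normalization, `mu` for the gene mean, `delta` for the treatment effect), a balanced design, and the textbook sum-to-zero constraints (sum over k of mu_k = 0, sum over i of delta_ik = 0, sum over k of delta_ik = 0). Under those assumptions the least-squares estimates are simple averages, and a treatment that raises every gene by the same amount is absorbed entirely into the normalization.

```python
def _mean(values):
    values = list(values)
    return sum(values) / len(values)

def ols_estimates(Y):
    """Least-squares estimates for balanced data Y[i][j][k]
    (treatment i, replicate j, gene k) under assumed sum-to-zero constraints."""
    n_trt, n_rep, n_gene = len(Y), len(Y[0]), len(Y[0][0])
    array_mean = [[_mean(Y[i][j]) for j in range(n_rep)] for i in range(n_trt)]
    trt_gene = [[_mean(Y[i][j][k] for j in range(n_rep)) for k in range(n_gene)]
                for i in range(n_trt)]
    trt_mean = [_mean(array_mean[i]) for i in range(n_trt)]
    gene_mean = [_mean(trt_gene[i][k] for i in range(n_trt)) for k in range(n_gene)]
    grand = _mean(trt_mean)
    nu = array_mean                                   # one constant per array
    mu = [g - grand for g in gene_mean]               # centered gene means
    delta = [[trt_gene[i][k] - trt_mean[i] - gene_mean[k] + grand
              for k in range(n_gene)] for i in range(n_trt)]
    return nu, mu, delta

# Noise-free toy data in which treatment 1 raises EVERY gene by 1.0.
mu_true = [-1.0, 0.0, 1.0]
nu_true = [[5.0, 5.2], [5.1, 4.9]]
shift = 1.0
Y = [[[nu_true[i][j] + mu_true[k] + (shift if i == 1 else 0.0)
       for k in range(3)] for j in range(2)] for i in range(2)]

nu_hat, mu_hat, delta_hat = ols_estimates(Y)
# delta_hat is zero everywhere: the global shift shows up in nu_hat for
# the treated arrays instead, so it looks like a normalization difference.
```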

Self-consistency:

The weight $w_k(\delta)$, a function of the estimated treatment effect $\delta$, is small if the $k$th gene is judged to be changed and close to one if it is judged to be unchanged.

Procedure is iterative.
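One way to picture the iteration is with a smooth weight in place of a hard in-or-out decision. The sketch below is an assumption-laden illustration, not the weight function from the talk: it uses a hypothetical Gaussian-shaped weight with a hypothetical `scale`, works on log ratios from a single pair of arrays, and treats the normalization as one additive offset. Each pass recomputes the offset as a weighted mean, then recomputes the weights from the resulting treatment-effect estimates, until the offset stops moving.

```python
import math

def weight(delta, scale=0.5):
    """Hypothetical smooth weight: close to 1 for small |delta|, close to 0 for large."""
    return math.exp(-0.5 * (delta / scale) ** 2)

def self_consistent_offset(log_ratios, scale=0.5, tol=1e-10, max_iter=200):
    """Iterate offset -> treatment effects -> weights -> offset to a fixed point."""
    offset = sum(log_ratios) / len(log_ratios)   # start with equal weights
    for _ in range(max_iter):
        deltas = [r - offset for r in log_ratios]
        weights = [weight(d, scale) for d in deltas]
        total = sum(weights)
        if total == 0.0:
            raise ValueError("all weights vanished; scale too small")
        new_offset = sum(w * r for w, r in zip(weights, log_ratios)) / total
        if abs(new_offset - offset) < tol:
            final_deltas = [r - new_offset for r in log_ratios]
            return new_offset, final_deltas, [weight(d, scale) for d in final_deltas]
        offset = new_offset
    raise RuntimeError("offset did not converge")

# Five unchanged genes with array offset 0.3, plus one gene raised by 3.0.
log_ratios = [0.3, 0.3, 0.3, 0.3, 0.3, 3.3]
offset, deltas, weights = self_consistent_offset(log_ratios)
```

With these toy numbers the changed gene ends up with a weight near zero and the offset settles very close to 0.3.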

[Figure: axes labeled "log intensity, array 1", range -2 to 6]

Failure of Model

Generalized Model

The normalization $\nu_{ij}(\mu_k)$ and the heteroscedasticity function $\sigma_{ij}(\mu_k)$ are slowly varying functions of the intensity, $\mu_k$.

Estimate by Local Regression

Local Regression

Predict value at x=50: weight, linear regression

Predict whole function similarly

Compare to known true function
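The three steps on this slide (predict at one point by weighted linear regression, repeat for every point, compare with a known function) can be sketched as follows. The details are assumptions made for the example rather than the settings used in the talk: a tricube weight function, a fixed `bandwidth` of 20, a sine curve as the "known true function", and Gaussian noise with standard deviation 0.1.

```python
import math
import random

def local_linear(x, y, x0, bandwidth):
    """Predict y at x0 from a weighted straight-line fit to nearby points."""
    weights = []
    for xi in x:
        u = abs(xi - x0) / bandwidth
        # Tricube weight: largest at x0, zero at and beyond the bandwidth.
        weights.append((1.0 - u ** 3) ** 3 if u < 1.0 else 0.0)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no data within the bandwidth of x0")
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - x_bar) * (yi - y_bar)
              for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx if sxx > 0.0 else 0.0
    return y_bar + slope * (x0 - x_bar)

def true_function(t):
    """Assumed smooth function used to generate the toy data."""
    return math.sin(t / 15.0)

rng = random.Random(0)
x = list(range(101))
y = [true_function(t) + rng.gauss(0.0, 0.1) for t in x]

at_50 = local_linear(x, y, 50, bandwidth=20)                 # value at x = 50
curve = [local_linear(x, y, t, bandwidth=20) for t in x]     # whole function
max_error = max(abs(c - true_function(t)) for c, t in zip(curve, x))
```

A wider bandwidth gives a smoother but more biased curve; a narrower one follows the data more closely but is noisier.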

Simulation-based Validation: 1. Reproduce observed bias.

Simulation-based Validation: 2. Reproduce observed heteroscedasticity.

Test based on z statistic:

Choice of significance level $\alpha$: expected number of false positives:

E(false positives) = $\alpha N$

But minimum detectable difference increases as $\alpha$ gets smaller.

$\alpha$    E(fp)   min diff   min ratio
0.05        250     0.916      2.5
0.01        50      1.09       3
0.001       5       1.29       3.6
0.0001      0.5     1.61       5
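The arithmetic behind the table can be checked in a few lines. Two things here are inferences rather than statements from the slide: the number of genes N = 5000 (implied by 250 expected false positives at a significance level of 0.05), and the reading of "min ratio" as the exponential of "min diff", which matches the tabulated values if the differences are on a natural-log scale.

```python
import math

N_GENES = 5000  # inferred from the table: 250 / 0.05

# (significance level, minimum detectable difference) as listed in the table
rows = [(0.05, 0.916), (0.01, 1.09), (0.001, 1.29), (0.0001, 1.61)]

def expected_false_positives(alpha, n_genes):
    """Expected number of false positives when all n_genes are truly unchanged."""
    return alpha * n_genes

def min_ratio(min_diff):
    """Fold change corresponding to a difference on the natural-log scale."""
    return math.exp(min_diff)

table = [(alpha, expected_false_positives(alpha, N_GENES), diff, min_ratio(diff))
         for alpha, diff in rows]

for alpha, e_fp, diff, ratio in table:
    print(f"{alpha:<8} {e_fp:<8.1f} {diff:<8} {ratio:.2f}")
```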

Validation of method against simulated data: 3. Hypothesis testing: simulated from stated model


“rate false pos.” = mean observed / expected
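The ratio defined here can be illustrated with a deliberately simplified simulation. The sketch below does not run the normalization procedure at all: it assumes the z statistics of unchanged genes are exactly standard normal and that the test is two-sided, then counts how many exceed the critical value and divides the mean count by the expected count. The function name, gene count, and number of simulated experiments are all chosen for the example. A ratio near 1 means the test is producing false positives at the nominal rate.

```python
import random
from statistics import NormalDist, mean

def false_positive_ratio(alpha, n_genes, n_sims, seed=0):
    """Mean observed false positives divided by the expected number, alpha * n_genes.

    Assumes every gene is unchanged and its z statistic is standard normal.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    counts = []
    for _ in range(n_sims):
        n_false = sum(1 for _ in range(n_genes)
                      if abs(rng.gauss(0.0, 1.0)) > z_crit)
        counts.append(n_false)
    return mean(counts) / (alpha * n_genes)

ratio = false_positive_ratio(alpha=0.05, n_genes=2000, n_sims=20, seed=1)
```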

Simulated data: mis-specified model — multiplicative + additive noise

Validation of method against simulated data: 4. Hypothesis testing: simulated from "wrong" model: additive + multiplicative noise.


Acknowledgments

Lynn Crosby, North Carolina State University

Kevin Morgan, Strategic Toxicological Sciences, GlaxoWellcome

Santa Fe Institute

www.santafe.edu

postdoctoral fellowships available (apply before the end of the year)

kepler@santafe.edu
