A Routine Approach to Quality Control

Peter Haberl, 19. 11. 2001

Page 1: A Routine Approach to Quality Control

DATA

GENE

A Routine Approach

to Quality Control

Peter Haberl 19. 11. 2001

Page 2: A Routine Approach to Quality Control

Content

The GDE Controller

- Workflow
- Gradients
- Distortions
- Local defects
- Condensing

Playing with negative AvgDiff values

Page 3: A Routine Approach to Quality Control

GDE Controller

... is part of the GD Expressionist™ system

[Diagram labels: feature data (.CEL files); upload server; GD CoBi™ Database (DB); GD Expressionist™ Controller; GD Expressionist™ Analyst; file types .CEL, .ABS, .REL]

Workflow

Page 4: A Routine Approach to Quality Control

... extends the conventional data flow

[Diagram with the columns Affymetrix and GeneData. Labels: .DAT; .CEL; Quality Control (intensity values, flagging of outliers; outliers ok); Condensation; .CDF; .INS; .CHP; .ABS; DB; GD Expressionist™ Analyst]

GDE Controller Workflow

Page 5: A Routine Approach to Quality Control

GDE Controller Workflow

login

options and thresholds

available chip layouts (.CDF files)

available experiments (.CEL files)

Page 6: A Routine Approach to Quality Control

The Controller is about ...

... detection
- of location-dependent systematic effects (gradients)
- of intensity-dependent systematic effects (distortions)
- of local defects

... correction
- of global gradients
- of global distortions

... condensing
- constructing expression values using different algorithms

GDE Controller Workflow

Page 7: A Routine Approach to Quality Control

Gradients:

incomplete washing?

thermal effects?

... ?

GDE Controller Gradients

Page 8: A Routine Approach to Quality Control

Idea*) (single-chip version):

divide the chip into 4 x 4 sectors (as for the background determination)

look at the feature distribution in each sector, in particular at the mode (maximum position) and the width

[Plot: ln(counts) versus ln(intensity)]

*) developed in discussions with H. Seidel (Schering, Berlin)
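The sector statistics described on this slide can be sketched as follows. This is only an illustration under stated assumptions: the function name `sector_stats`, the histogram bin width, and the use of the interquartile range as the "width" are choices made here, not details given in the talk.

```python
import math
from collections import Counter

def sector_stats(chip, n_sectors=4, bin_width=0.25):
    """Return {(sector_row, sector_col): (mode, width)} of ln(intensity).

    chip is a rectangular list of rows of positive intensities.
    bin_width and the interquartile "width" are illustrative assumptions.
    """
    n_rows, n_cols = len(chip), len(chip[0])
    logs = {}
    for y, row in enumerate(chip):
        for x, intensity in enumerate(row):
            # Assign each feature to one of n_sectors x n_sectors sectors.
            key = (y * n_sectors // n_rows, x * n_sectors // n_cols)
            logs.setdefault(key, []).append(math.log(intensity))
    stats = {}
    for key, values in logs.items():
        # Mode: centre of the most populated histogram bin.
        bins = Counter(math.floor(v / bin_width) for v in values)
        mode = (bins.most_common(1)[0][0] + 0.5) * bin_width
        # Width: interquartile range of ln(intensity).
        ordered = sorted(values)
        width = ordered[3 * len(ordered) // 4] - ordered[len(ordered) // 4]
        stats[key] = (mode, width)
    return stats
```

A gradient would then show up as a systematic drift of the mode (or width) from sector to sector.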

GDE Controller Gradients

Page 9: A Routine Approach to Quality Control

In an iterative process, transform the intensities I(x,y) → I’(x,y) = a(x,y) I(x,y) + b(x,y) such that the sector histograms become aligned.

[Figures: scale factor a(x,y) in first step; offset b(x,y) in first step; all sector histograms after first step; all sector histograms after third step]
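One step of such an alignment might look like the sketch below. It is not the Controller's actual procedure: it assumes that "aligned" means matching the median and the interquartile range of a sector to those of a target distribution, and the function name `align_sector` is hypothetical.

```python
import statistics

def align_sector(sector, target):
    """Return (a, b) such that a * I + b maps the sector's median and
    interquartile range onto those of the target distribution.

    Matching these two statistics is an assumption made for illustration.
    """
    s1, s2, s3 = statistics.quantiles(sector, n=4)
    t1, t2, t3 = statistics.quantiles(target, n=4)
    a = (t3 - t1) / (s3 - s1)   # scale factor aligns the widths
    b = t2 - a * s2             # offset aligns the positions
    return a, b

def apply_correction(sector, a, b):
    """Apply I' = a * I + b to every intensity in the sector."""
    return [a * value + b for value in sector]
```

In the iterative process of the talk this step would be repeated (three steps are shown) and a(x,y), b(x,y) would vary smoothly over the chip rather than being constant per sector.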

GDE Controller Gradients

Page 10: A Routine Approach to Quality Control

It was later decided to perform only a multiplicative correction, I(x,y) → I’(x,y) = a(x,y) I(x,y), for two reasons:
- practical application showed that the scale factor is the dominant effect;
- the observable AvgDiff is insensitive to the offset b(x,y).
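The second reason can be made explicit with a short calculation. It assumes that AvgDiff is the average of the PM−MM differences over the n probe pairs of a probe set, and that a and b are practically the same for the PM and MM feature of one pair (they are neighbours on the chip, and the correction varies slowly):

$$
\mathrm{AvgDiff}' = \frac{1}{n}\sum_{i=1}^{n}\Bigl[(a\,PM_i + b) - (a\,MM_i + b)\Bigr]
= \frac{a}{n}\sum_{i=1}^{n}\bigl(PM_i - MM_i\bigr)
= a\,\mathrm{AvgDiff}
$$

The offset b cancels in every pair, so only the scale factor a changes the observable.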

A basic assumption of the ‘single-chip’ version is that the distribution of bright and dark features is random. If this assumption is violated (e.g. for the yeast chip), the ‘single-chip’ version encounters problems.

The ‘multi-chip’ version compares the sector histograms not among themselves, but to the sector histograms of a ‘reference chip’. (This is of course only possible if enough ‘similar’ chips are available.)

GDE Controller Gradients

Page 11: A Routine Approach to Quality Control

Result of Gradient Correction:

‘heat map’ of the scale factor a(x,y)

original corrected

GDE Controller Gradients

Page 12: A Routine Approach to Quality Control

Further example of Gradient Correction:

‘heat map’ of the scale factor

original

corrected

GDE Controller Gradients

Page 13: A Routine Approach to Quality Control

Distortions:

A log-log plot of coding (i.e. PM and MM) features can show a nonlinear relationship when compared to the features of a ‘reference chip’.

One of the reasons can be that chips from different chip lots are combined into a series.

(Again, the reference chip can only be constructed if enough ‘similar’ chips are available.)

GDE Controller Distortions

Page 14: A Routine Approach to Quality Control

Idea:

divide the reference signal region into stripes containing the same number of points (red lines)

in each stripe, determine the median of experiment signals (or – equivalently – the point of maximum density)

force this median line to be the diagonal of the new point cloud; this determines the (intensity dependent) transformation

[Plot: experiment versus reference]
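The three steps of this idea can be sketched as below. The sketch assumes the inputs are already log intensities, that the distortion is monotonic (so the stripe medians increase), and that linear interpolation between the stripe medians is an acceptable transformation; `fit_distortion`, `correct` and the number of stripes are illustrative names and values, not taken from the talk.

```python
import statistics
from bisect import bisect_right

def fit_distortion(reference, experiment, n_stripes=10):
    """Return sorted knots (experiment median, reference median), one per stripe.

    Stripes are cut along the reference axis so that each holds the same
    number of points (the last stripe takes any remainder).
    """
    pairs = sorted(zip(reference, experiment))
    size = len(pairs) // n_stripes
    knots = []
    for k in range(n_stripes):
        end = (k + 1) * size if k < n_stripes - 1 else len(pairs)
        stripe = pairs[k * size:end]
        ref_med = statistics.median(r for r, _ in stripe)
        exp_med = statistics.median(e for _, e in stripe)
        knots.append((exp_med, ref_med))
    return sorted(knots)

def correct(value, knots):
    """Map an experiment value onto the reference scale (piecewise linear).

    Values outside the outermost knots are extrapolated linearly.
    Assumes at least two knots with distinct experiment medians.
    """
    xs = [k[0] for k in knots]
    i = min(max(bisect_right(xs, value) - 1, 0), len(knots) - 2)
    (x0, y0), (x1, y1) = knots[i], knots[i + 1]
    return y0 + (value - x0) * (y1 - y0) / (x1 - x0)
```

After the correction the stripe medians lie on the diagonal by construction, which is the condition stated on the slide.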

GDE Controller Distortions

Page 15: A Routine Approach to Quality Control

Result of Distortion Correction:

impossible to correct

GDE Controller Distortions

Page 16: A Routine Approach to Quality Control

Both gradient and distortion detection/correction require the concept of a

Reference chip:

serves as a ‘virtual standard’ for a given experiment set

the experiment set should be homogeneous:
- chips from the same production lot
- probes from the same tissue
- a small number of differentially expressed genes doesn’t change the characteristic pattern

the reference chip is computed featurewise (as mean or median)

the chips have to be made comparable, for instance with a global logarithmic-mean normalization

[Figures: normalized set; reference chip]
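A minimal sketch of the reference chip construction follows. It assumes that "global logarithmic-mean normalization" means rescaling each chip so that the mean of ln(intensity) is the same (here: zero) on every chip, and it uses the featurewise median; both function names are hypothetical.

```python
import math
import statistics

def log_mean_normalize(chip):
    """Rescale a chip (flat list of positive intensities) so that the
    mean of ln(intensity) becomes 0, i.e. the geometric mean becomes 1."""
    mean_log = statistics.fmean(math.log(v) for v in chip)
    factor = math.exp(-mean_log)
    return [v * factor for v in chip]

def reference_chip(chips):
    """Featurewise median over the normalized chips.

    All chips must share the same layout (same feature order and length).
    """
    normalized = [log_mean_normalize(chip) for chip in chips]
    return [statistics.median(feature) for feature in zip(*normalized)]
```

Using the median rather than the mean makes the reference less sensitive to a defect or a differentially expressed gene on a single chip.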

GDE Controller Reference Chip

Page 17: A Routine Approach to Quality Control

Local defects:

There are local defects which are already visible in a global chip view:

Aim: Can we reliably detect smaller local defects, if possible automatically?

view of outlier locations:

GDE Controller Local Defects

Page 18: A Routine Approach to Quality Control

Idea:

construct a ‘ratio chip’ by dividing each feature by its counterpart on the reference chip

for visualisation purposes, show in
- green features which are brighter
- red features which are darker
- black features that don’t change

local defects should show up as speckles of homogeneous color, with diameters of at least several features

reference:

         0      1      2
  0     y00    y01    y02
  1     y10    y11    y12

experiment:

         0      1      2
  0     x00    x01    x02
  1     x10    x11    x12

ratio chip:

         0          1          2
  0     x00/y00    x01/y01    x02/y02
  1     x10/y10    x11/y11    x12/y12
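The ratio chip and the search for homogeneous speckles can be sketched as below. The colour threshold, the minimum speckle size and the 4-neighbour flood fill are assumptions made for illustration; the talk does not specify how the Controller detects speckles.

```python
def ratio_chip(experiment, reference):
    """Divide each experiment feature by its counterpart on the reference chip."""
    return [[x / y for x, y in zip(x_row, y_row)]
            for x_row, y_row in zip(experiment, reference)]

def colour(ratio, threshold=1.5):
    """Green = brighter, red = darker, black = unchanged (threshold is hypothetical)."""
    if ratio > threshold:
        return "green"
    if ratio < 1.0 / threshold:
        return "red"
    return "black"

def speckles(colours, min_size=4):
    """Return (colour, features) for connected same-colour, non-black regions
    that contain at least min_size features (4-neighbour connectivity)."""
    n_rows, n_cols = len(colours), len(colours[0])
    seen, regions = set(), []
    for y in range(n_rows):
        for x in range(n_cols):
            if (y, x) in seen or colours[y][x] == "black":
                continue
            region, stack = [], [(y, x)]
            seen.add((y, x))
            while stack:
                cy, cx = stack.pop()
                region.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if (0 <= ny < n_rows and 0 <= nx < n_cols
                            and (ny, nx) not in seen
                            and colours[ny][nx] == colours[y][x]):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            if len(region) >= min_size:
                regions.append((colours[y][x], sorted(region)))
    return regions
```

Isolated single features that deviate from the reference are ignored here, since they are more likely to reflect differential expression or noise than a physical defect.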

GDE Controller Local Defects

Page 19: A Routine Approach to Quality Control

differential regulation

actual defects

GDE Controller Local Defects

Page 20: A Routine Approach to Quality Control

This method can identify defects which would be hard to find ...

GDE Controller Local Defects

Page 21: A Routine Approach to Quality Control

... or invisible, even in a zoomed view:

GDE Controller Local Defects

Page 22: A Routine Approach to Quality Control

For old (row-wise spotted) chips, there is the danger that differentially expressed genes are detected as chip artefacts.

Application of pattern search algorithms can solve this problem

differential regulation

GDE Controller Local Defects

Page 23: A Routine Approach to Quality Control

Further example of a local defect:

GDE Controller Local Defects

Page 24: A Routine Approach to Quality Control

Defects can have a certain spatial extent:

GDE Controller Local Defects

Page 25: A Routine Approach to Quality Control

GDE Controller Local Defects

Most frequent structures:

Page 26: A Routine Approach to Quality Control

GDE Controller Local Defects

... and others:

Page 27: A Routine Approach to Quality Control

An interactive chip viewer allows the user to

- view identified mask areas
- zoom and find out which genes are affected by masking
- manually edit the masked areas

GDE Controller Local Defects

Page 28: A Routine Approach to Quality Control

GDE Controller Workflow

reporting

export to database, into analysis software or as .CEL files

choose between different condensing algorithms: MAS4, MAS5, GeneData ( = trimmed mean of log(PM) )
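The GeneData condensing rule named above, a trimmed mean of log(PM), might be sketched like this. The trim fraction and the function names are hypothetical; the talk does not say how much is trimmed or which logarithm base is used (the natural logarithm is assumed here).

```python
import math

def trimmed_mean(values, trim_fraction=0.2):
    """Mean after discarding the lowest and highest trim_fraction of the values.

    trim_fraction must be below 0.5 so that at least one value remains.
    """
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

def condense(pm_intensities, trim_fraction=0.2):
    """Expression value of one probe set: trimmed mean of ln(PM)."""
    return trimmed_mean([math.log(pm) for pm in pm_intensities], trim_fraction)
```

Because only PM features enter, this estimate cannot become negative in the way AvgDiff can, and the trimming keeps single outlier probes from dominating the value.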

Page 29: A Routine Approach to Quality Control

[Panels: replicates; large differential expression]

log-log plot:

correlation of large values is visible

only positive values can be displayed

Playing with negative AvgDiff values

Page 30: A Routine Approach to Quality Control

[Panels: replicates; large differential expression]

linear-linear plot:

Playing with negative AvgDiff values

negative values can be displayed

poor resolution for small values

large values appear scattered

Page 31: A Routine Approach to Quality Control

replicates:

‘cube-root’ plot:

damping at large values

‘zero density regions’ (artefact)

display of positive and negative values

$y = \sqrt[3]{\mathrm{AvgDiff}}$
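For readers who want to reproduce the cube-root plot, a small helper is sketched below (the function name is hypothetical). It is written with an explicit sign because in Python a negative number raised to the power 1/3 yields a complex result rather than the real cube root.

```python
import math

def signed_cube_root(x):
    """Real cube root for positive and negative x.

    (-8.0) ** (1 / 3) would return a complex number in Python, so the
    root is taken of |x| and the sign is restored afterwards.
    """
    return math.copysign(abs(x) ** (1.0 / 3.0), x)
```

The slope of the cube root diverges at zero, so values close to zero are pulled far apart on this axis; this may be related to the ‘zero density regions’ noted on the slide, although the talk does not state the cause.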

Playing with negative AvgDiff values

Page 32: A Routine Approach to Quality Control

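Playing with negative AvgDiff values

‘lin-log’ transformation:

y = sign(x)*ln( 1 + |x| )

damping of high values

interpolates smoothly between linear (for small values) and logarithmic (for large values) behaviour

$$
\operatorname{sign}(x)\,\ln\bigl(1+|x|\bigr) =
\begin{cases}
x \mp \dfrac{x^{2}}{2} + O(x^{3}), & |x| < 1 \\[2ex]
\pm\ln|x| + \dfrac{1}{x} + O\!\left(\dfrac{1}{x^{2}}\right), & |x| \gg 1
\end{cases}
$$

(upper signs for x > 0, lower signs for x < 0)

[Plot: the transformation together with its limiting curves y = x and y = ln(x)]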
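The ‘lin-log’ transformation is easy to try out. The sketch below implements y = sign(x)*ln(1 + |x|) directly; the function name `lin_log` is a label chosen here, and `math.log1p` is used only because it stays accurate for very small |x|.

```python
import math

def lin_log(x):
    """y = sign(x) * ln(1 + |x|): linear near zero, logarithmic far from it."""
    return math.copysign(math.log1p(abs(x)), x)

# Behaviour in the two regimes:
small = lin_log(1e-6)    # close to x itself
large = lin_log(1e6)     # close to ln(x)
negative = lin_log(-3.0) # mirror image of lin_log(3.0)
```

The function is odd, passes through the origin with slope 1, and therefore displays positive and negative values on one continuous axis.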

Page 33: A Routine Approach to Quality Control

[Panels: replicates; large differential expression]

‘lin-log’ plot:

Playing with negative AvgDiff values

A good choice is x = AvgDiff / Target , i.e. the target intensity sets the scale

Lines of constant factors are shown in blue (2), red (5) and green (10)

Page 34: A Routine Approach to Quality Control

Playing with negative AvgDiff values

The ‘lin-log’ plots allow us to look at positive and negative AvgDiff values simultaneously. But why would we want to look at the negatives at all?

Consider the following ‘experiment’: construct faked .CEL files in which all PM-MM pairs are interchanged, and condense them with the old Affymetrix algorithm (ignoring AbsCall).

Amusing observation: if one ignores that the scale factor becomes negative (MAS doesn’t: “Failed to analyze due to invalid Scale Factor”), the old (MAS4) algorithm would be invariant under PM ↔ MM!

$SF = \dfrac{\mathrm{Target}}{\mathrm{TrimmedMean}(\mathrm{AvgDiff})}$
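The invariance under the PM ↔ MM exchange can be checked numerically with a toy model of the scaling step. This is a sketch under assumptions, not the MAS4 implementation: AvgDiff is taken as the plain mean of PM−MM per probe set, the trimmed mean cuts the same fraction (2% here, an assumed value) from both ends, and the function names are hypothetical.

```python
def trimmed_mean(values, trim_fraction=0.02):
    """Symmetric trimmed mean: drop the same fraction at both ends."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

def scaled_avg_diffs(pm_sets, mm_sets, target=100.0):
    """Return (SF, scaled AvgDiff per probe set) with SF = Target / TrimmedMean(AvgDiff)."""
    avg_diffs = [sum(p - m for p, m in zip(pm, mm)) / len(pm)
                 for pm, mm in zip(pm_sets, mm_sets)]
    sf = target / trimmed_mean(avg_diffs)
    return sf, [sf * a for a in avg_diffs]

# Toy data: three probe sets with three probe pairs each (invented numbers).
pm = [[120.0, 150.0, 90.0], [300.0, 280.0, 310.0], [60.0, 40.0, 55.0]]
mm = [[100.0, 160.0, 70.0], [100.0, 120.0, 90.0], [80.0, 70.0, 65.0]]

sf, scaled = scaled_avg_diffs(pm, mm)
sf_swapped, scaled_swapped = scaled_avg_diffs(mm, pm)  # PM and MM interchanged
```

Interchanging PM and MM flips the sign of every AvgDiff and, because the trimming is symmetric, also the sign of the scale factor; the two sign changes cancel in the scaled values.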

Page 35: A Routine Approach to Quality Control

perfect group separation:

within replicate groups

across replicate groups

Playing with negative AvgDiff values

Original data: the ‘three-tissue-dataset’: 3 groups with 6 replicates each

Page 36: A Routine Approach to Quality Control

PM ↔ MM data:

These are log-log plots of negative AvgDiffs.

The good correlation at high values indicates that these numbers are reproducible.

The difference between replicate groups is not so obvious, but ...

Playing with negative AvgDiff values

Page 37: A Routine Approach to Quality Control

... clustering again results in a complete group separation:

Take-home message: The mismatches carry information which can be measured reproducibly and can be used (at least) for pattern comparisons.

Playing with negative AvgDiff values