Introduction to the design of cDNA microarray experiments
Statistics 246, Spring 2002
Week 9, Lecture 1
Yee Hwa Yang
Some aspects of design
Layout of the array
– Which cDNA sequence to print?
• Library
• Controls
– Spatial position
Allocation of samples to the slides
– Different design layouts
• A vs B: treatment vs control
• Multiple treatments
• Time series
• Factorial
– Replication
• Number of hybridizations
• Use of dye swap in replication
• Different types of replicates (e.g. pooled vs unpooled material (samples))
– Other considerations
• Physical limitations: the number of slides and the amount of material
• Extensibility - linking
Issues that affect design of array experiments
Scientific
• Aim of the experiment
Specific questions and priorities between them.
How will the experiments answer the questions posed?
Practical (Logistic)
• Types of mRNA samples: reference, control, treatment 1, etc.
• Amount of material. Count the amount of mRNA involved in one channel of a hybridization as one unit.
• Number of slides available for experiment.
Other Information
• The experimental process prior to hybridization: sample isolation, mRNA extraction, amplification, labelling.
• Controls planned: positive, negative, ratio, etc.
• Verification method: Northern, RT-PCR, in situ hybridization, etc.
Natural design choice
Case 1: Meaningful biological control (C)
Samples: Liver tissue from four mice treated by cholesterol modifying drugs.
Question 1: Genes that respond differently between the T and the C.
Question 2: Genes that responded similarly across two or more treatments relative to control.
Case 2: Use of universal reference.
Samples: Different tumor samples.
Question: To discover tumor subtypes.
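[Figure: Case 1, treatments T1 to T4 each hybridized against the control C. Case 2, tumor samples T1, T2, ..., Tn-1, Tn each hybridized against the universal reference Ref.]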
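Direct vs Indirect
Two samples
e.g. KO vs. WT or mutant vs. WT
[Figure: direct design, T and C hybridized together on each of two slides; indirect design, T vs Ref on one slide and C vs Ref on the other.]
Direct: average (log (T/C)), variance $\sigma^2/2$
Indirect: log (T / Ref) - log (C / Ref), variance $2\sigma^2$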
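The two variances for the direct and indirect estimates can be checked with a short calculation. This is a minimal sketch, assuming every log-ratio has the same variance (set to 1 here, so results are in units of sigma^2), that slides are independent, and that both designs use two slides; the function names are illustrative only.

```python
from fractions import Fraction

SIGMA2 = Fraction(1)  # assumed variance of a single log-ratio


def var_direct(n_slides):
    """Variance of the average of n independent log(T/C) measurements."""
    return SIGMA2 / n_slides


def var_indirect():
    """Variance of log(T/Ref) - log(C/Ref): two independent log-ratios."""
    return SIGMA2 + SIGMA2


direct = var_direct(2)     # two slides, T vs C on each
indirect = var_indirect()  # two slides, T vs Ref and C vs Ref
print(direct, indirect, indirect / direct)
```

With the same two slides, the indirect comparison is four times as variable as the direct one under these assumptions.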
One-way layout: one factor, k levels
[Figure: design I, samples A, B and C each hybridized against a common reference O; design II, the same common reference layout with twice as many slides; design III, direct comparisons among A, B and C.]
All pair-wise comparisons are of equal importance.

                     I) Common reference   II) Common reference   III) Direct comparison
Number of slides     N = 3                 N = 6                  N = 3
Ave. variance        2                     -                      0.67
Units of material    A = B = C = 1         A = B = C = 2          A = B = C = 2
Ave. variance        -                     1                      0.67
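The average variances for the three one-way designs (2, 1 and 0.67) can be reproduced by hand. The sketch below assumes unit variance per log-ratio and independent slides, and takes design II to be design I with every slide duplicated. For the direct design it combines the direct A-B slide with the indirect path through C by inverse-variance weighting, which for this triangle gives the same answer as least squares. The helper name `combine` is hypothetical.

```python
from fractions import Fraction


def combine(variances):
    """Variance of the inverse-variance weighted mean of independent estimates."""
    return 1 / sum(Fraction(1) / v for v in variances)


one_slide = Fraction(1)  # variance of one log-ratio, in units of sigma^2

# Design I: A vs B estimated as log(A/O) - log(B/O), one slide per sample.
design_1 = one_slide + one_slide
# Design II: as design I, with each sample-vs-reference slide duplicated.
design_2 = one_slide / 2 + one_slide / 2
# Design III: direct A-B slide combined with the two-slide path through C.
design_3 = combine([one_slide, one_slide + one_slide])

print(design_1, design_2, design_3)
```

By symmetry every pairwise comparison has the same variance within each design, so these values are also the averages.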
Dye-swap
[Figure: Design B1 and Design B2, each drawn as direct hybridizations among A, B and C.]
- Design B1 and B2 have the same average variance
- The direction of arrows potentially affects the bias of the estimate but not the variance
- For k = 3, efficiency ratio (Design A1 / Design B) = 3
- In general, efficiency ratio = (2k) / (k-1)
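The general efficiency ratio can be recovered under one reading of the comparison: both designs use the same total number of slides N, the reference design puts each of the k samples on N/k slides against the reference, and the direct design hybridizes each of the k(k-1)/2 pairs equally often. That reading is an assumption, chosen because it reproduces the value 3 for k = 3. It also uses the fact that one replicate of all pairs gives a pairwise variance of 2σ²/k.

```latex
\operatorname{Var}_{\text{ref}}
  = \frac{2\sigma^2}{N/k} = \frac{2k\,\sigma^2}{N},
\qquad
\operatorname{Var}_{\text{direct}}
  = \frac{2\sigma^2}{k}\cdot\frac{k(k-1)}{2N} = \frac{(k-1)\,\sigma^2}{N},
\qquad
\frac{\operatorname{Var}_{\text{ref}}}{\operatorname{Var}_{\text{direct}}}
  = \frac{2k}{k-1}.
```

For k = 3 the ratio is 6/2 = 3, matching the stated value.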
Multiple direct comparisons between different samples (no common reference)
Different ways of estimating the same contrast:
e.g. A compared to P
Direct = log(A/P)
Indirect = log(A/M) + log(M/P) or
log(A/D) + log(D/P) or
log(A/L) – log(P/L)
How do we combine these?
[Figure: graph of direct hybridizations among the samples L, P, V, D, M and A.]
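Linear model analysis
Define a matrix X so that E(Y) = Xβ, where a = log(A), p = log(P), d = log(D), v = log(V), m = log(M), l = log(L).
$$E\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = X \begin{pmatrix} a - l \\ p - l \\ d - l \\ v - l \\ m - l \end{pmatrix}$$
Each row of X corresponds to one hybridization.
$$\hat{\beta} = (X^{\top}X)^{-1}X^{\top}M$$
Here M denotes the vector of observed log-ratios $y_1, \dots, y_n$.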
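Least squares answers the question of how to combine direct and indirect estimates of the same contrast. The following is a minimal sketch on a hypothetical three-slide subset (A vs P, A vs L, P vs L) with made-up log-ratios. The design, the data and the helper names are all assumptions for illustration, and exact fractions are used so that no numerical library is needed.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination using exact fractions."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]


def least_squares(X, y):
    """Return beta-hat from the normal equations X'X beta = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


# Parameters are (a - l, p - l); one row of X per slide.
X = [[1, -1],  # slide 1: log(A/P)
     [1, 0],   # slide 2: log(A/L)
     [0, 1]]   # slide 3: log(P/L)
M_obs = [Fraction("1.2"), Fraction("1.6"), Fraction("0.7")]  # made-up data

a_l, p_l = least_squares(X, M_obs)
a_vs_p = a_l - p_l  # combined estimate of log(A/P)
print(a_l, p_l, a_vs_p)
```

The combined estimate lies between the direct value (1.2) and the indirect value (1.6 - 0.7 = 0.9), and sits closer to the direct one because the direct estimate has the smaller variance.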
Time Series
Possible designs:
1) All samples vs common pooled reference
2) All samples vs time 0
3) Direct hybridization between times.
[Figure: pooled reference design, samples T1 to T7 each hybridized against Ref; alternatively each sample is compared to T1.]

Design choices in time series
                                        t vs t+1            t vs t+2      t vs t+3
                                        T1T2  T2T3  T3T4    T1T3  T2T4    T1T4    Ave
N=3  A) T1 as common reference          1     2     2       1     2       1       1.5
     B) Direct hybridization            1     1     1       2     2       3       1.67
N=4  C) Common reference                2     2     2       2     2       2       2
     D) T1 as common ref + more         .67   .67   1.67    .67   1.67    1       1.06
     E) Direct hybridization choice 1   .75   .75   .75     1     1       .75     .83
     F) Direct hybridization choice 2   1     .75   1       .75   .75     .75     .83
[Figure: diagrams of designs A to F, each drawn as arrows among T1, T2, T3 and T4; design C also includes Ref.]
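The entries of the time series table can be recomputed from the set of hybridizations in each design. In this sketch the variance of a contrast c is c'(X'X)^{-1}c in units of sigma^2. The slide lists for designs A, B and C follow from their names; those for D, E and F are assumptions, picked because they reproduce the tabulated values (D adds a T2-T3 slide to design A, E is the loop T1-T2-T3-T4-T1, F is the loop T1-T3-T2-T4-T1). Helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination using exact fractions."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]


def pair_variances(slides, samples, targets):
    """Variance of each estimated pairwise log-ratio among `targets`."""
    base, free = samples[0], samples[1:]  # samples[0] serves as the baseline
    col = {s: i for i, s in enumerate(free)}

    def row(u, v):
        # Coefficients of E[log(u/v)] on the baseline-relative parameters.
        r = [0] * len(free)
        if u != base:
            r[col[u]] += 1
        if v != base:
            r[col[v]] -= 1
        return r

    X = [row(u, v) for u, v in slides]
    p = len(free)
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    out = {}
    for u, v in combinations(targets, 2):
        c = row(u, v)
        z = solve(XtX, c)  # z = (X'X)^{-1} c
        out[u + v] = sum(ci * zi for ci, zi in zip(c, z))
    return out


T = ["T1", "T2", "T3", "T4"]
designs = {
    "A": ([("T2", "T1"), ("T3", "T1"), ("T4", "T1")], T),
    "B": ([("T2", "T1"), ("T3", "T2"), ("T4", "T3")], T),
    "C": ([(t, "Ref") for t in T], ["Ref"] + T),
    "D": ([("T2", "T1"), ("T3", "T1"), ("T4", "T1"), ("T3", "T2")], T),
    "E": ([("T2", "T1"), ("T3", "T2"), ("T4", "T3"), ("T1", "T4")], T),
    "F": ([("T3", "T1"), ("T2", "T3"), ("T4", "T2"), ("T1", "T4")], T),
}
results = {name: pair_variances(s, smp, T) for name, (s, smp) in designs.items()}
for name, var in results.items():
    ave = sum(var.values()) / len(var)
    print(name, {k: float(round(v, 2)) for k, v in var.items()}, float(round(ave, 2)))
```

Swapping the two samples on a slide changes the sign of a row of X but leaves X'X, and hence every variance, unchanged.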
2 by 2 factorial – two factors, each with two levels
Example 1: Suppose we wish to study the joint effect of two drugs, A and B.
4 possible treatment combinations:
– C: No treatment
– A: drug A only.
– B: drug B only.
– A.B: both drug A and B.
Example 2: Our interest is in comparing two strains of mice (mutant and wild-type) at two different times, postnatal and adult.
4 possible samples:
– C: WT at postnatal
– A: WT at adult (effect of time only)
– B: MT at postnatal (effect of the mutation only)
– A.B: MT at adult (effect of both time and the mutation).
Factorial design
[Figure: the four samples C, A, B and AB, with expected values $\mu$, $\mu + a$, $\mu + b$ and $\mu + a + b + ab$, joined by six possible hybridizations numbered 1 to 6.]
Different ways of estimating parameters.
e.g. B effect.
Hybridization 1 = $(\mu + b) - (\mu) = b$
Hybridization 2 - hybridization 5 = $((\mu + a) - (\mu)) - ((\mu + a) - (\mu + b)) = (a) - (a - b) = b$
2 x 2 factorial
[Figure: designs I) to IV), each drawn as arrows among the samples C, A, B and A.B.]

                   I)         II)    III)   IV)
                   Indirect   A balance of direct and indirect
# Slides           N = 6
Main effect A      0.5        0.67   0.5    NA
Main effect B      0.5        0.43   0.5    0.3
Interaction A.B    1.5        0.67   1      0.67
Table entry: variance
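Linear model analysis
Define a matrix X so that E(Y) = Xβ.
Use least squares estimate for a, b, ab.
$$E\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & -1 & 0 \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} a \\ b \\ ab \end{pmatrix}$$
$$\hat{\beta} = (X^{\top}X)^{-1}X^{\top}M$$
Here M denotes the vector of observed log-ratios $y_1, \dots, y_6$.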
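The least squares step for the 2 by 2 factorial can be sketched as follows. The variances of the estimates of a, b and ab are the diagonal of (X'X)^{-1}, in units of sigma^2. The sketch assumes the six hybridizations are B vs C, A vs C, AB vs A, AB vs C, A vs B and AB vs B, each done once; that numbering and the helper name are assumptions for illustration.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination using exact fractions."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]


# Columns are the coefficients of (a, b, ab) in each expected log-ratio.
X = [
    [0, 1, 0],   # y1: B vs C   = b
    [1, 0, 0],   # y2: A vs C   = a
    [0, 1, 1],   # y3: AB vs A  = b + ab
    [1, 1, 1],   # y4: AB vs C  = a + b + ab
    [1, -1, 0],  # y5: A vs B   = a - b
    [1, 0, 1],   # y6: AB vs B  = a + ab
]
p = 3
XtX = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]

# The i-th diagonal entry of (X'X)^{-1} is the i-th component of the
# solution of X'X z = e_i, where e_i is the i-th unit vector.
variances = [solve(XtX, [int(i == j) for j in range(p)])[i] for i in range(p)]
print([str(v) for v in variances])
```

Under these assumptions the variances come out as 1/2, 1/2 and 1, which agrees with the 0.5, 0.5 and 1 entries of one of the six-slide designs in the 2 x 2 factorial variance table.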
Common reference approach
[Figure: A, B and A.B each hybridized against C, giving y1, y2 and y3.]
y1 = log (A / C) = a
y2 = log (B / C) = b
y3 = log (AB / C) = a + b + ab
Estimate (ab) with y3 - y2 - y1
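The cost of estimating the interaction through a common reference follows from the contrast coefficients. A minimal sketch, assuming independent slides with unit variance and equal replication of each hybridization; `contrast_variance` is a hypothetical helper.

```python
from fractions import Fraction


def contrast_variance(coeffs, replicates=1):
    """Variance of sum(c_i * mean(y_i)) for independent slide means."""
    return sum(Fraction(c * c, replicates) for c in coeffs)


interaction = (-1, -1, 1)  # ab-hat = y3 - y2 - y1, coefficients on (y1, y2, y3)
main_a = (1, 0, 0)         # a-hat = y1

var_ab_3 = contrast_variance(interaction)                # 3 slides
var_ab_6 = contrast_variance(interaction, replicates=2)  # 6 slides
var_a_6 = contrast_variance(main_a, replicates=2)        # 6 slides
print(var_ab_3, var_ab_6, var_a_6)
```

With each hybridization duplicated (six slides), this gives 0.5 for a main effect and 1.5 for the interaction, the values listed for the indirect design.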
More general n by m factorial experiment
2 factors, one with n levels and the other with m levels
OE experiment (2 by 2):
interested in difference between zones, age and also zone.age interaction.
Further experiment (2 by 3):
only interested in genes where difference between treatment and controls changes with time.
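[Figure: treatment and control samples at times 0, 12 and 24.]
Expected values for strains WT and MT at times P1, P11 and P21:
WT.P1: $\mu$
MT.P1: $\mu + b$
WT.P11: $\mu + a_1$
MT.P11: $\mu + a_1 + b + a_1.b$
WT.P21: $\mu + a_1 + a_2$
MT.P21: $\mu + (a_1 + a_2) + b + (a_1 + a_2)b$
[Figure: seven hybridizations among these samples, numbered 1 to 7.]
$$E\begin{pmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ m_5 \\ m_6 \\ m_7 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ b \\ a_1 b \\ a_2 b \end{pmatrix}$$
Signs follow the convention that each log-ratio is the later time over the earlier time, or MT over WT.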
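A design matrix for the 2 by 3 layout can be built mechanically from the expected value of each sample. In this sketch the term (a1 + a2)b is expanded as a1.b + a2.b. The seven hybridization pairs are a guess at the slide's figure: each one compares adjacent time points within a strain, or MT with WT at one time. The names `EFFECTS`, `SLIDES` and `design_row` are hypothetical.

```python
PARAMS = ["a1", "a2", "b", "a1.b", "a2.b"]

# Expected log-intensity of each sample, omitting the common mean mu.
EFFECTS = {
    "WT.P1": {},
    "MT.P1": {"b": 1},
    "WT.P11": {"a1": 1},
    "MT.P11": {"a1": 1, "b": 1, "a1.b": 1},
    "WT.P21": {"a1": 1, "a2": 1},
    "MT.P21": {"a1": 1, "a2": 1, "b": 1, "a1.b": 1, "a2.b": 1},
}


def design_row(sample, versus):
    """Coefficients of E[log(sample / versus)] on PARAMS (mu cancels)."""
    s, v = EFFECTS[sample], EFFECTS[versus]
    return [s.get(p, 0) - v.get(p, 0) for p in PARAMS]


# Assumed hybridizations m1..m7, written as (numerator, denominator).
SLIDES = [
    ("MT.P1", "WT.P1"),
    ("WT.P11", "WT.P1"),
    ("MT.P11", "MT.P1"),
    ("MT.P11", "WT.P11"),
    ("WT.P21", "WT.P11"),
    ("MT.P21", "MT.P11"),
    ("MT.P21", "WT.P21"),
]

X = [design_row(s, v) for s, v in SLIDES]
for (s, v), r in zip(SLIDES, X):
    print(f"{s} vs {v}: {r}")
```

Building X this way keeps each row consistent with the stated expected values, which is easier to audit than typing the matrix by hand.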
Replication
Why replicate slides:
– Provides a better estimate of the log-ratios
– Essential to estimate the variance of log-ratios
Different types of replicates:
– Technical replicates
• Within slide vs between slides
– Biological replicates
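Both reasons for replicating show up in a small calculation. The sketch uses made-up log-ratios for a single gene on four replicate slides; the numbers are purely illustrative.

```python
from statistics import mean, variance

# Made-up log-ratios for one gene on four replicate slides.
log_ratios = [0.9, 1.2, 1.0, 1.3]

estimate = mean(log_ratios)                     # averaged log-ratio
s2 = variance(log_ratios)                       # sample variance, needs >= 2 slides
std_error = (s2 / len(log_ratios)) ** 0.5       # standard error of the average
print(round(estimate, 3), round(s2, 4), round(std_error, 4))
```

With a single slide, `variance` raises `StatisticsError`, which mirrors the point that replication is essential for estimating the variance of log-ratios.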
Technical replication - labelling
• 3 sets of self-self hybridization: (cerebellum vs cerebellum)
• Data 1 and Data 2 were labeled together and hybridized on two slides separately.
• Data 3 were labeled separately.
[Figure: scatter plots of Data 1 against Data 2 and of Data 1 against Data 3.]
Technical replication - amplification
Olfactory bulb experiment:
• 3 sets of Anterior vs Dorsal performed on different days
• #10 and #12 were from the same RNA isolation and amplification
• #12 and #18 were from different dissections and amplifications
• All 3 data sets were labeled separately before hybridization
[Figure: original samples T1 and T2, their amplified samples, and two replicate designs (Replicate Design 1 and Replicate Design 2), each with hybridizations numbered 1 to 4.]