Complex Experimental Design and Simple Data Analysis: A Pharmaceutical Example
Joseph G Pigeon
Villanova University
Introduction
• Designs with restricted randomization have multiple error measures
• Pharmaceutical example where the split plot structure is even more complex
  – Whole plot structure in two dimensions
  – Correlation structure in two dimensions
• Caveats
  – Limited understanding of the biology involved
  – No originality of statistical methods claimed
Split Plot Designs
• Originated in agricultural experiments where
  – Levels of some factors are applied to whole plots
  – Levels of other factors are applied to sub plots
• Separate randomizations to whole plots and sub plots
  – Two types of experimental units
  – Two types of error measures
  – Correlation among the observations
Split Plot Designs
• Also common in industrial experiments when
  – Complete randomization does not occur
  – Some factor levels may be impractical, inconvenient or too costly to change
• This restriction on randomization results in some whole plot factors and some sub plot factors
• Data analysis needs to account for this restricted randomization or split plot structure
Split Plot Example
• Consider a paper manufacturer who wants to study
  – Effects of 3 pulp preparation methods
  – Effects of 4 temperatures
  – Response is tensile strength
• Pilot plant is capable of 12 runs per day
• One replicate on each of three days
Split Plot Example
                           Temperature
Rep  Pulp Prep Method    200  225  250  275
1    1                    30   35   37   36
1    2                    34   41   38   42
1    3                    29   26   33   36
2    1                    28   32   40   41
2    2                    31   36   42   40
2    3                    31   30   32   40
3    1                    31   37   41   40
3    2                    35   40   39   44
3    3                    32   34   39   45
Split Plot Example
• Initially, we might consider this to be a 4 x 3 factorial in a randomized block design
• If true, then the order of experimentation within a block should have been completely randomized
• However, this was not feasible; data were not collected this way
Split Plot Example
• Experiment was conducted as follows:
  – A batch of pulp was produced by one of the three methods
  – The batch was divided into four samples
  – Each sample was cooked at one of the four temperatures
• Split plot design with
  – Pulp preparation method as whole plot treatment
  – Temperature as sub (split) plot treatment
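The two-stage randomization described above can be sketched in a few lines. This is only an illustration: the function name `split_plot_run_order` and the seed are hypothetical, and the slides do not say how the actual run order was generated.

```python
import random

def split_plot_run_order(reps, whole_plot_levels, sub_plot_levels, seed=None):
    """Return runs as (rep, whole-plot level, sub-plot level) tuples."""
    rng = random.Random(seed)
    runs = []
    for rep in reps:
        # Stage 1: randomize the order of the whole-plot treatments (pulp batches).
        batches = list(whole_plot_levels)
        rng.shuffle(batches)
        for method in batches:
            # Stage 2: randomize the sub-plot treatments within each batch.
            temps = list(sub_plot_levels)
            rng.shuffle(temps)
            runs.extend((rep, method, temp) for temp in temps)
    return runs

runs = split_plot_run_order(
    reps=[1, 2, 3],
    whole_plot_levels=[1, 2, 3],
    sub_plot_levels=[200, 225, 250, 275],
    seed=1,  # arbitrary seed, only to make the sketch repeatable
)
```

Because the four temperatures of a batch always run together, temperature is never randomized across batches; that restriction is what creates the separate whole plot and sub plot errors.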
Split Plot Example
Source of Variation      Sum of Squares  Degrees of Freedom  Mean Square      F
Reps (A)                          77.55                   2        38.78
Prep Method (B)                  128.39                   2        64.20   7.08
AB (whole plot error)             36.28                   4         9.07
Temp (C)                         434.08                   3       144.69  41.94
AC                                20.67                   6         3.45
BC                                75.17                   6        12.53   2.96
ABC (subplot error)               50.83                  12         4.24
Total                            822.97                  35
• Subplot error is less than whole plot error (typical)
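The sums of squares in the table can be reproduced from the tensile strength data with a short NumPy sketch. The array layout (rep, method, temperature) and the helper `ss` are choices made here for illustration. The F ratios use the denominators implied by the table: AB for prep method, AC for temperature, ABC for the BC interaction.

```python
import numpy as np

# Tensile strength, indexed as y[rep, method, temperature].
y = np.array([
    [[30, 35, 37, 36], [34, 41, 38, 42], [29, 26, 33, 36]],
    [[28, 32, 40, 41], [31, 36, 42, 40], [31, 30, 32, 40]],
    [[31, 37, 41, 40], [35, 40, 39, 44], [32, 34, 39, 45]],
], dtype=float)
a, b, c = y.shape
grand = y.mean()

def ss(means, n_per_mean):
    """Sum of squares of a table of means around the grand mean."""
    return n_per_mean * ((means - grand) ** 2).sum()

ss_a = ss(y.mean(axis=(1, 2)), b * c)           # reps
ss_b = ss(y.mean(axis=(0, 2)), a * c)           # prep method
ss_c = ss(y.mean(axis=(0, 1)), a * b)           # temperature
ss_ab = ss(y.mean(axis=2), c) - ss_a - ss_b     # whole plot error
ss_ac = ss(y.mean(axis=1), b) - ss_a - ss_c
ss_bc = ss(y.mean(axis=0), a) - ss_b - ss_c
ss_total = ((y - grand) ** 2).sum()
ss_abc = ss_total - (ss_a + ss_b + ss_c + ss_ab + ss_ac + ss_bc)  # subplot error

f_method = (ss_b / (b - 1)) / (ss_ab / ((a - 1) * (b - 1)))
f_temp = (ss_c / (c - 1)) / (ss_ac / ((a - 1) * (c - 1)))
f_bc = (ss_bc / ((b - 1) * (c - 1))) / (ss_abc / ((a - 1) * (b - 1) * (c - 1)))
```

With unrounded mean squares the temperature F ratio comes out at about 42.0; the 41.94 in the table follows from dividing the rounded values 144.69 and 3.45.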
Split Plot Example Lessons
• We must carefully consider how the data were collected and incorporate all randomization restrictions into the analysis
  – Whole plot effects measured against whole plot error
  – Sub plot effects measured against sub plot error
Description of Example – MQPA Assay
• Multivalent Q-PCR based Potency Assay
• Used to assign potencies (independently) to each of five reassortants of a pentavalent vaccine
• Relies on the quantitation of viral nucleic acid generated in 24 hours
• Two major components
  – Biological component (infection of the standard and sample viruses)
  – Biochemical component (quantitative PCR reaction, where PCR = Polymerase Chain Reaction)
Polymerase Chain Reaction (PCR)
Description of Example – Biological Component
• Vero cell maintenance and set up
• Serial dilutions of known standard and unknown sample are incubated with trypsin
• Infection is carried out in 4 replicate wells of Vero cell monolayers seeded in a 96 well plate
• Infection proceeds for 24 hours and is then halted by the addition of a detergent and storage at –70 °C
Description of Example – Biochemical Component
• Lysate is thawed and diluted
• Preparation of a “master mix”
• Preparation of Q-PCR plate (master mix + diluted lysates)
• Configuration of the Q-PCR detection system
• Potency is determined by parallel line analysis of standard and test samples
• Specific interest is in optimization of the PCR portion of the assay
PCR Optimization Design
• Discussions with biologists identified 13 factors
  – 8 factors associated with preparation of master mix
  – 5 factors associated with configuration of PCR detection system (instrument)
• Discussions with biologists identified 3 responses
  – Lowest cycle time (range: 1 – 40)
  – Least variability between replicates
  – Valid amplification plot (range: 0 – 4)
• Completion of experiments and analysis immediately!
PCR Optimization Design
Factor            Current          Range               Imp
FOR Primer        400 nM           200 – 900 nM        3
REV Primer        400 nM           200 – 900 nM        3
Probe             200 nM           100 – 400 nM        3
DNTP’s            0.30, 0.60 mM    1/2x – 2x           4
MgCl2             5.5 mM           3 – 9 mM            1
Tween             0.01%            0.005 – 0.020 %     2
Taq Gold          0.02 U/ul        0.01 – 0.04 U/ul    3
MULV Rt           0.25 U/ul        0.125 – 0.5 U/ul    5
Annealing Time    1 min            45 – 60 sec         2
Annealing Temp    60 C             55 – 65 C           1
Rt Temp           45 C             40 – 50 C           1
Denaturing Temp   95 C             90 – 97 C           2
Denaturing Time   15 sec           10 – 20 sec         2
PCR Optimization Design Considerations
• Interactions not expected to exist
• Experiments performed in a 96 well plate
• Each plate can accommodate at most 15 master mix combinations
  – 12 run Plackett-Burman (PB) design for 8 factors
PCR Optimization Design Considerations
• Time constraints imply at most 16 plates (instrument settings)
  – 2^(5-1) fractional factorial for 5 factors (5 = 1234)
• Concern about using only 12 of 2^8 combinations
  – Half of the plates use a 12 run PB design (123 = 45 = +1)
  – Half of the plates use the foldover PB design (123 = 45 = –1)
Plackett-Burman Design
Factors: 8    Replicates: 1
Design: 12    Runs: 12    Center pts (total): 0

Data Matrix (randomized)

Run  A  B  C  D  E  F  G  H
1    -  +  +  +  -  +  +  -
2    +  +  -  +  -  -  -  +
3    +  -  +  -  -  -  +  +
4    -  +  +  -  +  -  -  -
5    +  +  -  +  +  -  +  -
6    +  -  +  +  -  +  -  -
7    -  +  -  -  -  +  +  +
8    +  -  -  -  +  +  +  -
9    -  -  +  +  +  -  +  +
10   -  -  -  -  -  -  -  -
11   +  +  +  -  +  +  -  +
12   -  -  -  +  +  +  -  +
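A quick way to see why this 12 run design supports simple main-effect analyses is to check that its columns are balanced and mutually orthogonal, and to form the foldover by reversing every sign. The sketch below uses the data matrix above; the variable names are chosen here for illustration.

```python
import numpy as np

PB_ROWS = """
- + + + - + + -
+ + - + - - - +
+ - + - - - + +
- + + - + - - -
+ + - + + - + -
+ - + + - + - -
- + - - - + + +
+ - - - + + + -
- - + + + - + +
- - - - - - - -
+ + + - + + - +
- - - + + + - +
"""

# Rows are runs, columns are factors A..H, coded as +1 / -1.
X = np.array([[1 if s == "+" else -1 for s in line.split()]
              for line in PB_ROWS.strip().splitlines()])

column_sums = X.sum(axis=0)               # all zero when each column is balanced
gram = X.T @ X                            # 12 * identity when columns are orthogonal
is_orthogonal = np.array_equal(gram, 12 * np.eye(8, dtype=int))

foldover = -X                             # foldover design: every sign reversed
combined = np.vstack([X, foldover])
n_distinct = len({tuple(row) for row in combined})
```

Together the design and its foldover give 24 distinct master mix combinations, which is consistent with the 24 master mixes in the layout.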
Half Fraction Design
Factors: 5    Base Design: 5, 16    Resolution: V
Runs: 16      Replicates: 1         Fraction: 1/2
Blocks: none  Center pts (total): 0

Design Generators: E = ABCD

Row  StdOrder  RunOrder   A   B   C   D   E
1    1         7         -1  -1  -1  -1   1
2    2         8          1  -1  -1  -1  -1
3    3         3         -1   1  -1  -1  -1
4    4         15         1   1  -1  -1   1
5    5         13        -1  -1   1  -1  -1
6    6         9          1  -1   1  -1   1
7    7         10        -1   1   1  -1   1
8    8         6          1   1   1  -1  -1
9    9         16        -1  -1  -1   1  -1
10   10        2          1  -1  -1   1   1
11   11        4         -1   1  -1   1   1
12   12        12         1   1  -1   1  -1
13   13        5         -1  -1   1   1   1
14   14        11         1  -1   1   1  -1
15   15        14        -1   1   1   1  -1
16   16        1          1   1   1   1   1
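The standard-order rows of this half fraction follow directly from the generator E = ABCD. A minimal sketch, with a hypothetical function name:

```python
from itertools import product

def half_fraction_2_5_1():
    """2^(5-1) design in standard order with generator E = ABCD."""
    design = []
    # product() varies its last element fastest, so unpack in reverse
    # to make A the fastest-changing factor (standard order).
    for d, c, b, a in product([-1, 1], repeat=4):
        design.append((a, b, c, d, a * b * c * d))
    return design

design = half_fraction_2_5_1()
```

Since E = ABCD implies ABC = DE, the condition "123 = 45" used to split the plates between the PB design and its foldover is an alias relation of this fraction.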
PCR Optimization Design Layout
• Each plate represents a 12 run PB design
• 16 × 12 = 192 observations
PCR Optimization Design Layout
[Layout grid: rows are plates 1, 2, ..., 15, 16; columns are master mixes 1, 2, ..., 11, 12, 13, 14, ..., 23, 24; an X marks each master mix run on a plate]
• Whole plot structure in two dimensions
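The crossed structure can be sketched by pairing each of the 16 instrument settings with a 12 run PB design or its foldover. Two things here are assumptions, not facts from the slides: the rule that plates with instr4 × instr5 = +1 receive the PB design, and the row order of the output. The actual plate and master mix numbering used in the study is not recoverable from the transcript.

```python
from itertools import product

PB_ROWS = [
    "-+++-++-", "++-+---+", "+-+---++", "-++-+---",
    "++-++-+-", "+-++-+--", "-+---+++", "+---+++-",
    "--+++-++", "--------", "+++-++-+", "---+++-+",
]
pb = [[1 if ch == "+" else -1 for ch in row] for row in PB_ROWS]
foldover = [[-v for v in row] for row in pb]

# 16 instrument settings: 2^(5-1) fraction with instr5 = instr1*instr2*instr3*instr4.
plates = [(a, b, c, d, a * b * c * d) for d, c, b, a in product([-1, 1], repeat=4)]

layout = []
for plate_no, instr in enumerate(plates, start=1):
    # Assumed rule: instr4 * instr5 = +1 -> PB design, otherwise foldover.
    mixes = pb if instr[3] * instr[4] == 1 else foldover
    for mm in mixes:
        # One row per well group: plate, 5 instrument factors, 8 master mix factors.
        layout.append((plate_no, *instr, *mm))

n_pb_plates = sum(1 for instr in plates if instr[3] * instr[4] == 1)
```

Instrument factors are constant within a plate and master mix factors are shared across plates, which is the two-dimensional whole plot structure noted above.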
PCR Optimization Results
• Biologists provided this summary of the 21 runs with an amplification plot rating of 4

Factor            Number of –1’s   Number of +1’s   Sig
FOR Primer        11               10
REV Primer        15               6
Probe             7                14
DNTP’s            16               5
MgCl2             7                14
Tween             9                12
Taq Gold          0                21               *
MULV Rt           14               7
Annealing Time    13               8
Annealing Temp    11               10
Rt Temp           8                13
Denaturing Temp   19               2                *
Denaturing Time   13               8
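One rough way to screen such counts is an exact two-sided binomial test of each split against 50/50. The slides do not say how the "Sig" column was decided, so the test and the Bonferroni threshold for 13 factors below are assumptions of this sketch. The 21 selected runs are also not an independent random sample, so the p-values are only a guide.

```python
from math import comb

# (number of -1's, number of +1's) among the 21 runs rated 4, from the table above.
counts = {
    "FOR Primer": (11, 10), "REV Primer": (15, 6), "Probe": (7, 14),
    "DNTP's": (16, 5), "MgCl2": (7, 14), "Tween": (9, 12),
    "Taq Gold": (0, 21), "MULV Rt": (14, 7), "Annealing Time": (13, 8),
    "Annealing Temp": (11, 10), "Rt Temp": (8, 13),
    "Denaturing Temp": (19, 2), "Denaturing Time": (13, 8),
}

def two_sided_binomial_p(k, n):
    """Exact two-sided p-value for k successes in n trials with p = 0.5."""
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

alpha = 0.05 / len(counts)  # Bonferroni adjustment (an assumption of this sketch)
p_values = {name: two_sided_binomial_p(minus, minus + plus)
            for name, (minus, plus) in counts.items()}
flagged = [name for name, p in p_values.items() if p < alpha]
```

Under these assumptions the flagged factors are Taq Gold and Denaturing Temp, the same two marked with an asterisk in the table.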
PCR Optimization Results
Counts by plate (N = 21)
plate   3  4  5  7  10  11  12  14  15  16
Count   3  4  1  2   1   2   3   1   3   1

Counts by master mix (N = 21)
mm      5  6  8  9  14  19  22
Count   2  3  5  3   2   5   1

Counts by factor level (N = 21 for each factor)
Factor   –1   +1
mm1      11   10
mm2      16    5
mm3       6   15
mm4      16    5
mm5       7   14
mm6       9   12
mm7       0   21
mm8      14    7
instr1   12    9
instr2   10   11
instr3    8   13
instr4   19    2
instr5   13    8
PCR Optimization Analysis Log
• mm7 = 1; instr4 = –1
PCR Optimization Results
Counts by plate (N = 63)
plate   1  2  4  5  6  7  8  9  10  11  12  13  14  15  16
Count   4  4  3  5  6  5  8  4   6   3   1   3   5   3   3

Counts by master mix (N = 63)
mm      1  2  3  4  7  11  13  15  16  17  18  20  21  22  23
Count   6  6  3  4  7   5   5   2   6   5   3   2   4   3   2

Counts by factor level (N = 63 for each factor)
Factor   –1   +1
mm1      31   32
mm2      26   37
mm3      47   16
mm4      28   35
mm5      30   33
mm6      31   32
mm7      41   22
mm8      21   42
instr1   34   29
instr2   31   32
instr3   42   21
instr4   26   37
instr5   29   34
PCR Optimization Analysis Log
• mm7 = 1; instr4 = –1
• mm3 = 1; mm7 = 1; mm8 = –1; instr3 = 1
PCR Optimization Results
Fractional Factorial Fit: ctgm

Estimated Effects and Coefficients for ctgm (coded units)

Term            Effect    Coef  SE Coef      T      P
Constant                33.919   0.3852  88.06  0.000
instr1          -1.264  -0.632   0.3852  -1.64  0.103
instr2           0.596   0.298   0.3852   0.77  0.440
instr3          -2.157  -1.078   0.3852  -2.80  0.006
instr4           1.152   0.576   0.3852   1.50  0.137
instr5           0.667   0.333   0.3852   0.87  0.388
instr1*instr2    0.892   0.446   0.3852   1.16  0.249
instr1*instr3    0.424   0.212   0.3852   0.55  0.582
instr1*instr4   -0.221  -0.110   0.3852  -0.29  0.775
instr1*instr5   -0.276  -0.138   0.3852  -0.36  0.721
instr2*instr3   -1.110  -0.555   0.3852  -1.44  0.151
instr2*instr4    0.240   0.120   0.3852   0.31  0.756
instr2*instr5    1.522   0.761   0.3852   1.98  0.050
instr3*instr4    0.484   0.242   0.3852   0.63  0.531
instr3*instr5    0.182   0.091   0.3852   0.24  0.814
instr4*instr5    0.027   0.014   0.3852   0.04  0.972
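For a two-level design in coded units, each effect is twice its coefficient and each T value is the coefficient divided by its standard error. The sketch below checks both relations against a few rows copied from the output above; small differences come from rounding in the printed values.

```python
# (term, effect, coef, se_coef, t) copied from the fitted-model output.
rows = [
    ("instr1", -1.264, -0.632, 0.3852, -1.64),
    ("instr2", 0.596, 0.298, 0.3852, 0.77),
    ("instr3", -2.157, -1.078, 0.3852, -2.80),
    ("instr4", 1.152, 0.576, 0.3852, 1.50),
    ("instr5", 0.667, 0.333, 0.3852, 0.87),
    ("instr2*instr5", 1.522, 0.761, 0.3852, 1.98),
]

checks = []
for term, effect, coef, se, t in rows:
    effect_from_coef = 2 * coef   # effect = change in mean response from -1 to +1
    t_from_coef = coef / se       # t statistic for the coefficient
    checks.append((term, effect_from_coef - effect, t_from_coef - t))

max_effect_gap = max(abs(gap) for _, gap, _ in checks)
max_t_gap = max(abs(gap) for _, _, gap in checks)
```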
PCR Optimization Results
[Figure: Normal Probability Plot of the Standardized Effects (response is ctgm, Alpha = .05). Axes: Standardized Effect (horizontal), Normal Score (vertical). Labeled effects: C, BE. Legend: A: instr1, B: instr2, C: instr3, D: instr4, E: instr5]
PCR Optimization Results
[Figure: Pareto Chart of the Standardized Effects (response is ctgm, Alpha = .05). Bars from largest to smallest: C, BE, A, D, BC, AB, E, B, CD, AC, AE, BD, AD, CE, DE. Legend: A: instr1, B: instr2, C: instr3, D: instr4, E: instr5]
PCR Optimization Results
[Figure: Main Effects Plot (data means) for ctgm. Panels: instr1, instr2, instr3, instr4, instr5. Vertical axis: ctgm, 33.0 to 35.0]
PCR Optimization Results
[Figure: Interaction Plot (data means) for ctgm, instr2 by instr5, levels –1 and 1. Vertical axis: Mean, 33.2 to 35.2]
PCR Optimization Analysis Log
• mm7 = 1; instr4 = -1
• mm3 = 1; mm7 = 1; mm8 = -1; instr3 = 1
• instr3 = 1; instr2 and instr5 should have opposite signs?
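The question about instr2 and instr5 can be explored with the fitted coefficients reported in the instrument-factor output (constant 33.919, instr2 0.298, instr5 0.333, instr2*instr5 0.761). The sketch predicts the mean ctgm at the four sign combinations with the other factors held at their average; lower cycle time is the goal. Treat it as a reading aid only: the interaction has P = 0.050, which is borderline and matches the question mark in the log.

```python
# Coefficients copied from the fitted model (coded units).
constant, b_instr2, b_instr5, b_inter = 33.919, 0.298, 0.333, 0.761

predicted = {
    (x2, x5): constant + b_instr2 * x2 + b_instr5 * x5 + b_inter * x2 * x5
    for x2 in (-1, 1)
    for x5 in (-1, 1)
}

# Setting with the lowest predicted cycle time.
best = min(predicted, key=predicted.get)
opposite = [v for (x2, x5), v in predicted.items() if x2 * x5 == -1]
same = [v for (x2, x5), v in predicted.items() if x2 * x5 == 1]
```

Both opposite-sign settings predict a lower ctgm than either same-sign setting, which is the pattern the log entry refers to.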
PCR Optimization Results
[Figure: Normal Probability Plot of the Standardized Effects (response is ctgm, Alpha = .05). Axes: Standardized Effect (horizontal), Normal Score (vertical). Labeled effects: G, C, AF, BD, H. Legend: A: mm1, B: mm2, C: mm3, D: mm4, E: mm5, F: mm6, G: mm7, H: mm8]
PCR Optimization Results
[Figure: Pareto Chart of the Standardized Effects (response is ctgm, Alpha = .05). Bar labels as extracted: G, C, H, AF, BD, BG, BE, AC, B, AE, AD, AG, BC, E, A, D, AB, BF, AH, F. Legend: A: mm1, B: mm2, C: mm3, D: mm4, E: mm5, F: mm6, G: mm7, H: mm8]
PCR Optimization Results
Estimated Effects and Coefficients for ctgm (coded units)

Term       Effect    Coef  SE Coef       T      P
Constant           33.947   0.3206  105.90  0.000
mm1        -0.304  -0.152   0.3206   -0.47  0.636
mm2         0.699   0.350   0.3206    1.09  0.277
mm3        -4.070  -2.035   0.3206   -6.35  0.000
mm4         0.222   0.111   0.3206    0.35  0.730
mm5        -0.341  -0.171   0.3206   -0.53  0.595
mm6        -0.027  -0.013   0.3206   -0.04  0.967
mm7        -4.525  -2.263   0.3207   -7.06  0.000
mm8         2.061   1.030   0.3206    3.21  0.002
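A Pareto chart of standardized effects is just the terms sorted by absolute T value. The sketch below does that for the master mix main effects, using the T and P values from the output above, and lists the terms with P below 0.05; the 0.05 cutoff follows the "Alpha = .05" shown on the plots.

```python
# (term, T, P) copied from the master mix main-effects output.
terms = [
    ("mm1", -0.47, 0.636), ("mm2", 1.09, 0.277), ("mm3", -6.35, 0.000),
    ("mm4", 0.35, 0.730), ("mm5", -0.53, 0.595), ("mm6", -0.04, 0.967),
    ("mm7", -7.06, 0.000), ("mm8", 3.21, 0.002),
]

# Largest absolute standardized effect first, as on a Pareto chart.
pareto_order = [name for name, t, _ in sorted(terms, key=lambda r: abs(r[1]), reverse=True)]
significant = [name for name in pareto_order
               if next(p for n, _, p in terms if n == name) < 0.05]

# Sign of T gives the direction: negative means ctgm drops when the factor goes to +1.
direction = {name: (1 if t < 0 else -1) for name, t, _ in terms if name in significant}
```

Since a lower cycle time is better, the preferred level is +1 for factors with a negative effect and –1 for those with a positive effect, which gives mm3 = 1, mm7 = 1 and mm8 = –1.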
PCR Optimization Results
[Figure: Normal Probability Plot of the Standardized Effects (response is ctgm, Alpha = .05). Axes: Standardized Effect (horizontal), Normal Score (vertical). Labeled effects: mm7, mm3, mm8]
PCR Optimization Results
[Figure: Pareto Chart of the Standardized Effects (response is ctgm, Alpha = .05). Bars from largest to smallest: mm7, mm3, mm8, mm2, mm5, mm1, mm4, mm6]
PCR Optimization Results
[Figure: Main Effects Plot (data means) for ctgm. Panels: mm1, mm2, mm3, mm4, mm5, mm6, mm7, mm8, each at levels –1 and 1. Vertical axis: ctgm, 32 to 36]
PCR Optimization Analysis Log
• mm7 = 1; instr4 = –1
• mm3 = 1; mm7 = 1; mm8 = –1; instr3 = 1
• instr3 = 1; instr2 and instr5 should have opposite signs?
• mm3 = 1; mm7 = 1; mm8 = –1
PCR Optimization Results
Row  plate  mm    ct1    ct2    ct3    ct4   ctgm  well1  well2
1    3      14  26.88  27.33  27.25  27.13  27.15  37.98     40
2    3      19  27.62  28.10  28.02  27.40  27.78  40.00     40
3    4      5   29.20  29.04  29.39  28.70  29.08  40.00     40
4    11     14  27.53  26.97  28.04  27.90  27.61  40.00     40
5    11     19  28.25  28.57  28.64  28.09  28.39  40.00     40
6    12     5   28.13  28.93  28.39  28.51  28.49  40.00     40

Row  amprating  mm1  mm2  mm3  mm4  mm5  mm6  mm7  mm8  instr1  instr2
1    4            1    1    1   -1   -1   -1    1   -1       1       1
2    4           -1   -1    1   -1    1    1    1   -1       1       1
3    4           -1    1    1    1   -1    1    1   -1      -1      -1
4    4            1    1    1   -1   -1   -1    1   -1      -1       1
5    4           -1   -1    1   -1    1    1    1   -1      -1       1
6    3           -1    1    1    1   -1    1    1   -1       1      -1

Row  instr3  instr4  instr5
1         1      -1      -1
2         1      -1      -1
3         1      -1      -1
4         1      -1       1
5         1      -1       1
6         1      -1       1
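The ctgm column appears to be the geometric mean of the four replicate cycle times ct1 to ct4. That reading is an assumption based on the name and the numbers, not something stated on the slides. The sketch recomputes it for the six listed runs and also confirms that all six share the recommended settings mm3 = 1, mm7 = 1, mm8 = –1, instr3 = 1, instr4 = –1.

```python
from math import exp, log

# (plate, mm, [ct1..ct4], reported ctgm) copied from the listing above.
runs = [
    (3, 14, [26.88, 27.33, 27.25, 27.13], 27.15),
    (3, 19, [27.62, 28.10, 28.02, 27.40], 27.78),
    (4, 5, [29.20, 29.04, 29.39, 28.70], 29.08),
    (11, 14, [27.53, 26.97, 28.04, 27.90], 27.61),
    (11, 19, [28.25, 28.57, 28.64, 28.09], 28.39),
    (12, 5, [28.13, 28.93, 28.39, 28.51], 28.49),
]

def geometric_mean(values):
    """Geometric mean via the mean of the logs."""
    return exp(sum(log(v) for v in values) / len(values))

gaps = [abs(geometric_mean(cts) - reported) for _, _, cts, reported in runs]

# Settings shared by all six rows: (mm3, mm7, mm8, instr3, instr4).
settings = [(1, 1, -1, 1, -1)] * 6
all_at_recommended = all(s == (1, 1, -1, 1, -1) for s in settings)
```

The recomputed values agree with the reported ctgm to within rounding.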
PCR Optimization Results
[Figure: Main Effects Plot (data means) for ctgm. Panels: mm3, mm7, mm8, instr3, instr4. Vertical axis: ctgm, 32 to 36]
PCR Optimization Results
[Figure: Interaction Plot (data means) for ctgm. Matrix of panels for mm3, mm7, mm8, instr3, instr4, each at levels –1 and 1. Vertical axis: 28 to 38]
PCR Optimization Summary
• No complex models – all simple analyses
• 5 factors were found to be significant (mm3, mm7, mm8, instr3 and instr4)
• These factors were further studied using response surface experiments
• Scientists seem quite happy with the results of the PCR optimization experiments
Concluding Remarks
• Many industrial experiments do have a split or strip plot structure which means multiple and possibly complex error measures
• Arises from the conduct of an experiment and/or any restrictions on the randomization
• We need to incorporate these considerations into a proper analysis and interpretation of experimental data
Concluding Remarks
• Experimental designs with balance, symmetry and orthogonality permit simple but effective graphical analyses (even with some missing data)
• Much can be learned from simple analyses following suitable experimental design
  – All models are wrong, but some models are useful
  – All models are wrong, but some models are more wrong than others