Parallelism in practice: USP Bioassay Workshop, August 2010
Slide 1. Parallelism in practice
USP Bioassay Workshop, August 2010
Ann Yellowlees, Kelly Fleetwood
Quantics Consulting Limited
Slide 2. Contents
- What is parallelism?
- Approaches to assessing parallelism
  - Significance
  - Equivalence
- Experience
- Discussion
Slide 3. Setting the scene: Relative Potency
RP: the ratio of the concentrations of reference and sample materials required to achieve the same effect.
$RP = C_{ref} / C_{samp}$
Slide 4. Parallelism
- One curve is a horizontal shift of the other
- These are "parallel" or "similar" curves
- Finney: a prerequisite of all dilution assays
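The horizontal-shift idea can be illustrated with a short Python sketch. It assumes a logistic dose-response curve; the function name `response` and every numeric value are hypothetical and not taken from the workshop data. If the sample has relative potency RP, the sample at concentration C should give the same response as the reference at concentration RP × C, so the two curves differ only by a shift of log(RP) on the log-concentration axis.

```python
import math

def response(log_conc, lower=0.1, upper=2.0, slope=2.6, log_ec50=0.0):
    """Reference curve: a logistic in log concentration (hypothetical parameters)."""
    return lower + (upper - lower) / (1.0 + math.exp(-slope * (log_conc - log_ec50)))

rp = 0.5  # hypothetical relative potency: the sample is half as potent as the reference

def sample_response(log_conc):
    # Same curve as the reference, shifted horizontally by log(rp).
    return response(log_conc + math.log(rp))

# Concentrations that achieve the same effect satisfy C_samp = C_ref / rp.
c_ref = 1.5
c_samp = c_ref / rp
same_effect = math.isclose(response(math.log(c_ref)), sample_response(math.log(c_samp)))
print(same_effect, c_ref / c_samp)
```

If the curves were not parallel, no single value of `rp` would make the two responses agree at every concentration, which is why parallelism is a prerequisite for reporting one RP.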
Slide 5. Real data: continuous response
Slide 6. Linear model (4 concentrations)
- Parallel when the slopes are equal
- Linear: $Y = a + \beta \log(C)$
Notes: NB the range: which concentrations? Do we care about the asymptotes?
Slide 7. Four parameter logistic model
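- 4PL: $Y = \gamma + (\delta - \gamma) / [1 + \exp(\beta \log(C) - \alpha)]$
- Parallel when the asymptotes $\gamma$, $\delta$ and the slope $\beta$ are equal
Notes: Mention symmetry. A looks the same; B does not look the same.
Slide 8. Five parameter logistic model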
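- Parallel when the asymptotes $\gamma$, $\delta$, the slope $\beta$ and the asymmetry $\epsilon$ are equal
- 5PL: $Y = \gamma + (\delta - \gamma) / [1 + \exp(\beta \log(C) - \alpha)]^{\epsilon}$
Notes: A: same. B: not the same. Slope not the same.
Slide 9. Tests for parallelism
Approach 1: Is there evidence that the reference and test curves ARE NOT parallel?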
- Compare unrestricted vs restricted models: test the loss of fit when the model is restricted to parallel curves (p-value approaches)
- Traditional F test approach, as preferred by the European Pharmacopoeia
- Chi-squared test approach, as recommended by Gottschalk & Dunn (2005)
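For the linear model, the restricted-versus-unrestricted comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are invented, the function names `line_stats` and `parallelism_f` are hypothetical, errors are taken to be independent with constant variance, and the 5% critical value of F(1, 12) is hard-coded from standard tables rather than computed.

```python
def line_stats(x, y):
    """Centred sums of squares and products for one preparation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    return sxx, sxy, syy

def parallelism_f(x_ref, y_ref, x_test, y_test):
    """F statistic for equal slopes in the linear model Y = a + b * log(C)."""
    sxx1, sxy1, syy1 = line_stats(x_ref, y_ref)
    sxx2, sxy2, syy2 = line_stats(x_test, y_test)
    # Unrestricted model: separate intercept and slope for each preparation.
    rss_unrestricted = (syy1 - sxy1 ** 2 / sxx1) + (syy2 - sxy2 ** 2 / sxx2)
    # Restricted (parallel) model: separate intercepts, one common slope.
    rss_restricted = syy1 + syy2 - (sxy1 + sxy2) ** 2 / (sxx1 + sxx2)
    df_resid = len(x_ref) + len(x_test) - 4  # four parameters in the unrestricted model
    # One numerator degree of freedom: the single slope constraint.
    f_stat = (rss_restricted - rss_unrestricted) / (rss_unrestricted / df_resid)
    return f_stat, df_resid

# Hypothetical data: 4 log-concentration steps, 2 wells each.
x = [0, 0, 1, 1, 2, 2, 3, 3]
y_ref = [1.02, 0.98, 2.01, 1.97, 3.03, 2.99, 3.98, 4.02]
y_parallel = [0.52, 0.47, 1.49, 1.53, 2.48, 2.51, 3.52, 3.49]
y_steeper = [0.52, 0.47, 1.79, 1.83, 3.08, 3.11, 4.42, 4.39]

F_CRIT_1_12 = 4.75  # 5% critical value of F(1, 12), from standard tables

f_par, df = parallelism_f(x, y_ref, x, y_parallel)
f_non, _ = parallelism_f(x, y_ref, x, y_steeper)
print(df, f_par < F_CRIT_1_12, f_non > F_CRIT_1_12)
```

Because the denominator is the residual mean square, the same slope difference gives a larger F in a more precise assay, which is the behaviour questioned later in the talk.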
Slide 10. Approach 2
Is there evidence that the reference and test curves ARE parallel?
- Equivalence test approach, as recommended in the draft USP guidance (Hauck et al. 2005)
- Fit a model allowing non-parallel curves
- Confidence intervals on the differences between parameters
Pharmacopoeial disharmony exists!! (existed?)
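A confidence interval on a parameter difference might be computed as in the sketch below. It rests on assumptions not stated on the slide: a normal approximation is used in place of a t distribution, the two estimates are treated as independent (no covariance term), and the function name `difference_ci` and all numbers are hypothetical.

```python
from statistics import NormalDist

def difference_ci(est_ref, se_ref, est_test, se_test, level=0.90):
    """Normal-approximation CI for (test - reference), assuming independent estimates."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    diff = est_test - est_ref
    se_diff = (se_ref ** 2 + se_test ** 2) ** 0.5
    return diff - z * se_diff, diff + z * se_diff

# Hypothetical slope estimates and standard errors for one assay.
ci_low, ci_high = difference_ci(est_ref=2.60, se_ref=0.05, est_test=2.68, se_test=0.06)
print(round(ci_low, 3), round(ci_high, 3))
```

Under the equivalence approach the curves are accepted as parallel only if this whole interval lies inside pre-set limits, so a wide interval from a noisy assay counts against the assay rather than for it.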
Slide 11. In practice...
Four example data sets:
- Data set 1: 60 RP assays (96-well plates, OD: continuous)
- Data set 2: 15 RP assays (96-well plates, OD: continuous)
- Data set 3: 12 RP assays (96-well plates, OD: continuous)
- Data set 4: 60 RP assays (in vivo, survival at day x: binary*)
* Treated as binary for this purpose; this is wasteful of data.
Slide 12. In practice...
We have applied the proposed methods in the context of individual assay pass/fail (suitability):
- Data set 1: compare the 2 significance approaches; compare equivalence with significance
- Data sets 2, 3: compare the 2 significance approaches
- Data set 4: compare the F test (EP) with equivalence (USP)
Slide 13. Data set 1
- 60 RP assays
- 8 dilutions
- 2 independent wells per dilution
- 4PL a good fit (vs 5PL)
- NB precision
Notes: Model: $\log_e$ OD vs $\log_e$ conc. Average slope = 1/0.384 = 2.6.
Slide 14. Data set 1: F test and chi-squared test
- F test: straightforward
- Chi-squared test: need to establish the mean-variance relationship
Notes: This is a data-driven method!!! Very arbitrary. Establishing equivalence limits: the Hauck paper says provisional capability-based limits can be set using reference vs reference assays. Not available in our dataset...
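One way the mean-variance relationship for the chi-squared weighting might be established is sketched below. The power-law form var = k × mean^p is an assumption made for illustration (constant, linear and quadratic forms are also mentioned in this talk), the replicate OD values are invented, and the names `fit_power_law` and `weight_for` are hypothetical.

```python
import math
from statistics import mean, variance

def fit_power_law(groups):
    """Fit var = k * mean**p by least squares on the log-log scale."""
    pts = [(math.log(mean(g)), math.log(variance(g))) for g in groups]
    n = len(pts)
    mx = sum(px for px, _ in pts) / n
    my = sum(py for _, py in pts) / n
    sxx = sum((px - mx) ** 2 for px, _ in pts)
    sxy = sum((px - mx) * (py - my) for px, py in pts)
    p = sxy / sxx
    k = math.exp(my - p * mx)
    return k, p

# Hypothetical replicate OD readings, one group per dilution.
groups = [
    [0.10, 0.11, 0.09, 0.10],
    [0.40, 0.43, 0.38, 0.41],
    [1.00, 1.08, 0.95, 1.02],
    [2.00, 2.15, 1.88, 2.05],
]
k, p = fit_power_law(groups)

def weight_for(mean_response):
    """Regression weight: the reciprocal of the predicted variance."""
    return 1.0 / (k * mean_response ** p)

print(round(p, 2), weight_for(0.1) > weight_for(2.0))
```

The weighted residual sums of squares, and hence the chi-squared p-values, depend directly on the fitted `k` and `p`, which is the sense in which the method is data driven.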
Slide 15. Data set 1: F test and chi-squared test
- F test: 12/60 = 20% of assays have p < 0.05. Evidence of dissimilarity? OR precise assay?
- Chi-squared test: 58/60 = 97% of assays have p < 0.05! Intra-assay variability is low, so differences between the parallel and non-parallel models are exaggerated.
Notes: Histograms of F-test p-values and G&D p-values, followed by an example graph to illustrate why G&D behaves so poorly: intra-assay variability is low compared to the quality of the fit, so differences between curves are exaggerated. Poor choice of statistic.
Slide 16. Data set 1: Comparison of approaches to parallelism
Slide 17. Data set 1: Comparison of approaches to parallelism
Notes: Some evidence of a hook in the model; residual SS inflated. Note the hook.
Slide 18. Data set 1: Comparison of approaches to parallelism
Notes: Excluding the top 2 points because of the hook: approx. 20/60 pass. Remodelled: quadratic relationship refitted.
Slide 19. Data set 1: F test and chi-squared test
- $RSS_{parallel} = 159$
- $RSS_{non-parallel} = 112$
- $RSS_p - RSS_{np} = 47$
- Chi-squared test: $\Pr(\chi^2_3 > 47) < 0.01$
- F test: P = 0.03
Notes: Example where both fail.
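The chi-squared p-values quoted for the two worked examples (RSS differences of 47 and 1.2) can be checked with the sketch below. It assumes 3 degrees of freedom, one for each parameter constrained in the parallel 4PL model (two asymptotes and the slope), and it treats the difference in weighted RSS as the chi-squared statistic. The function name `chi2_sf_3df` is hypothetical; the formula is the closed form of the upper tail for 3 degrees of freedom.

```python
import math

def chi2_sf_3df(x):
    """Pr(chi-squared with 3 df > x), using the closed form available for 3 df."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

examples = {
    "both fail": (159.0, 112.0),  # RSS parallel, RSS non-parallel
    "both pass": (100.2, 99.0),
}
p_values = {}
for label, (rss_parallel, rss_non_parallel) in examples.items():
    diff = rss_parallel - rss_non_parallel
    p_values[label] = chi2_sf_3df(diff)
    print(label, round(diff, 1), p_values[label])
```

With 3 degrees of freedom the second example gives a p-value of about 0.75, matching the figure quoted on the slides.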
Slide 20. Data set 1: F test and chi-squared test
- $RSS_{parallel} = 100.2$
- $RSS_{non-parallel} = 99.0$
- $RSS_p - RSS_{np} = 1.2$
- $\Pr(\chi^2_3 > 1.2) = 0.75$
Notes: Example where both pass.
Slide 21. Data set 1: USP methodology
Prove parallel
Lower asymptote:
Slide 22. Data set 1: USP methodology
Upper asymptote:
Notes: This is interesting: it demonstrates that it's not enough just to order the data and take the 2nd from the end as your limit. Need to examine it. Check for bias!
Slide 23. Data set 1: USP methodology
Scale:
- Scale for reference: 0.384 (range 0.344 to 0.416)
- NB scale = 1/slope
Slide 24. Data set 1: USP methodology
Criteria for the 90% CI on the difference between parameter values:
- Lower asymptotes: (-0.235, 0.235)
- Upper asymptotes: (-0.213, 0.213)
- Scales: (-0.187, 0.187)
Applying the criteria:
- 3/60 = 5% of assays fail the parallelism criteria
- No assay fails more than one criterion
Notes: Scale parameter from the R parameterisation: allows log RP to be estimated as $a_1 - a_2$ (easy variance).
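Applying the three criteria to a single assay can be sketched as follows. The limits are the ones listed above; the two sets of confidence intervals are invented for illustration, and the names `LIMITS` and `failed_criteria` are hypothetical.

```python
# Equivalence limits for the 90% CI on each parameter difference (from the criteria above).
LIMITS = {
    "lower_asymptote": 0.235,
    "upper_asymptote": 0.213,
    "scale": 0.187,
}

def failed_criteria(cis):
    """Names of parameters whose 90% CI is not entirely inside (-limit, +limit)."""
    failures = []
    for name, (low, high) in cis.items():
        limit = LIMITS[name]
        if not (-limit < low and high < limit):
            failures.append(name)
    return failures

# Hypothetical 90% CIs on (test - reference) for two assays.
assay_a = {"lower_asymptote": (-0.05, 0.08), "upper_asymptote": (-0.10, 0.04), "scale": (-0.02, 0.06)}
assay_b = {"lower_asymptote": (0.02, 0.26), "upper_asymptote": (-0.09, 0.05), "scale": (-0.03, 0.05)}

print(failed_criteria(assay_a))
print(failed_criteria(assay_b))
```

An assay passes only if the list is empty; returning the failing parameter names keeps the individual parameters open to examination, which a single overall p-value does not.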
Slide 25. Data set 1: Comparison of approaches to parallelism
Slide 26. Data set 1: Comparison of approaches to parallelism
- This plate fails all 3 tests
- USP: lower asymptote
Notes: Fails all, whether or not the hook is included.
Slide 27. Data set 1: Comparison of approaches to parallelism
- Equivalence test: scales not equivalent
- F test p-value = 0.60
- Chi-squared test p-value < 0.001
Notes: F test passes: high variability.
Slide 28. Data set 2: Comparison of approaches to parallelism
Notes: Constant variance.
Slide 29. Data set 3: Comparison of approaches to parallelism
Notes: Linear fit for mean-variance. Again the G&D test suggests more assays fail.
Slide 30. In practice...
- Data set 4: compare the F test with equivalence
- Methodology for the chi-squared test not developed for binary data
Slide 31. Data set 4
- 60 RP assays
- 4 dilutions
- 15 animals per dilution
Notes: The actual model is a GLM (i.e. the response is 0/1, depending on survival); % survival is shown for illustrative purposes only. Slopes: average = -2.41, range (-14.71, -1.03).
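A binary-response GLM of this kind can be sketched as a logistic regression of survival on log dose, fitted by Newton-Raphson. Everything specific here is an assumption: the logit link, the survivor counts, the log-dose values and the function name `fit_logistic` are all hypothetical, and only one preparation is fitted.

```python
import math

def fit_logistic(log_dose, survivors, n_animals, iterations=25):
    """Fit logit(p) = a + b * log_dose to grouped binomial data by Newton-Raphson."""
    a = b = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, s in zip(log_dose, survivors):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = n_animals * p * (1.0 - p)   # binomial variance at this dose
            r = s - n_animals * p           # observed minus expected survivors
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 information matrix against the score vector.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Hypothetical assay: 4 dilutions, 15 animals per dilution.
log_dose = [0.0, 1.0, 2.0, 3.0]
survivors = [14, 10, 5, 1]
a, b = fit_logistic(log_dose, survivors, n_animals=15)
print(round(a, 3), round(b, 3))
```

A parallelism assessment would fit reference and test together, once with separate slopes and once with a common slope, and compare the two fits.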
Slide 32. Data set 4: Comparison of approaches to parallelism
- F test: 5/60 = 8% fail
- Equivalence: 5% fail (3 assays)
- Equivalence: could choose the limit to match
Slide 33. Data set 4: Comparison of approaches to parallelism
- The F-test approach and the equivalence approach could be in agreement, depending on how the limits are set.
Slide 34. Broadly...
F test
- Fails (?wrongly?) when the assay is very precise
- Passes (?wrongly?) when the assay is noisy
- Linear case: p-value can be adjusted to match equivalence
Chi-squared
- Fails when the assay is very precise (even if the difference is small)
- If the model fits badly, weighting inflates the RSS (e.g. hook)
- 2 further data sets supported this
USP
- Limits are set such that the extreme 5% will fail
- They do! Regardless of precision, model fit, etc.
Slide 35. Stepping back
What are we trying to do?
- Produce a biologic to a controlled standard that can be used in clinical practice
- For a batch we need to know its potency
  - With appropriate precision
  - In order to calculate the clinical dose
Notes: Perhaps add more information about precision to this.
Slide 36. Some thoughts
1. Establish a valid assay
- Use all development assay results unless a physical reason exists to exclude them
- Statistical methodology can be used to flag possible outliers for investigation
- USP applies this to individual data points
- Parallelism / similarity
  - Are the parameter differences fundamentally zero?
  - Or is there a consistent slope difference (e.g.)?
  - Equivalence approach + judgment for acceptable margin
Slide 37. Some thoughts
2. Set the number of replicates to provide the required precision
- Combine RP values plus confidence intervals for the reportable value
3. Per assay, use all results unless there is a physical reason not to (they are part of the continuum of assays)
- Flag for investigation using statistical techniques
  - Reference behaviour
  - Parallelism
4. Monitor performance over time (SPC)
- Reference stability
- Parallelism
Slide 38. Which parallelism test?
Our view:
- The chi-squared test requires too many complex decisions and is very sensitive to the model
- The F test is not generally applicable to the assay validation stage
  - Does not allow examination of the individual parameters
  - Does not lend itself to judgment about "How parallel is parallel?"
- The equivalence test approach fits in all three contexts
  - With adjustment of the tolerance limits as appropriate
Slide 39. Thank you
- USP: the invitation
- Clients: use of data
  - BioOutsource: www.biooutsource.com
  - Other clients who prefer to remain anonymous
- Quantics staff: analysis and graphics
  - Kelly Fleetwood (R), Catriona Keerie (SAS)