
Nguyen et al. Source Code for Biology and Medicine 2014, 9:27
http://www.scfbm.org/content/9/1/27

RESEARCH Open Access

Robust dose-response curve estimation applied to high content screening data analysis

Thuy Tuong Nguyen2, Kyungmin Song5, Yury Tsoy1, Jin Yeop Kim1, Yong-Jun Kwon3, Myungjoo Kang5 and Michael Adsetts Edberg Hansen4

Abstract

Background and method: Successfully automated sigmoidal curve fitting is highly challenging when applied to large data sets. In this paper, we describe a robust algorithm for fitting sigmoid dose-response curves by estimating four parameters (floor, window, shift, and slope), together with the detection of outliers. We propose two improvements over current methods for curve fitting. The first is the detection of outliers, which is performed during the initialization step with corresponding adjustments of the derivative and error estimation functions. The second is the enhancement of the weighting quality of data points using mean calculation in Tukey's biweight function.

Results and conclusion: Automatic curve fitting of 19,236 dose-response experiments shows that our proposed method outperforms the current fitting methods provided by MATLAB's nlinfit function and GraphPad's Prism software.

Keywords: High content screening, Dose response curve, Curve fitting, Sigmoidal function, Weighting function, Outlier detection

Correspondence: thnguyen@ucdavis.edu; michaeladsettsedberghansen@gmail.com
Equal contributors
2University of California, Davis, USA
4Videometer A/S, Horsholm, Denmark
Full list of author information is available at the end of the article

© 2014 Nguyen et al.; licensee BioMed Central. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Introduction
In recent years, the need for automatic data analysis has been amplified by the use of high content screening (HCS) [1] techniques. HCS (also known as phenotypic screening) is a great tool to identify small molecules that alter the disease state of a cell based on measuring cellular phenotype. However, HCS always comes with an important caveat: there is little or no a priori information on the compound's target. At the Institut Pasteur Korea (IP-K), we have applied high content screening using small interfering RNA (siRNA) [2,3] at the genome scale. More recently, we applied the siRNA knockdown strategy for each gene from the genome and studied their effect on a given molecule producing a clear response on a cellular assay, to look for synergistic effects of a drug and target. Performing large-scale curve fitting for the biological activity of a large compound library can lead to a great number of dose-response curves (DRCs) that require proper fitting. The overall process of fitting, rejecting outliers, and data point weighting of thousands to millions of curves can be a significant challenge. The quality of a dose-response curve fitting algorithm is a key element that should help in reducing false positive hits and increasing real compound hits.

Several solutions [4-9] have been proposed to perform curve fitting. One of the widely used nonlinear curve fitting algorithms was introduced by Levenberg and Marquardt (LM) [5,6]. This method belongs to the gradient-descent family. However, due to the sensitivity of the method to the data quality and the initial guess, the curve fitting algorithm is unfortunately susceptible to becoming trapped in local minima. To obtain proper curve fitting results, there is a need for automatic outlier handling, usage of predefined initial curves, and adaptive weighting for data points [10,11]. All of these demands undoubtedly apply to large sets of data. Nonlinear and non-iterative least squares regression analysis was presented in [7] for robust logistic curve fitting with detection of possible outliers. This non-iterative algorithm was implemented on a microcomputer and assessed using different biological and medical data. A review of popular fitting models using linear and nonlinear regression is given in [4]. The study serves as a practical guide to help researchers who are not statisticians understand statistical tools more clearly, especially dose-response curve fitting using nonlinear regression. This book also describes the robust fitting algorithm and the outlier detection mechanism used in the commercial software GraphPad Prism. In [8], an automatic best-of-fit estimation procedure is introduced based on the Akaike information criterion. This best-of-fit model guarantees not only the estimation of the parameters with the smallest sum-of-squares errors but also good prediction of the model.

Simulated data for DRC estimation was used in [9] to examine the performance of the proposed Grid algorithm. The peculiarity of the Grid algorithm is that it visits all the points in a grid of four curve parameters and searches for the point with the optimum sum-of-squares error. A coarse-to-fine grid model and a threshold-based outlier detection mechanism were used to make the algorithm more efficient. This paper also provided Java-based software together with a sample dataset for academic use. However, the software is applicable only when the measurements are taken without replicates (one data point at each concentration). Furthermore, various popular computer software and code packages have been presented for DRC fitting, such as the DRC package in R [12], the nlinfit function in MATLAB [13], and the XLfit add-in for Microsoft Excel [14].

In this paper, robust fitting and automatic outlier detection based on Tukey's biweight function are introduced. This method was developed to automate nonlinear fitting of thousands of DRCs performed in replicates, detect the outliers automatically, and initialize the fitting curves robustly.

Background and method
Background
In drug discovery, analysis of the dose-response curve (DRC) is one of the most important tools for evaluating the effect of a drug on a disease. The DRC can be used to plot the results of many types of assays; its X-axis corresponds to the concentrations of a drug (in log scale), and its Y-axis corresponds to the drug responses. The DRC function can be defined with different numbers of parameters, but the four-parameter model is the most common:

$$ f(x, \beta) = \beta_1 + \frac{\beta_2}{1 + \exp\left(\frac{\beta_3 - x}{\beta_4}\right)} \qquad (1) $$

where x is the dose or concentration of a data point and β represents the four parameters $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$: $\beta_1$ is the floor (the efficacy), which shows the biological activity without a chemical compound; $\beta_2$ is the window (the efficacy), which shows the maximum saturated activity at high concentration; $\beta_3$ is the shift (the potency) of the DRC; and $\beta_4$ is the slope (the kinetics). Figure 1 shows an illustration of a response curve and its four parameters.
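As a minimal sketch (not the authors' MATLAB implementation), Eq. (1) can be written in Python as follows. The function name `drc` and the parameter values are hypothetical and chosen only for illustration; x is taken to be the log concentration.

```python
import math

def drc(x, beta):
    """Four-parameter dose-response curve of Eq. (1).

    beta = (floor, window, shift, slope); x is the log concentration.
    """
    floor, window, shift, slope = beta
    return floor + window / (1.0 + math.exp((shift - x) / slope))

# Hypothetical parameters: floor 10, window 80, shift -6 (log units), slope 0.5.
beta = (10.0, 80.0, -6.0, 0.5)

# At x = shift the exponent is zero, so the response is floor + window / 2.
print(drc(-6.0, beta))
# Far below the shift the curve approaches the floor; far above, floor + window.
print(drc(-12.0, beta), drc(0.0, beta))
```

With a positive slope the curve rises from the floor towards floor + window as the concentration increases, which matches the description of the window as the saturated activity at high concentration.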

Figure 1 A four-parameter dose-response curve. $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ are the floor, the window, the shift, and the slope, respectively.

Basic computation of curve fitting
The goal of a curve fitting algorithm is to find a statistically optimized model that best fits the data set. Because the DRC function is nonlinear, an iterative method is used to optimize the parameters. This section presents the basic computation on which the ideas proposed in our method are built. First, let χ be the function of the fitting parameter β, which will be determined via function minimization:

$$ \chi(\beta) = \sum_{i=1}^{N} \rho\left(\frac{y_i - f(x_i, \beta)}{\sigma_i}\right) \qquad (2) $$

where ρ(z) is an error estimation function of a single variable z, $\sigma_i$ is a normalization value, and N is the number of data points. To minimize (2), the Newton method is used to search for the zero crossing of the gradient:

$$ \beta_{t+1} = \beta_t + D^{-1}\left[-\nabla\chi(\beta_t)\right] \qquad (3) $$

Hence, we need to find the gradient and the Hessian matrix D of χ. The gradient of χ with respect to $\beta = \{\beta_1, \beta_2, \beta_3, \beta_4\}$ is computed as

$$ \frac{\partial\chi}{\partial\beta_k} = -\sum_{i=1}^{N} \frac{1}{\sigma_i}\,\psi(z)\,\frac{\partial f(x_i, \beta)}{\partial\beta_k} \qquad (4) $$

where ψ(z) is the derivative function of ρ(z), with $z = (y_i - f(x_i, \beta))/\sigma_i$. The Hessian matrix is calculated using

$$ \frac{\partial^2\chi}{\partial\beta_k\,\partial\beta_l} = \sum_{i=1}^{N} \left[\frac{1}{\sigma_i^2}\,\psi'(z)\,\frac{\partial f(x_i, \beta)}{\partial\beta_k}\,\frac{\partial f(x_i, \beta)}{\partial\beta_l} - \frac{1}{\sigma_i}\,\psi(z)\,\frac{\partial^2 f(x_i, \beta)}{\partial\beta_k\,\partial\beta_l}\right] \qquad (5) $$

where the second derivative term is negligible compared to the first derivative term. Let

$$ a_{kl} = \frac{\partial^2\chi}{\partial\beta_k\,\partial\beta_l} \quad\text{and}\quad b_k = -\frac{\partial\chi}{\partial\beta_k} \qquad (6) $$

where $a_{kl}$ and $b_k$ are elements of matrices A and B, respectively. Then, instead of directly inverting the Hessian matrix, (3) can be rewritten as a set of linear equations:

$$ \sum_{l=1}^{4} a_{kl}\,\delta\beta_l = b_k \qquad (7) $$

where $\delta\beta_l$ is changed at every iteration. Afterwards, to solve our fitting problem, we define

$$ \rho(z) = \frac{1}{2}z^2 \quad\text{and}\quad \psi(z) = z \qquad (8) $$

and obtain $a_{kl}$ and $b_k$ using least squares. The Levenberg-Marquardt (LM) method [5,6,15] solves (7) by defining a positive value λ to control the diagonal of the matrix A:

$$ a'_{kk} = a_{kk}(1 + \lambda) \quad\text{and}\quad a'_{kl} = a_{kl} \quad (k \neq l) \qquad (9) $$

The iteration steps of the LM method can be summarized as follows:

1. Evaluate χ(β) and define a modest value for λ, i.e., λ = 0.001.

2. Solve (7) with A substituted by A′ in (9) and evaluate χ(β + δβ).

3. If χ(β + δβ) ≥ χ(β), increase λ by a factor of 10; else decrease λ by a factor of 10 and update β ← β + δβ.

4. Repeat steps 2 and 3 until χ(β) converges, and then return β.
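The four steps above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' MATLAB code: it uses the least-squares choice (8) with all $\sigma_i = 1$, drops the second-derivative term of (5), and solves (7) with a small Gaussian elimination routine. All function names, the exponent clamp, the iteration limit, and the tolerance are hypothetical choices.

```python
import math

def sigmoid(x, beta):
    """Return s = 1 / (1 + exp(t)) and t = (shift - x) / slope (t is clamped)."""
    t = min((beta[2] - x) / beta[3], 700.0)  # clamp avoids overflow in exp
    return 1.0 / (1.0 + math.exp(t)), t

def drc(x, beta):
    """Eq. (1): floor + window * s."""
    return beta[0] + beta[1] * sigmoid(x, beta)[0]

def jacobian_row(x, beta):
    """Partial derivatives of Eq. (1) w.r.t. floor, window, shift, slope."""
    s, t = sigmoid(x, beta)
    g = beta[1] * s * (1.0 - s) / beta[3]
    return [1.0, s, -g, g * t]

def chi(xs, ys, beta):
    """Eq. (2) with rho(z) = z^2 / 2 and sigma_i = 1."""
    return sum(0.5 * (y - drc(x, beta)) * (y - drc(x, beta)) for x, y in zip(xs, ys))

def solve(a, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[p][c] == 0.0:
            raise ZeroDivisionError("singular matrix")
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def lm_fit(xs, ys, beta, lam=0.001, max_iter=200, tol=1e-10):
    """Levenberg-Marquardt iteration following steps 1 to 4."""
    beta = list(beta)
    cur = chi(xs, ys, beta)                            # step 1
    for _ in range(max_iter):
        rows = [jacobian_row(x, beta) for x in xs]
        res = [y - drc(x, beta) for x, y in zip(xs, ys)]
        a = [[sum(r[k] * r[l] for r in rows) for l in range(4)] for k in range(4)]
        b = [sum(ri * r[k] for ri, r in zip(res, rows)) for k in range(4)]
        for k in range(4):
            a[k][k] *= 1.0 + lam                       # Eq. (9)
        try:
            trial = [p + d for p, d in zip(beta, solve(a, b))]   # step 2
            new = chi(xs, ys, trial)
        except ZeroDivisionError:
            new = float("inf")
        if not new < cur:                              # step 3: reject the step
            lam *= 10.0
            continue
        lam /= 10.0                                    # step 3: accept the step
        converged = cur - new < tol
        beta, cur = trial, new
        if converged:                                  # step 4
            break
    return beta
```

A usage example on synthetic, noise-free data generated from hypothetical parameters, started from the data-driven initial guess described in the Results section:

```python
true_beta = [10.0, 80.0, -6.0, 0.5]
xs = [-9.0 + 0.5 * i for i in range(11)]
ys = [drc(x, true_beta) for x in xs]
start = [min(ys), max(ys) - min(ys), sum(xs) / len(xs), 0.6]
print(lm_fit(xs, ys, start))
```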

Outlier detection
In most cases, it is difficult to estimate the parameters, either due to noise in the observations or because the experimental design might give rise to ambiguities in the parameters of the DRC. There is a need for an outlier detection mechanism to cope with noise before fitting curves. Figure 2 shows the effect of outliers in the data. There are eight different concentrations, with five replicates at the first concentration and three replicates at the remaining concentrations. The data are likely to become noisier as the number of data points increases. Seven outliers (the solid arrows show these outliers in the figure) can change the fit of the curve dramatically. These seven points in the left figure have lower weights (outliers detected) than those in the right figure.

In our problem, we initialize the fitting parameters $\beta_o$ (disregarding outliers) by finding the best fitted curve using

$$ \rho(z) = |z| \quad\text{and}\quad \psi(z) = \operatorname{sign}(z) \qquad (10) $$

instead of (8), which is used without outlier detection. Herein, we propose (10) to reduce the impact of outliers on the fitting results by considering absolute errors and sign-only derivatives. The key difference between (10) and (8) is that the derivative function ψ(z) = sign(z) results in −1 (negative) or 1 (positive), which steers the gradient in the direction having the higher number of negatives or positives, whereas ψ(z) = z judges the gradient based on the distance between the estimated and actual values. Therefore, (10) is able to disregard the points having a lower number of negatives or positives (see Figure 2 on the left), whereas (8) considers the points at far distances, although these points are given low impact (see Figure 2 on the right). Levenberg-Marquardt and other conventional nonlinear curve fitting algorithms are based on derivative calculation, and the quality of their solutions notably depends on data quality (i.e., outliers) and the initial guess. To have good fitting, outliers and the initial guess have to be manually detected and defined. Accordingly, these conventional algorithms are very difficult to automate and to make yield good solutions for thousands of DRCs. Based on (10), outliers can be effectively weighted, and a robust initial guess is automatically determined at the beginning of the fitting process. The method of weighting data points is described in the next section.
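The contrast between (8) and (10) can be illustrated with a short sketch. The residual values below are made up for illustration, and the function names are hypothetical. The point is only that a single distant point dominates the summed derivative under (8), while under (10) it counts no more than any other point.

```python
def rho_l2(z):
    """Error function of Eq. (8)."""
    return 0.5 * z * z

def psi_l2(z):
    """Derivative of rho_l2: grows with the distance from the curve."""
    return z

def rho_l1(z):
    """Error function of Eq. (10)."""
    return abs(z)

def psi_l1(z):
    """Derivative of rho_l1: only the sign of the residual, -1, 0, or 1."""
    return (z > 0) - (z < 0)

# Four small residuals and one outlier (made-up values).
residuals = [0.5, -0.3, 0.2, -0.4, 25.0]

pull_l2 = sum(psi_l2(z) for z in residuals)  # dominated by the outlier
pull_l1 = sum(psi_l1(z) for z in residuals)  # the outlier counts as one vote
print(pull_l2, pull_l1)
```

Under (8) the outlier contributes 25 to the summed derivative, whereas under (10) the five points contribute +1, −1, +1, −1, and +1, so the net pull is a single unit.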

Figure 2 Influence of noise (the arrows show the outliers). Fitting on the left assigns low weights to the outliers to disregard them. Fitting on the right considers the outliers as useful data points and gives higher weights to these points.

Curve fitting with weighting function
Because all fitting parameters of the DRC are very important in understanding and assessing the effect of a chemical compound, it is essential to have a method that can estimate the curve in a robust way, i.e., coping with outliers and assigning weights to data points. Initially, all data points are supposed to have equal weights. However, this idea does not hold on many practical occasions. Therefore, a least squares method tends to give unequal weighting to data points, e.g., points that are closer to the fitted curve would have higher weighting values. In standard weighting, minimizing the fitting error (sum-of-squares) of the absolute vertical distances is not appropriate: points having high response values tend to have large deviations from the curve, and so they contribute more to the sum-of-squares value. This weighting makes sense when the scattering of data is Gaussian and the standard deviation among replicates is approximately the same at each concentration.

To overcome the situation in which data spread differently at different concentrations, various weighting techniques are considered, including relative weighting, Poisson weighting, and observed variability-based weighting [4]. Relative weighting extends the idea of standard weighting by dividing the squared distance by the square of the corresponding response value Y; hence, the relative variability is consistent. Similarly, Poisson weighting and weighting by observed variability use different forms of dividing by the response value Y. Indeed, minimizing the sum-of-squares might yield the best possible curve when all variations obey a Gaussian distribution (without considering how different the standard deviations at the concentrations are). However, it is common for one data point to be far from the rest (caused by experimental mistakes); this point then does not belong to the same Gaussian distribution as the remaining points, and it contributes an erroneous impact to the fitting. The Tukey biweight function [10] was introduced to reduce the effect of outliers. This weighting function considers large residuals and treats them with low weights, or even zero weights, so that they do not sway the fitting much. In this section, we present a modification of the Tukey function and apply it to our fitting.

Let ω(r) be the weight of a data point that has a distance to the curve (residual) of r; then the biweight function is defined as

$$ \omega(r) = \begin{cases} \left[1 - \left(\dfrac{r}{c}\right)^2\right]^2, & |r| < c \\ 0, & |r| > c \end{cases} \qquad (11) $$

where $c = 6\,\mathrm{median}(\{r_i\}_{i=1}^{N})$, 6 is a constant defined by Tukey, and N is the number of data points. This function totally ignores, or gives zero weighting to, the points having residuals larger than six times the median residual. Nevertheless, when the experimental data contain a great deal of noise (which usually occurs in biological and medical assays), they are more likely to follow a normal distribution than when they contain little noise. When our data approach a normal distribution, the mean of residuals is a better choice than the median of residuals (the median is useful if the data have extreme scores). In our case, based on experimental situations, we decided to use $c = 6\,\mathrm{mean}(\{r_i\}_{i=1}^{N}) = 6\bar{r}$. Figure 3 shows the fitting results of using the median and mean. The sum-of-squares error produced by using the mean is significantly better than that produced by using the median. Additionally, the curve fitted using the mean looks more satisfactory than the one fitted using the median. Hence, the curve fitting algorithm with the modified Tukey biweight function can be summarized as follows:

1. Determine the distance from each data point to the curve, called the residual r.

2. Calculate the weight of each point using (11) with r applied.

3. Assign new values to data points based on their weights.

In addition, other robust weighting functions can be considered, such as Andrews, bisquare, Cauchy, Fair, Huber, logistic, Talwar, and Welsch [16]; however, the Tukey biweight function is known for its reliability, and it is generally recommended for robust optimization [17]. As we show further, the use of the Tukey biweight allows for improved DRC fitting results (see the Results section).
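The weighting of (11) with either choice of c can be sketched as below. This is an illustration rather than the authors' code: the function name and residual values are hypothetical, and the sketch assumes that c is computed from the absolute residuals, which the text does not state explicitly.

```python
from statistics import mean, median

def tukey_biweight(residuals, center=mean):
    """Weights of Eq. (11) with c = 6 * center(|r|).

    center=median gives the classical cutoff, center=mean the modified one.
    Assumption: c is computed from absolute residuals.
    """
    abs_r = [abs(r) for r in residuals]
    c = 6.0 * center(abs_r)
    if c == 0.0:
        # Perfect fit: no residual information, keep every point.
        return [1.0] * len(abs_r)
    return [(1.0 - (r / c) ** 2) ** 2 if r < c else 0.0 for r in abs_r]

# Made-up residuals: four moderate values and one distant point.
residuals = [0.5, -1.0, 1.5, -0.8, 12.0]
w_median = tukey_biweight(residuals, median)  # c = 6 * 1.0  = 6.0
w_mean = tukey_biweight(residuals, mean)      # c = 6 * 3.16 = 18.96
print(w_median)
print(w_mean)
```

In this toy case the median-based cutoff gives the distant point a zero weight, while the mean-based cutoff is wider and only down-weights it (to roughly 0.36). The mean-based variant therefore discards fewer points when the residuals are broadly scattered.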

Figure 3 Results of using the median (left) and mean (right) calculations in the Tukey biweight function.

Robust univariate DRC estimation algorithm
The proposed algorithm is conceptually easy to implement and robust to outliers. Combining ideas from the previous sections, the algorithm consists of the following steps:

1. Find the initial curve with outlier detection by executing the LM algorithm and applying (10) instead of (8).

2. Based on the curve obtained, calculate the Tukey biweights of data points using (11).

3. Based on the obtained weights, execute the LM algorithm in the basic computation section and consider the weights of the data points.

4. Repeat steps 2 and 3 for a predefined number of iterations or until convergence.
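The four steps can be combined into one sketch. It is not the authors' MATLAB implementation, and several details are assumptions made only to obtain runnable code: the weights enter the LM step as multipliers of each point's contribution to χ, A, and B; the Gauss-Newton matrix (ψ′ = 1) is kept even when (10) is used; c is computed from absolute residuals; and the initial guess, the 50 rounds, and the tolerance of 0.0001 follow the values quoted in the Results section. All function names are hypothetical.

```python
import math
from statistics import mean

def sigmoid(x, b):
    t = min((b[2] - x) / b[3], 700.0)             # clamped exponent of Eq. (1)
    return 1.0 / (1.0 + math.exp(t)), t

def drc(x, b):
    return b[0] + b[1] * sigmoid(x, b)[0]         # Eq. (1)

def jacobian_row(x, b):
    s, t = sigmoid(x, b)
    g = b[1] * s * (1.0 - s) / b[3]
    return [1.0, s, -g, g * t]                    # d f / d(floor, window, shift, slope)

def solve(a, rhs):
    """Gaussian elimination with partial pivoting for the system (7)."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[p][c] == 0.0:
            raise ZeroDivisionError("singular matrix")
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def lm(xs, ys, beta, w, absolute=False, lam=1e-3, max_iter=100, tol=1e-4):
    """Weighted LM; absolute=True uses Eq. (10), otherwise Eq. (8)."""
    rho = abs if absolute else (lambda z: 0.5 * z * z)
    psi = (lambda z: (z > 0) - (z < 0)) if absolute else (lambda z: z)
    def cost(b):
        return sum(wi * rho(y - drc(x, b)) for x, y, wi in zip(xs, ys, w))
    beta, cur = list(beta), cost(beta)
    for _ in range(max_iter):
        rows = [jacobian_row(x, beta) for x in xs]
        z = [y - drc(x, beta) for x, y in zip(xs, ys)]
        # Assumption: the Gauss-Newton matrix (psi' = 1) is kept for both losses.
        a = [[sum(wi * r[k] * r[l] for wi, r in zip(w, rows)) for l in range(4)]
             for k in range(4)]
        g = [sum(wi * psi(zi) * r[k] for wi, zi, r in zip(w, z, rows)) for k in range(4)]
        for k in range(4):
            a[k][k] *= 1.0 + lam                  # Eq. (9)
        try:
            trial = [p + d for p, d in zip(beta, solve(a, g))]
            new = cost(trial)
        except ZeroDivisionError:
            new = float("inf")
        if not new < cur:                         # reject the step
            lam *= 10.0
            continue
        lam /= 10.0                               # accept the step
        done = cur - new < tol
        beta, cur = trial, new
        if done:
            break
    return beta

def tukey_weights(residuals):
    """Eq. (11) with the mean-based cutoff c = 6 * mean(|r|)."""
    a = [abs(r) for r in residuals]
    c = 6.0 * mean(a)
    if c == 0.0:
        return [1.0] * len(a)
    return [(1.0 - (r / c) ** 2) ** 2 if r < c else 0.0 for r in a]

def robust_fit(xs, ys, rounds=50, tol=1e-4):
    ones = [1.0] * len(xs)
    beta = [min(ys), max(ys) - min(ys), mean(xs), 0.6]   # initial guess
    beta = lm(xs, ys, beta, ones, absolute=True)         # step 1
    weights = ones
    for _ in range(rounds):                              # step 4
        weights = tukey_weights([y - drc(x, beta) for x, y in zip(xs, ys)])  # step 2
        new = lm(xs, ys, beta, weights)                  # step 3
        change = max(abs(p - q) for p, q in zip(new, beta))
        beta = new
        if change < tol:
            break
    return beta, weights
```

`robust_fit(xs, ys)` takes log concentrations `xs` and responses `ys` and returns the fitted parameters together with the last set of Tukey weights. Points whose weight is zero were treated as outliers in the final weighted fit.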

Results
We have previously shown the ability to use the high content genome-wide silencing RNA (siRNA) screening approach on various cellular models [2,3]. The recent combination of drugs (at different concentrations) and siRNA (approximately 20,000) led us to look for an automatic method to characterize a large number of dose-response curves obtained in those experimental conditions. Using MATLAB, a curve fitting algorithm was implemented. We compared the fitting performance of our method to that of the nlinfit function in MATLAB 2013a and the robust DRC fitting in GraphPad Prism 6.0.

In our experiments, we used eight different concentrations: 0, 0.005, 0.01, 0.1, 0.4, 1, 5, and 20 μM, with five replicates at concentration 0 and three replicates at the other concentrations (26 data points in total). In this manuscript, the concentrations were plotted over the X-axis in log units. The Y-axis shows the drug response, which was normalized in the range of 0 to 100. The initial values of the parameters floor, window, shift, and slope used in the MATLAB nlinfit function were defined based on the values of the data points as min(Y), max(Y) − min(Y), average(X), and 0.6, respectively. By default, the algorithm uses bisquare (also known as Tukey biweight) as the robust weighting function. Indeed, MATLAB applies the Levenberg-Marquardt (LM) algorithm and iteratively reweighted least squares [16] for robust estimation. Accordingly, nlinfit represents the case of using the traditional LM algorithm and Tukey biweight function, where (8) and the median function are utilized. For our algorithm, we defined the initial DRC parameters in the first step (outlier detection) as in nlinfit; those parameters were then corrected by the outlier detection step and used for the subsequent fitting steps. According to step 4 of our DRC estimation algorithm, the number of iterations was predefined as 50, and the error tolerance for convergence was 0.0001.

Figure 4 shows two examples of fitting when the data points do not include outliers and the results of the three fitting algorithms are acceptable. Curve fitting results are presented together with sum-of-squares errors (in the top-left corner), which are calculated by taking the sum of squared differences between the actual Y and the estimated Y. Smaller errors indicate better results. The plotting results are shown from left to right: MATLAB nlinfit, Prism robust fitting, and our method, respectively. Figure 5 illustrates cases in which outliers are present. For points inside the interval from −6.5 to −5 (log units), the variation of measurements is high. In this figure, the first result of MATLAB nlinfit demonstrates an ambiguity of the shift parameter: the log of IC50 should be shifted to the right to cross the mean point in the middle of the plot. The first plot of Prism presents a poor DRC due to its very steep slope. Figure 6 displays cases where outliers appear and lead to bad fitting. Prism was completely unable to fit the first curve, but our method handled the data points very well. Additionally, the second plot of MATLAB nlinfit shows an ambiguity of the shift parameter and a very steep slope.

Figure 4 Two results of no outliers and good fitting.

Figure 5 Two results of outliers.

Figure 6 Two results of outliers and bad fitting.

In drug discovery and genome-wide data analysis, curve parameters, especially the shift, act as a crucial factor in determining the target candidates. Therefore, poor outcomes of the DRC fitting algorithm might greatly affect the analysis of the whole genome, which makes it difficult to find the targets. To evaluate the performance of different DRC fitting algorithms on a large scale, we assessed 19,236 curves, which were obtained from five microarray slides. Table 1 shows fitting errors for the five slides, with the corresponding numbers of curves (3885, 3887, 3865, 3875, and 3724) from each slide. Averages and standard deviations of the normalized sum-of-squares errors (the normalized error is calculated by dividing the actual error by the maximum one) are also included in this table, with the comparison of the four methods. Herein, in addition to comparing our method (using mean calculation in the Tukey biweight function) to MATLAB and Prism, we also compared it to a variant of our method that uses the median for the calculation of (11), to show that the modified Tukey biweight function can significantly improve the fitting. Our method yielded the best average errors in most slides, whereas MATLAB nlinfit was the worst. Prism gave better results than the method using median calculation.

Table 1 Averages and standard deviations of the normalized sum-of-squares errors calculated based on the fitting results of 19,236 curves

Slide     Number of   Matlab           Prism            Ours (median)    Ours (mean)
          curves      mean    σ        mean    σ        mean    σ        mean    σ
1         3885        0.920   0.134    0.777   0.213    0.855   0.210    0.777   0.217
2         3887        0.907   0.117    0.852   0.154    0.943   0.134    0.817   0.165
3         3865        0.916   0.112    0.844   0.161    0.922   0.155    0.820   0.170
4         3875        0.946   0.105    0.793   0.198    0.841   0.202    0.777   0.207
5         3724        0.908   0.144    0.722   0.234    0.820   0.246    0.755   0.239
Average               0.919   0.122    0.798   0.192    0.876   0.189    0.789   0.200

Boldface numbers in the original table indicate the best errors.

In summary, experimental comparisons show that our method (namely Ours (mean) in Table 1), which proposes automatic initialization of DRC parameters and modification of the Tukey biweight function (mean calculation), yields a satisfactory fitting of curves. In processing 19,236 curves, it provides more accurate fitting, by more than 14%, than MATLAB nlinfit, where automatic initialization is not available and the default Tukey biweight function (median calculation) is used. We also demonstrated that the method applying the automatic initialization and the default Tukey function (namely Ours (median)) did not yield results as good as those of Ours (mean). Moreover, the better performance of Ours (median) compared with MATLAB nlinfit implies that the automatic initialization of DRC parameters meaningfully improves the fitting process. Our method is superior to GraphPad Prism 6.0 by more than 1%. Although the error is only slightly improved, we believe that, with the MATLAB implementation provided, our approach is easily automated and scalable to thousands of curves. It is able to process the entire genome data in less than two hours, at approximately 360 milliseconds per curve (with a computer configuration using an Intel Core i7 3.47 GHz CPU).
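The percentages and the run time quoted in the summary can be checked directly from the Average row of Table 1. The sketch below only repeats that arithmetic; the dictionary and function names are hypothetical.

```python
# Average normalized sum-of-squares errors from the "Average" row of Table 1.
AVERAGE_ERROR = {"matlab": 0.919, "prism": 0.798, "ours_median": 0.876, "ours_mean": 0.789}

def relative_gain(baseline, improved):
    """Percentage by which `improved` lowers the error of `baseline`."""
    return 100.0 * (baseline - improved) / baseline

gain_vs_matlab = relative_gain(AVERAGE_ERROR["matlab"], AVERAGE_ERROR["ours_mean"])
gain_vs_prism = relative_gain(AVERAGE_ERROR["prism"], AVERAGE_ERROR["ours_mean"])
hours = 19236 * 0.360 / 3600.0  # 19,236 curves at about 360 ms per curve

print(f"{gain_vs_matlab:.1f}% {gain_vs_prism:.1f}% {hours:.2f} h")
# prints: 14.1% 1.1% 1.92 h
```

These values agree with the statements "more than 14%", "more than 1%", and "less than two hours".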

Conclusion
We provide two improvements for the problem of DRC fitting: 1) increasing the accuracy of the initialization of DRC parameters with the use of outlier detection, and 2) improving the method of weighting for noisy data in the Tukey biweight function. Our method is adapted to the analysis of thousands of DRCs or more with the use of automatic outlier detection and initialization of curves. By experimentally comparing the results of our method to those calculated by the nlinfit function in MATLAB 2013a and the robust DRC fitting in GraphPad Prism 6.0, we found that the proposed approach yielded a superior estimation of curves to that of MATLAB and Prism.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
Contributed idea: MAEH, KS, MK. Wrote manuscript: TTN, KS, YT. Implemented algorithm: TTN, KS. Did data analysis: TTN, MAEH, YJK. Designed and did biological experiments: JYK, YJK. All authors read and approved the final manuscript.

Acknowledgements
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2012-00011), Gyeonggi-do and KISTI. Kyungmin Song and Myungjoo Kang were supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (2013-025173). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author details
1Institut Pasteur Korea, Seongnam-si, Gyeonggi-do, South Korea. 2University of California, Davis, USA. 3Samsung Medical Center, Seoul, South Korea. 4Videometer A/S, Horsholm, Denmark. 5Seoul National University, Seoul, South Korea.

Received: 14 July 2014 Accepted: 14 November 2014

References

1. Joseph MZ: Applications of high content screening in life science research. Comb Chem High Throughput Screen 2009, 12(9):870-876.

2. Siqueira-Neto JL, Moon SH, Jang JY, Yang GS, Lee CB, Moon HK, Chatelain E, Genovesio A, Cechetto J, Freitas-Junior LH: An image-based high-content screening assay for compounds targeting intracellular Leishmania donovani amastigotes in human macrophages. PLOS Neglect Trop D 2012, 6(6):e1671.

3. Genovesio A, Kwon YJ, Windisch MP, Kim NY, Choi SY, Kim HC, Jung SY, Mammano F, Perrin V, Boese AS, Casartelli N, Schwartz O, Nehrbass U, Emans N: Automated genome-wide visual profiling of cellular proteins involved in HIV infection. J Biomol Screen 2011, 16(9):945-958.

4. Motulsky H, Christopoulos A: Fitting Models to Biological Data Using Linear and Nonlinear Regression: a Practical Guide to Curve Fitting. New York, USA: Oxford University Press Inc.; 2004.

5. Levenberg K: A method for the solution of certain problems in least squares. Quart Applied Math 1944, 2:164-168.

6. Marquardt D: An algorithm for least-squares estimation of nonlinear parameters. SIAM J Applied Math 1963, 11(2):431-441.

7. Ayiomamitis A: Logistic curve fitting and parameter estimation using nonlinear noniterative least-squares regression analysis. Comput Biomed Res 1986, 19(2):142-150.

8. Rey D: Automatic best of fit estimation of dose response curve. Konferenz der SAS-Anwender in Forschung und Entwicklung 2007. [http://www.uni-ulm.de/ksfe2007]

9. Wang Y, Jadhav A, Southal N, Huang R, Nguyen DT: A Grid algorithm for high throughput fitting of dose-response curve data. Curr Chem Genomics 2010, 4:57-66.

10. Hoaglin D, Mosteller F, Tukey J: Understanding Robust and Exploratory Data Analysis. New York, USA: John Wiley and Sons Inc.; 1983.

11. Boyd Y, Vandenberghe L: Convex Optimization. Cambridge: Cambridge University Press; 2009.

12. Ritz C, Streibig JC: Bioassay analysis using R. J Stat Soft 2005, 12(5):1-22.

13. MathWorks: Matlab 2013. nlinfit: Nonlinear regression. [http://www.mathworks.com/help/stats/nlinfit.html]

14. IDBS: 2013. XLfit for curve fitting and data analysis. [http://www.excelcurvefitting.com]

15. Press WH, Teukolsky SA, Vetterling WT, Flannery BP: Numerical Recipes in C: the Art of Scientific Computing (3rd edition). New York, USA: Cambridge University Press; 2007.

16. Holland PW, Welsch RE: Robust regression using iteratively reweighted least-squares. Comm Stat Theor Meth 1977, A6:813-827.

17. Maronna R, Martin RD, Yohai V: Robust Statistics: Theory and Methods. Chichester, England: Wiley; 2006.

doi:10.1186/s13029-014-0027-x
Cite this article as: Nguyen et al.: Robust dose-response curve estimation applied to high content screening data analysis. Source Code for Biology and Medicine 2014 9:27.
