
Computational Statistics & Data Analysis 47 (2004) 713–728. www.elsevier.com/locate/csda

Statistical efficiency of curve fitting algorithms

N. Chernov∗, C. Lesort
Department of Mathematics, University of Alabama at Birmingham, Birmingham, AL 35294, USA

Received 19 March 2003; received in revised form 5 November 2003; accepted 7 November 2003

Abstract

We study the problem of fitting parameterized curves to noisy data. Under certain assumptions (known as Cartesian and radial functional models), we derive asymptotic expressions for the bias and the covariance matrix of the parameter estimates. We also extend Kanatani's version of the Cramer-Rao lower bound, which he proved for unbiased estimates only, to more general estimates that include many popular algorithms (most notably, the orthogonal least squares and algebraic fits). We then show that the gradient-weighted algebraic fit is statistically efficient and describe all other statistically efficient algebraic fits.
© 2003 Elsevier B.V. All rights reserved.

Keywords: Least squares fit; Curve fitting; Circle fitting; Algebraic fit; Rao-Cramer bound; Efficiency; Functional model

1. Introduction

In many applications one fits a parameterized curve described by an implicit equation $P(x, y; \Theta) = 0$ to experimental data $(x_i, y_i)$, $i = 1, \dots, n$. Here $\Theta$ denotes the vector of unknown parameters to be estimated. Typically, $P$ is a polynomial in $x$ and $y$, and its coefficients are unknown parameters (or functions of unknown parameters). For example, a number of recent publications (Ahn et al., 2001; Chojnacki et al., 2001; Gander et al., 1994; Leedan and Meer, 2000; Spath, 1997) are devoted to the problem of fitting quadrics $Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0$, in which case $\Theta = (A, B, C, D, E, F)$ is the parameter vector. The problem of fitting circles, given by the equation $(x-a)^2 + (y-b)^2 - R^2 = 0$ with three parameters $a, b, R$, has also attracted attention (Chernov and Ososkov, 1984; Kanatani, 1998; Landau, 1987; Spath, 1996).

∗ Corresponding author. Tel.: +1-2059342154; fax: +1-2059349025. E-mail address: [email protected] (N. Chernov).

0167-9473/$ - see front matter © 2003 Elsevier B.V. All rights reserved. doi:10.1016/j.csda.2003.11.008


We consider here the problem of fitting general curves given by implicit equations $P(x, y; \Theta) = 0$, with $\Theta = (\theta_1, \dots, \theta_k)$ being the parameter vector. Our goal is to investigate the statistical properties of various fitting algorithms. We are interested in their bias, their covariance matrices, and the Cramer-Rao lower bound.

First, we specify our model. We denote by $\bar\Theta$ the true value of $\Theta$. Let $(\bar x_i, \bar y_i)$, $i = 1, \dots, n$, be some points lying on the true curve $P(x, y; \bar\Theta) = 0$. Experimentally observed data points $(x_i, y_i)$, $i = 1, \dots, n$, are perceived as random perturbations of the true points $(\bar x_i, \bar y_i)$. We use the notation $\mathbf{x}_i = (x_i, y_i)^T$ and $\bar{\mathbf{x}}_i = (\bar x_i, \bar y_i)^T$, for brevity. The random vectors $\mathbf{e}_i = \mathbf{x}_i - \bar{\mathbf{x}}_i$ are assumed to be independent and have zero mean. Two specific assumptions on their probability distribution can be made, see Berman and Culpin (1986):

Cartesian model: Each $\mathbf{e}_i$ is a two-dimensional normal vector with covariance matrix $\sigma_i^2 I$, where $I$ is the identity matrix.

Radial model: $\mathbf{e}_i = \xi_i \mathbf{n}_i$, where $\xi_i$ is a normal random variable $N(0, \sigma_i^2)$, and $\mathbf{n}_i$ is a unit normal vector to the curve $P(x, y; \bar\Theta) = 0$ at the point $\bar{\mathbf{x}}_i$.

Our analysis covers both models, Cartesian and radial. For simplicity, we assume that $\sigma_i^2 = \sigma^2$ for all $i$, but note that our results can be easily generalized to arbitrary $\sigma_i^2 > 0$.

Concerning the true points $\bar{\mathbf{x}}_i$, $i = 1, \dots, n$, two assumptions are possible. Many researchers (Chan, 1965; Kanatani, 1996, 1998) consider them as fixed, but unknown, points on the true curve. In this case their coordinates $(\bar x_i, \bar y_i)$ can be treated as additional parameters of the model (nuisance parameters). Chan (1965) and others (Anderson, 1981; Berman and Culpin, 1986) call this assumption a functional model. Alternatively, one can assume that the true points $\bar{\mathbf{x}}_i$ are sampled from the curve $P(x, y; \bar\Theta) = 0$ according to some probability distribution on it. This assumption is referred to as a structural model (Anderson, 1981; Berman and Culpin, 1986). We only consider the functional model here.

It is easy to verify that maximum likelihood estimation of the parameter $\Theta$ for the functional model is given by the orthogonal least squares fit (OLSF), which is based on minimization of the function

$$\mathcal{F}_1(\Theta) = \sum_{i=1}^{n} [d_i(\Theta)]^2, \qquad (1.1)$$

where $d_i(\Theta)$ denotes the distance from the point $\mathbf{x}_i$ to the curve $P(x, y; \Theta) = 0$. The OLSF is the method of choice in practice, especially when one fits simple curves such as lines and circles. However, for more general curves the OLSF becomes intractable, because the precise distance $d_i$ is hard to compute. For example, when $P$ is a generic quadric (ellipse or hyperbola), the computation of $d_i$ is equivalent to solving a polynomial equation of degree four, and its direct solution is known to be numerically unstable; see Ahn et al. (2001) and Gander et al. (1994) for more detail. Then one resorts to various approximations. It is often convenient to minimize

$$\mathcal{F}_2(\Theta) = \sum_{i=1}^{n} [P(x_i, y_i; \Theta)]^2 \qquad (1.2)$$


instead of (1.1). This method is referred to as a (simple) algebraic fit (AF); in this case one calls $|P(x_i, y_i; \Theta)|$ the algebraic distance (Ahn et al., 2001; Chojnacki et al., 2001; Gander et al., 1994) from the point $(x_i, y_i)$ to the curve. The AF is computationally cheaper than the OLSF, but its accuracy is often unacceptable, see below.
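To make the distinction concrete, here is a minimal sketch (ours, not from the paper) comparing the two distances for a circle. For a point at distance $d$ from a circle of radius $R$, the algebraic distance equals $d(2R \pm d)$, so the two differ roughly by the factor $2R$.

```python
import numpy as np

def geometric_distance(x, y, a, b, R):
    """Exact distance from (x, y) to the circle (x-a)^2 + (y-b)^2 = R^2."""
    return np.abs(np.hypot(x - a, y - b) - R)

def algebraic_distance(x, y, a, b, R):
    """|P(x, y)| with P = (x-a)^2 + (y-b)^2 - R^2, as used by the simple AF."""
    return np.abs((x - a)**2 + (y - b)**2 - R**2)

# A point slightly outside the unit circle centred at the origin
x, y, a, b, R = 1.1, 0.0, 0.0, 0.0, 1.0
print(geometric_distance(x, y, a, b, R))   # 0.1
print(algebraic_distance(x, y, a, b, R))   # 0.21 = d*(2R + d), i.e. roughly 2R times larger
```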

The simple AF (1.2) can be generalized to a weighted algebraic fit, which is based on minimization of

$$\mathcal{F}_3(\Theta) = \sum_{i=1}^{n} w_i\, [P(x_i, y_i; \Theta)]^2, \qquad (1.3)$$

where $w_i = w(x_i, y_i; \Theta)$ are some weights, which may balance (1.2) and improve its performance. One way to define the weights $w_i$ results from a linear approximation to $d_i$:

$$d_i \approx \frac{|P(x_i, y_i; \Theta)|}{\|\nabla_x P(x_i, y_i; \Theta)\|},$$

where $\nabla_x P = (\partial P/\partial x, \partial P/\partial y)$ is the gradient vector, see Taubin (1991). Then one minimizes the function

$$\mathcal{F}_4(\Theta) = \sum_{i=1}^{n} \frac{[P(x_i, y_i; \Theta)]^2}{\|\nabla_x P(x_i, y_i; \Theta)\|^2}. \qquad (1.4)$$

This method is called the gradient weighted algebraic fit (GRAF). It is a particular case of (1.3) with $w_i = 1/\|\nabla_x P(x_i, y_i; \Theta)\|^2$.

The GRAF has been known since at least 1974 (Turner, 1974) and recently became standard for polynomial curve fitting (Taubin, 1991; Leedan and Meer, 2000; Chojnacki et al., 2001). The computational cost of GRAF depends on the function $P(x, y; \Theta)$, but, generally, the GRAF is much faster than the OLSF. It is also known from practice that the accuracy of GRAF is almost as good as that of the OLSF, and our analysis below confirms this fact. The GRAF is often claimed to be a statistically optimal weighted algebraic fit, and we will prove this fact as well.
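To illustrate how (1.4) can be minimized for a concrete curve, here is a hedged sketch (ours; the paper does not prescribe a numerical scheme). It feeds the GRAF objective to a general-purpose optimizer, using a circle as the example; scipy and the Nelder-Mead method are our own choices.

```python
import numpy as np
from scipy.optimize import minimize

def graf_objective(theta, pts, P, grad_x_P):
    """F4(theta) = sum_i P(x_i; theta)^2 / ||grad_x P(x_i; theta)||^2, cf. (1.4)."""
    vals = np.array([P(p, theta) for p in pts])
    grads = np.array([grad_x_P(p, theta) for p in pts])
    return np.sum(vals**2 / np.sum(grads**2, axis=1))

# Example curve: a circle P = (x-a)^2 + (y-b)^2 - R^2 with theta = (a, b, R)
P = lambda p, t: (p[0] - t[0])**2 + (p[1] - t[1])**2 - t[2]**2
grad_x_P = lambda p, t: np.array([2*(p[0] - t[0]), 2*(p[1] - t[1])])

rng = np.random.default_rng(0)
phi = np.linspace(0, 2*np.pi, 20, endpoint=False)
pts = np.c_[np.cos(phi), np.sin(phi)] + 0.05*rng.standard_normal((20, 2))

res = minimize(graf_objective, x0=[0.1, -0.1, 0.8], args=(pts, P, grad_x_P),
               method="Nelder-Mead")
print(res.x)   # close to the true parameters (0, 0, 1)
```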

Not much has been published on the statistical properties of the OLSF and algebraic fits, apart from the simplest case of fitting lines and hyperplanes (van Huffel, 1997). Chan (1965) and Berman and Culpin (1986) investigated circle fitting by the OLSF and the simple algebraic fit (1.2) assuming the structural model. Kanatani (1996, 1998) used the Cartesian functional model and considered a general curve fitting problem. He established an analogue of the Rao-Cramer lower bound for unbiased estimates of $\Theta$, which we call here the Kanatani-Cramer-Rao (KCR) lower bound. He also showed that the covariance matrices of the OLSF and the GRAF attain, to the leading order in $\sigma$, his lower bound. We note, however, that in most cases the OLSF and algebraic fits are biased (Berman and Culpin, 1986; Berman, 1989), hence the KCR lower bound, as it is derived in Kanatani (1996, 1998), does not immediately apply to these methods.

In this paper we extend the KCR lower bound to biased estimates, which include the OLSF and all weighted algebraic fits. We prove the KCR bound for estimates satisfying the following mild assumption:

Precision assumption: For precise observations (when $\mathbf{x}_i = \bar{\mathbf{x}}_i$ for all $1 \le i \le n$), the estimate $\hat\Theta$ is precise, i.e.

$$\hat\Theta(\bar{\mathbf{x}}_1, \dots, \bar{\mathbf{x}}_n) = \bar\Theta. \qquad (1.5)$$


It is easy to check that the OLSF and the algebraic fits (1.3) satisfy this assumption. We will also show that all unbiased estimates of $\Theta$ satisfy (1.5).

We then prove that the GRAF is, indeed, a statistically efficient fit, in the sense that its covariance matrix attains, to the leading order in $\sigma$, the KCR lower bound. On the other hand, rather surprisingly, we find that the GRAF is not the only statistically efficient algebraic fit, and we describe all statistically efficient algebraic fits. Finally, we show that Kanatani's theory and our extension of it remain valid for the radial functional model. Our conclusions are illustrated by numerical experiments on circle fitting algorithms.

2. Kanatani–Cramer–Rao lower bound

Recall that we have adopted the functional model, in which the true points $\bar{\mathbf{x}}_i$, $1 \le i \le n$, are fixed. This automatically makes the sample size $n$ fixed; hence, many classical concepts of statistics, such as consistency and asymptotic efficiency (which require taking the limit $n \to \infty$), lose their meaning. It is customary, in studies of the functional model of the curve fitting problem, to take the limit $\sigma \to 0$ instead of $n \to \infty$, cf. Kanatani (1996, 1998). This is, by the way, not unreasonable from the practical point of view: in many experiments, $n$ is rather small and cannot be (easily) increased, so the limit $n \to \infty$ is of little interest. On the other hand, when the accuracy of experimental observations is high (thus, $\sigma$ is small), the limit $\sigma \to 0$ is quite appropriate.

Now, let $\hat\Theta(\mathbf{x}_1, \dots, \mathbf{x}_n)$ be an arbitrary estimate of $\Theta$ satisfying the precision assumption (1.5). In our analysis we will always assume that all the underlying functions are regular (continuous, with finite derivatives, etc.), which is a standard assumption (Kanatani, 1996, 1998).

The mean value of the estimate $\hat\Theta$ is

$$E(\hat\Theta) = \int \cdots \int \hat\Theta(\mathbf{x}_1, \dots, \mathbf{x}_n) \prod_{i=1}^{n} f(\mathbf{x}_i)\, d\mathbf{x}_1 \cdots d\mathbf{x}_n, \qquad (2.1)$$

where $f(\mathbf{x}_i)$ is the probability density function for the random point $\mathbf{x}_i$, as specified by a particular model (Cartesian or radial).

We now expand the estimate $\hat\Theta(\mathbf{x}_1, \dots, \mathbf{x}_n)$ into a Taylor series about the true point $(\bar{\mathbf{x}}_1, \dots, \bar{\mathbf{x}}_n)$, remembering (1.5):

$$\hat\Theta(\mathbf{x}_1, \dots, \mathbf{x}_n) = \bar\Theta + \sum_{i=1}^{n} (\nabla_{\mathbf{x}_i} \hat\Theta) \times (\mathbf{x}_i - \bar{\mathbf{x}}_i) + O(\sigma^2), \qquad (2.2)$$

where $\nabla_{\mathbf{x}_i} \hat\Theta = \nabla_{\mathbf{x}_i} \hat\Theta(\bar{\mathbf{x}}_1, \dots, \bar{\mathbf{x}}_n)$ and $\nabla_{\mathbf{x}_i}$ stands for the gradient with respect to the variables $x_i, y_i$. In other words, $\nabla_{\mathbf{x}_i} \hat\Theta$ is the $k \times 2$ matrix of partial derivatives of the $k$ components of the function $\hat\Theta$ with respect to the two variables $x_i$ and $y_i$, and this derivative is taken at the point $(\bar{\mathbf{x}}_1, \dots, \bar{\mathbf{x}}_n)$.


Substituting the expansion (2.2) into (2.1) gives

$$E(\hat\Theta) = \bar\Theta + O(\sigma^2) \qquad (2.3)$$

since $E(\mathbf{x}_i - \bar{\mathbf{x}}_i) = 0$. Hence, the bias of the estimate $\hat\Theta$ is of order $\sigma^2$. It easily follows from the expansion (2.2) that the covariance matrix of the estimate $\hat\Theta$ is given by

$$C_{\hat\Theta} = \sum_{i=1}^{n} (\nabla_{\mathbf{x}_i} \hat\Theta)\, E[(\mathbf{x}_i - \bar{\mathbf{x}}_i)(\mathbf{x}_i - \bar{\mathbf{x}}_i)^T]\, (\nabla_{\mathbf{x}_i} \hat\Theta)^T + O(\sigma^4)$$

(it is not hard to see that the cubic terms $O(\sigma^3)$ vanish because normal random variables with zero mean also have zero third moments, see also Kanatani, 1996). Now, for the Cartesian model

$$E[(\mathbf{x}_i - \bar{\mathbf{x}}_i)(\mathbf{x}_i - \bar{\mathbf{x}}_i)^T] = \sigma^2 I$$

and for the radial model

$$E[(\mathbf{x}_i - \bar{\mathbf{x}}_i)(\mathbf{x}_i - \bar{\mathbf{x}}_i)^T] = \sigma^2 \mathbf{n}_i \mathbf{n}_i^T,$$

where $\mathbf{n}_i$ is a unit normal vector to the curve $P(x, y; \bar\Theta) = 0$ at the point $\bar{\mathbf{x}}_i$. Then we obtain

$$C_{\hat\Theta} = \sigma^2 \sum_{i=1}^{n} (\nabla_{\mathbf{x}_i} \hat\Theta)\, \Sigma_i\, (\nabla_{\mathbf{x}_i} \hat\Theta)^T + O(\sigma^4), \qquad (2.4)$$

where $\Sigma_i = I$ for the Cartesian model and $\Sigma_i = \mathbf{n}_i \mathbf{n}_i^T$ for the radial model.

Lemma. We have $(\nabla_{\mathbf{x}_i} \hat\Theta)\, \mathbf{n}_i \mathbf{n}_i^T\, (\nabla_{\mathbf{x}_i} \hat\Theta)^T = (\nabla_{\mathbf{x}_i} \hat\Theta)(\nabla_{\mathbf{x}_i} \hat\Theta)^T$ for each $i = 1, \dots, n$. Hence, for both models, Cartesian and radial, the matrix $C_{\hat\Theta}$ is given by the same expression:

$$C_{\hat\Theta} = \sigma^2 \sum_{i=1}^{n} (\nabla_{\mathbf{x}_i} \hat\Theta)(\nabla_{\mathbf{x}_i} \hat\Theta)^T + O(\sigma^4). \qquad (2.5)$$

This lemma is proved in Appendix A. Our next goal is to find a lower bound for the matrix

$$D_1 := \sum_{i=1}^{n} (\nabla_{\mathbf{x}_i} \hat\Theta)(\nabla_{\mathbf{x}_i} \hat\Theta)^T. \qquad (2.6)$$

Following Kanatani (1996, 1998), we consider perturbations of the parameter vector, $\bar\Theta + \delta\Theta$, and of the true points, $\bar{\mathbf{x}}_i + \delta\bar{\mathbf{x}}_i$, satisfying two constraints. First, since the true points must belong to the true curve, $P(\bar{\mathbf{x}}_i; \bar\Theta) = 0$, we obtain, by the chain rule,

$$\langle \nabla_x P(\bar{\mathbf{x}}_i; \bar\Theta),\, \delta\bar{\mathbf{x}}_i \rangle + \langle \nabla_\Theta P(\bar{\mathbf{x}}_i; \bar\Theta),\, \delta\Theta \rangle = 0, \qquad (2.7)$$

where $\langle \cdot, \cdot \rangle$ stands for the scalar product of vectors. Second, since the identity (1.5) holds for all $\Theta$, we get

$$\sum_{i=1}^{n} (\nabla_{\mathbf{x}_i} \hat\Theta)\, \delta\bar{\mathbf{x}}_i = \delta\Theta. \qquad (2.8)$$


Now we need to find a lower bound for the matrix (2.6) subject to the constraints (2.7) and (2.8). That bound follows from a general theorem in linear algebra:

Theorem (Linear algebra). Let $n \ge k \ge 1$ and $m \ge 1$. Suppose $n$ nonzero vectors $\mathbf{u}_i \in \mathbb{R}^m$ and $n$ nonzero vectors $\mathbf{v}_i \in \mathbb{R}^k$ are given, $1 \le i \le n$. Consider the $k \times m$ matrices

$$X_i = \frac{\mathbf{v}_i \mathbf{u}_i^T}{\mathbf{u}_i^T \mathbf{u}_i}$$

for $1 \le i \le n$, and the $k \times k$ matrix

$$B = \sum_{i=1}^{n} X_i X_i^T = \sum_{i=1}^{n} \frac{\mathbf{v}_i \mathbf{v}_i^T}{\mathbf{u}_i^T \mathbf{u}_i}.$$

Assume that the vectors $\mathbf{v}_1, \dots, \mathbf{v}_n$ span $\mathbb{R}^k$ (hence $B$ is nonsingular). We say that a set of $n$ matrices $A_1, \dots, A_n$ (each of size $k \times m$) is proper if

$$\sum_{i=1}^{n} A_i \mathbf{w}_i = \mathbf{r} \qquad (2.9)$$

for any vectors $\mathbf{w}_i \in \mathbb{R}^m$ and $\mathbf{r} \in \mathbb{R}^k$ such that

$$\mathbf{u}_i^T \mathbf{w}_i + \mathbf{v}_i^T \mathbf{r} = 0 \qquad (2.10)$$

for all $1 \le i \le n$. Then for any proper set of matrices $A_1, \dots, A_n$ the $k \times k$ matrix $D = \sum_{i=1}^{n} A_i A_i^T$ is bounded from below by $B^{-1}$, in the sense that $D - B^{-1}$ is a positive semidefinite matrix. The equality $D = B^{-1}$ holds if and only if $A_i = -B^{-1} X_i$ for all $i = 1, \dots, n$.

This theorem is, probably, known, but we provide a full proof in the Appendix, for the sake of completeness.
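As a quick numerical sanity check (ours, not part of the paper), the equality case of the theorem can be verified with a few lines of numpy; the dimensions and random vectors below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, m = 10, 3, 2
u = rng.standard_normal((n, m))               # nonzero vectors u_i in R^m
v = rng.standard_normal((n, k))               # nonzero vectors v_i in R^k (they span R^k here)

X = [np.outer(v[i], u[i]) / (u[i] @ u[i]) for i in range(n)]   # X_i = v_i u_i^T / (u_i^T u_i)
B = sum(Xi @ Xi.T for Xi in X)                                  # B = sum_i X_i X_i^T
Binv = np.linalg.inv(B)

A = [-Binv @ Xi for Xi in X]                  # the optimal proper set A_i = -B^{-1} X_i
D = sum(Ai @ Ai.T for Ai in A)

print(np.allclose(D, Binv))                   # True: the lower bound D >= B^{-1} is attained
```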

As a direct consequence of the above theorem we obtain the lower bound for our matrix $D_1$:

Theorem (Kanatani-Cramer-Rao lower bound). We have $D_1 \ge D_{\min}$, in the sense that $D_1 - D_{\min}$ is a positive semidefinite matrix, where

$$D_{\min}^{-1} = \sum_{i=1}^{n} \frac{(\nabla_\Theta P(\bar{\mathbf{x}}_i; \Theta))(\nabla_\Theta P(\bar{\mathbf{x}}_i; \Theta))^T}{\|\nabla_x P(\bar{\mathbf{x}}_i; \Theta)\|^2}. \qquad (2.11)$$

In view of (2.5) and (2.6), the above theorem says that the lower bound for the covariance matrix $C_{\hat\Theta}$ is, to the leading order,

$$C_{\hat\Theta} \ge C_{\min} = \sigma^2 D_{\min}. \qquad (2.12)$$

The standard deviations of the components of the estimate $\hat\Theta$ are of order $O(\sigma)$. Therefore, the bias of $\hat\Theta$, which is at most of order $\sigma^2$ by (2.3), is infinitesimally small, as $\sigma \to 0$, compared to the standard deviations. This means that estimates satisfying (1.5) are practically unbiased.

The bound (2.12) was first derived by Kanatani (1996, 1998) for the Cartesian functional model and strictly unbiased estimates of $\Theta$, i.e. those satisfying $E(\hat\Theta) = \bar\Theta$. One can easily derive our assumption (1.5) from $E(\hat\Theta) = \bar\Theta$ by taking the limit $\sigma \to 0$; hence our results generalize those of Kanatani.
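For readers who want to evaluate the bound in practice, here is a small sketch (our own, with user-supplied gradient callables; the circle example at the end is only an illustration) that assembles $D_{\min}$ from (2.11).

```python
import numpy as np

def kcr_dmin(true_pts, theta, grad_theta_P, grad_x_P):
    """KCR bound D_min from (2.11); the gradients are supplied as callables."""
    k = len(theta)
    Dinv = np.zeros((k, k))
    for p in true_pts:
        gt = np.asarray(grad_theta_P(p, theta))      # gradient of P w.r.t. theta at (x_i, theta)
        gx = np.asarray(grad_x_P(p, theta))          # gradient of P w.r.t. (x, y)
        Dinv += np.outer(gt, gt) / (gx @ gx)
    return np.linalg.inv(Dinv)

# Circle example: P = (x-a)^2 + (y-b)^2 - R^2, theta = (a, b, R)
grad_theta_P = lambda p, t: [-2*(p[0]-t[0]), -2*(p[1]-t[1]), -2*t[2]]
grad_x_P     = lambda p, t: [ 2*(p[0]-t[0]),  2*(p[1]-t[1])]

phi = np.linspace(0, 2*np.pi, 20, endpoint=False)
pts = np.c_[np.cos(phi), np.sin(phi)]                # 20 true points on the unit circle
print(kcr_dmin(pts, (0.0, 0.0, 1.0), grad_theta_P, grad_x_P))
```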

3. Statistical efficiency of algebraic fits

Here we derive an explicit formula for the covariance matrix of the weighted algebraic fit (1.3) and describe the weights $w_i$ for which the fit is statistically efficient. For brevity, we write $P_i = P(x_i, y_i; \Theta)$. We assume that the weight function $w(x, y; \Theta)$ is regular, in particular that it has bounded derivatives with respect to $\Theta$; the next section will demonstrate the importance of this condition. The solution of the minimization problem (1.3) satisfies

$$\sum P_i^2\, \nabla_\Theta w_i + 2 \sum w_i P_i\, \nabla_\Theta P_i = 0. \qquad (3.1)$$

Observe that $P_i = O(\sigma)$, so that the first sum in (3.1) is $O(\sigma^2)$ and the second sum is $O(\sigma)$. Hence, to the leading order, the solution of (3.1) can be found by discarding the first sum and solving the reduced equation

$$\sum w_i P_i\, \nabla_\Theta P_i = 0. \qquad (3.2)$$

More precisely, if $\hat\Theta_1$ and $\hat\Theta_2$ are solutions of (3.1) and (3.2), respectively, then $\hat\Theta_1 - \bar\Theta = O(\sigma)$, $\hat\Theta_2 - \bar\Theta = O(\sigma)$, and $\|\hat\Theta_1 - \hat\Theta_2\| = O(\sigma^2)$. Furthermore, the covariance matrices of $\hat\Theta_1$ and $\hat\Theta_2$ coincide to the leading order, i.e. $C_{\hat\Theta_1} C_{\hat\Theta_2}^{-1} \to I$ as $\sigma \to 0$. Therefore, in what follows, we only deal with the solution of Eq. (3.2).

To find the covariance matrix of $\hat\Theta$ satisfying (3.2) we put $\hat\Theta = \bar\Theta + \delta\Theta$ and $\mathbf{x}_i = \bar{\mathbf{x}}_i + \delta\mathbf{x}_i$ and obtain, working to the leading order,

$$\sum w_i (\nabla_\Theta P_i)(\nabla_\Theta P_i)^T (\delta\Theta) = -\sum w_i (\nabla_x P_i)^T (\delta\mathbf{x}_i)\, (\nabla_\Theta P_i) + O(\sigma^2),$$

hence

$$\delta\Theta = -\Bigl[\sum w_i (\nabla_\Theta P_i)(\nabla_\Theta P_i)^T\Bigr]^{-1} \Bigl[\sum w_i (\nabla_x P_i)^T (\delta\mathbf{x}_i)\, (\nabla_\Theta P_i)\Bigr] + O(\sigma^2).$$

The covariance matrix is then

$$C_{\hat\Theta} = E[(\delta\Theta)(\delta\Theta)^T] = \sigma^2 \Bigl[\sum w_i (\nabla_\Theta P_i)(\nabla_\Theta P_i)^T\Bigr]^{-1} \Bigl[\sum w_i^2 \|\nabla_x P_i\|^2 (\nabla_\Theta P_i)(\nabla_\Theta P_i)^T\Bigr] \Bigl[\sum w_i (\nabla_\Theta P_i)(\nabla_\Theta P_i)^T\Bigr]^{-1} + O(\sigma^4). \qquad (3.3)$$


Denote by $D_2$ the principal factor here, i.e.

$$D_2 = \Bigl[\sum w_i (\nabla_\Theta P_i)(\nabla_\Theta P_i)^T\Bigr]^{-1} \Bigl[\sum w_i^2 \|\nabla_x P_i\|^2 (\nabla_\Theta P_i)(\nabla_\Theta P_i)^T\Bigr] \Bigl[\sum w_i (\nabla_\Theta P_i)(\nabla_\Theta P_i)^T\Bigr]^{-1}.$$

The following theorem establishes a lower bound for $D_2$:

Theorem. We have $D_2 \ge D_{\min}$, in the sense that $D_2 - D_{\min}$ is a positive semidefinite matrix, where $D_{\min}$ is given by (2.11). The equality $D_2 = D_{\min}$ holds if and only if $w_i = \mathrm{const}/\|\nabla_x P_i\|^2$ for all $i = 1, \dots, n$. In other words, an algebraic fit (1.3) is statistically efficient if and only if the weight function $w(x, y; \Theta)$ satisfies

$$w(x, y; \Theta) = \frac{c(\Theta)}{\|\nabla_x P(x, y; \Theta)\|^2} \qquad (3.4)$$

for all triples $x, y, \Theta$ such that $P(x, y; \Theta) = 0$. Here $c(\Theta)$ may be an arbitrary function of $\Theta$.

The bound $D_2 \ge D_{\min}$ here is a particular case of the previous theorem. It also can be obtained directly from the linear algebra theorem if one sets $\mathbf{u}_i = \nabla_x P_i$, $\mathbf{v}_i = \nabla_\Theta P_i$, and

$$A_i = -w_i \Bigl[\sum_{j=1}^{n} w_j (\nabla_\Theta P_j)(\nabla_\Theta P_j)^T\Bigr]^{-1} (\nabla_\Theta P_i)(\nabla_x P_i)^T$$

for $1 \le i \le n$. The expression (3.4) characterizing the efficiency follows from the last claim in the linear algebra theorem.

Remark: The choice of the function $c(\Theta)$ in (3.4) does not affect the leading term in (3.3), but it may affect the $O(\sigma^4)$ term. Hence, if one wants to find an "optimal" $c(\Theta)$ for a given problem, one may evaluate the $O(\sigma^4)$ term explicitly and select the $c(\Theta)$ which minimizes it. We believe, however, that even for simple problems, such as the one presented in the next section, such fine-tuning would require too much effort for too little improvement.

4. Circle fit

Here we illustrate our conclusions on the relatively simple problem of fitting circles. The canonical equation of a circle is

$$(x - a)^2 + (y - b)^2 - R^2 = 0 \qquad (4.1)$$

and we need to estimate the three parameters $a, b, R$. The simple algebraic fit (1.2) takes the form

$$\mathcal{F}_2(a, b, R) = \sum_{i=1}^{n} [(x_i - a)^2 + (y_i - b)^2 - R^2]^2 \to \min \qquad (4.2)$$


and the weighted algebraic fit (1.3) takes the form

$$\mathcal{F}_3(a, b, R) = \sum_{i=1}^{n} w_i [(x_i - a)^2 + (y_i - b)^2 - R^2]^2 \to \min. \qquad (4.3)$$

In particular, the GRAF becomes

$$\mathcal{F}_4(a, b, R) = \sum_{i=1}^{n} \frac{[(x_i - a)^2 + (y_i - b)^2 - R^2]^2}{(x_i - a)^2 + (y_i - b)^2} \to \min \qquad (4.4)$$

(where the irrelevant constant factor of 4 in the denominator is dropped). In terms of (2.11), we have

$$\nabla_\Theta P(\bar{\mathbf{x}}_i; \Theta) = -2(\bar x_i - a,\ \bar y_i - b,\ R)^T$$

and $\nabla_x P(\bar{\mathbf{x}}_i; \Theta) = 2(\bar x_i - a,\ \bar y_i - b)^T$, hence

$$\|\nabla_x P(\bar{\mathbf{x}}_i; \Theta)\|^2 = 4[(\bar x_i - a)^2 + (\bar y_i - b)^2] = 4R^2.$$

Therefore,

$$D_{\min} = \begin{pmatrix} \sum u_i^2 & \sum u_i v_i & \sum u_i \\ \sum u_i v_i & \sum v_i^2 & \sum v_i \\ \sum u_i & \sum v_i & n \end{pmatrix}^{-1}, \qquad (4.5)$$

where we denote, for brevity,

$$u_i = \frac{\bar x_i - a}{R}, \qquad v_i = \frac{\bar y_i - b}{R}.$$

The above expression for $D_{\min}$ was derived earlier in Chan and Thomas (1995) and Kanatani (1996).

Now, our theorem in Section 3 shows that the weighted algebraic fit (4.3) is statistically efficient if and only if the weight function satisfies $w(x, y, a, b, R) = c(a, b, R)/(4R^2)$. Since $c(a, b, R)$ may be an arbitrary function, the denominator $4R^2$ here is irrelevant. Hence, statistical efficiency is achieved whenever $w(x, y, a, b, R)$ is simply independent of $x$ and $y$ for all $(x, y)$ lying on the circle. In particular, the GRAF (4.4) is statistically efficient because $w(x, y, a, b, R) = [(x - a)^2 + (y - b)^2]^{-1} = R^{-2}$. The simple AF (4.2) is also statistically efficient, since $w(x, y, a, b, R) = 1$.

We note that the GRAF (4.4) is a highly nonlinear problem, and in its exact form (4.4) it is not used in practice. Instead, two modifications of GRAF are popular among experimenters. One is due to Chernov and Ososkov (1984) and Pratt (1987):

$$\mathcal{F}_4'(a, b, R) = R^{-2} \sum_{i=1}^{n} [(x_i - a)^2 + (y_i - b)^2 - R^2]^2 \to \min \qquad (4.6)$$


Table 1
Efficiency of circle fitting algorithms. Data are sampled along a full circle

σ/R     OLSF    AF      Pratt   Taubin
<0.01   ~1      ~1      ~1      ~1
0.01    0.999   0.999   0.999   0.999
0.02    0.999   0.998   0.997   0.997
0.03    0.998   0.996   0.995   0.995
0.05    0.996   0.992   0.987   0.987
0.10    0.985   0.970   0.953   0.953
0.20    0.935   0.900   0.837   0.835
0.30    0.825   0.824   0.701   0.692

(it is based on the approximation $(x_i - a)^2 + (y_i - b)^2 \approx R^2$), and the other is due to Agin (1981) and Taubin (1991):

$$\mathcal{F}_4''(a, b, R) = \frac{1}{\sum_i [(x_i - a)^2 + (y_i - b)^2]} \sum_{i=1}^{n} [(x_i - a)^2 + (y_i - b)^2 - R^2]^2 \to \min \qquad (4.7)$$

(here one simply averages the denominator of (4.4) over $1 \le i \le n$). We refer the reader to Chernov and Lesort (2002) for a detailed analysis of these and other circle fitting algorithms, including their numerical implementations.
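As an illustration only (the paper and Chernov and Lesort (2002) discuss dedicated numerical implementations), the objectives (4.6) and (4.7) can be transcribed literally and handed to a general-purpose optimizer; scipy and the Nelder-Mead method below are our own choices.

```python
import numpy as np
from scipy.optimize import minimize

def pratt_objective(theta, pts):
    """F4'(a, b, R) of (4.6): R^{-2} times the sum of squared algebraic distances."""
    a, b, R = theta
    d2 = (pts[:, 0] - a)**2 + (pts[:, 1] - b)**2
    return np.sum((d2 - R**2)**2) / R**2

def taubin_objective(theta, pts):
    """F4''(a, b, R) of (4.7): the denominator of (4.4) replaced by its average."""
    a, b, R = theta
    d2 = (pts[:, 0] - a)**2 + (pts[:, 1] - b)**2
    return np.sum((d2 - R**2)**2) / np.sum(d2)

rng = np.random.default_rng(2)
phi = np.linspace(0, np.pi, 20)                       # data along half a circle
pts = np.c_[np.cos(phi), np.sin(phi)] + 0.02*rng.standard_normal((20, 2))

for obj in (pratt_objective, taubin_objective):
    res = minimize(obj, x0=[0.1, 0.1, 0.9], args=(pts,), method="Nelder-Mead")
    print(obj.__name__, res.x)                        # both close to (0, 0, 1)
```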

We have tested experimentally the efficiency of four circle fitting algorithms: the OLSF (1.1), the simple AF (4.2), the Pratt method (4.6), and the Taubin method (4.7). We generated $n = 20$ points equally spaced on a circle, added isotropic Gaussian noise with variance $\sigma^2$ (according to the Cartesian model), and estimated the efficiency of the estimate of the center by

$$E = \frac{\sigma^2 (D_{11} + D_{22})}{\langle (\hat a - a)^2 + (\hat b - b)^2 \rangle}. \qquad (4.8)$$

Here $(a, b)$ is the true center, $(\hat a, \hat b)$ is its estimate, $\langle \cdots \rangle$ denotes averaging over many random samples, and $D_{11}$, $D_{22}$ are the first two diagonal entries of the matrix (4.5). We do not include the third parameter, $R$, since its estimate $\hat R$ is strongly correlated with $\hat a$ and $\hat b$, hence its inclusion might distort the value of $E$.
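A reduced version of this experiment can be sketched as follows (our code; it uses scipy's least_squares for the OLSF, is initialized at the true parameters for simplicity, and uses far fewer random samples than the 10^6 behind the tables, so the digits are less accurate).

```python
import numpy as np
from scipy.optimize import least_squares

def olsf_circle(pts, x0=(0.0, 0.0, 1.0)):
    """Orthogonal least squares fit (1.1): minimize the geometric distances d_i."""
    resid = lambda t: np.hypot(pts[:, 0] - t[0], pts[:, 1] - t[1]) - t[2]
    return least_squares(resid, x0).x

rng = np.random.default_rng(3)
n, sigma, trials = 20, 0.05, 2000
phi = np.linspace(0, 2*np.pi, n, endpoint=False)
true_pts = np.c_[np.cos(phi), np.sin(phi)]            # true circle: a = b = 0, R = 1

err = np.empty(trials)
for t in range(trials):
    noisy = true_pts + sigma*rng.standard_normal((n, 2))   # Cartesian noise model
    a_hat, b_hat, _ = olsf_circle(noisy)
    err[t] = a_hat**2 + b_hat**2                            # (a_hat - a)^2 + (b_hat - b)^2 with a = b = 0

# For equally spaced points on a full circle, (4.5) gives D11 = D22 = 2/n
E = sigma**2 * (2/n + 2/n) / err.mean()                     # efficiency (4.8); should be near 1
print(E)
```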

Table 1 shows the efficiency of the above-mentioned four algorithms for various values of σ/R. We used $10^6$ random samples, so that all the digits shown are accurate. We see that they all perform very well, and indeed are efficient as $\sigma \to 0$. One might notice that the OLSF slightly outperforms the other methods, and the AF is the second best.

Table 2 shows the efficiency of the same algorithms when the data points are sampled along half a circle, rather than a full circle. Again, the efficiency as $\sigma \to 0$ is clear, but we also make another observation. The AF now consistently falls behind the other methods for all σ/R ≤ 0.2, but for σ/R = 0.3 the other methods suddenly break down, while the AF stays afloat.


Table 2
Efficiency of circle fitting algorithms with data sampled along half a circle

σ/R     OLSF    AF      Pratt   Taubin
<0.01   ~1      ~1      ~1      ~1
0.01    0.999   0.996   0.999   0.999
0.02    0.997   0.983   0.997   0.997
0.03    0.994   0.961   0.992   0.992
0.05    0.984   0.902   0.978   0.978
0.10    0.935   0.720   0.916   0.916
0.20    0.720   0.493   0.703   0.691
0.30    0.122   0.437   0.186   0.141

Table 3
Data are sampled along a quarter of a circle

σ/R     OLSF    AF      Pratt   Taubin
0.01    0.997   0.911   0.997   0.997
0.02    0.977   0.722   0.978   0.978
0.03    0.944   0.555   0.946   0.946
0.05    0.837   0.365   0.843   0.842
0.10    0.155   0.275   0.163   0.158

The reason for the above turnaround is that at large noise the data points may occasionally line up along a circular arc of a very large radius. Then the OLSF, Pratt and Taubin methods faithfully return a large circle whose center lies far away, and such fits blow up the denominator of (4.8), which is a typical effect of large outliers. On the contrary, the AF is notoriously known for its systematic bias toward smaller circles (Chernov and Ososkov, 1984; Gander et al., 1994; Pratt, 1987); hence, while it is less accurate than the other fits for typical random samples, its bias safeguards it from large outliers.

This behavior is even more pronounced when the data are sampled along a quarter of a circle¹ (Table 3). We see that the AF is now far worse than the other fits for σ/R < 0.1, but the others characteristically break down at some point (σ/R = 0.1).

It is interesting to test smaller circular arcs, too. Fig. 1 shows a color-coded diagram of the efficiency of the OLSF and the AF for arcs from 0° to 50° and variable σ (we set σ = ch, where h is the height of the circular arc, see Fig. 2, and c varies from 0 to 0.5). The efficiency of the Pratt and Taubin methods is virtually identical to that of the OLSF, so it is not shown here. We see that the OLSF and the AF are efficient as σ → 0 (both squares in the diagram get white at the bottom), but the AF loses its efficiency at moderate levels of noise (c ≥ 0.1), while the OLSF remains accurate up to c = 0.3, after which it rather sharply breaks down.

¹ All our algorithms are invariant under simple geometric transformations such as translations, rotations and similarities; hence our experimental results do not depend on the choice of the circle, its size, or the part of the circle the data are sampled from.

Fig. 1. The efficiency of the simple OLSF (left) and the AF (center). The bar on the right explains the color codes. (Axes: arc in degrees, 10 to 50; noise coefficient c, 0 to 0.5.)

Fig. 2. The height of an arc, h, and the noise level σ = ch.

The following analysis sheds more light on the behavior of the circle fitting algorithms. When the curvature of the arc decreases, the center coordinates $a, b$ and the radius $R$ grow to infinity and their estimates become highly unreliable. In that case the circle equation (4.1) can be converted to a more convenient algebraic form

$$A(x^2 + y^2) + Bx + Cy + D = 0 \qquad (4.9)$$

with an additional constraint on the parameters: $B^2 + C^2 - 4AD = 1$. This parametrization was used in Pratt (1987) and Gander et al. (1994) and analyzed in detail in Chernov and Lesort (2002). We note that the original parameters can be recovered via $a = -B/2A$, $b = -C/2A$, and $R = (2|A|)^{-1}$. The new parametrization (4.9) is safe to use for arcs with arbitrarily small curvature: the parameters $A, B, C, D$ remain bounded and never develop singularities, see Chernov and Lesort (2002). Even as the curvature vanishes, we simply get $A = 0$, and equation (4.9) represents the line $Bx + Cy + D = 0$.
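The conversion between the two parametrizations is a one-liner in each direction; the sketch below (ours) normalizes the quadruple so that $B^2 + C^2 - 4AD = 1$.

```python
import numpy as np

def abcd_to_abr(A, B, C, D):
    """Recover centre and radius from the algebraic parameters of (4.9).

    Assumes B^2 + C^2 - 4AD = 1 and A != 0; for A = 0 the quadruple describes
    the line Bx + Cy + D = 0 instead of a circle."""
    a = -B / (2*A)
    b = -C / (2*A)
    R = 1.0 / (2*abs(A))
    return a, b, R

def abr_to_abcd(a, b, R):
    """Inverse conversion, normalized so that B^2 + C^2 - 4AD = 1."""
    A = 1.0 / (2*R)
    B = -2*A*a
    C = -2*A*b
    D = A*(a*a + b*b - R*R)
    return A, B, C, D

print(abcd_to_abr(*abr_to_abcd(1.0, -2.0, 3.0)))    # round trip returns (1.0, -2.0, 3.0)
```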

In terms of the new parameters $A, B, C, D$, the weighted algebraic fit (1.3) takes the form

$$\mathcal{F}_3(A, B, C, D) = \sum_{i=1}^{n} w_i [A(x_i^2 + y_i^2) + Bx_i + Cy_i + D]^2 \to \min \qquad (4.10)$$

(under the constraint $B^2 + C^2 - 4AD = 1$). Converting the AF (4.2) to the new parameters gives

$$\mathcal{F}_2(A, B, C, D) = \sum_{i=1}^{n} A^{-2} [A(x_i^2 + y_i^2) + Bx_i + Cy_i + D]^2 \to \min \qquad (4.11)$$


Fig. 3. The efficiency of the estimation of the parameter A by the simple AF (left) and the Pratt method (center). The bar on the right explains the color codes. (Axes: arc in degrees, 10 to 50; noise coefficient c, 0 to 0.5.)

which corresponds to the weight function $w = 1/A^2$. The Pratt method (4.6) turns into

$$\mathcal{F}_4(A, B, C, D) = \sum_{i=1}^{n} [A(x_i^2 + y_i^2) + Bx_i + Cy_i + D]^2 \to \min. \qquad (4.12)$$

We now see why the AF is unstable and inaccurate for arcs with small curvature: its weight function $w = 1/A^2$ develops a singularity (it explodes) in the limit $A \to 0$. Recall that, in our derivation of the statistical efficiency theorem (Section 3), we assumed that the weight function was regular (had bounded derivatives). This assumption is clearly violated by the AF (4.11). On the contrary, the Pratt fit (4.12) uses the safe choice $w = 1$ and thus behaves decently on arcs with small curvature, as shown next.

Fig. 3 shows a color-coded diagram of the efficiency of the estimate² of the parameter $A$ by the AF (4.11) versus the Pratt method (4.12) for arcs from 0° to 50° and the noise level σ = ch, where h is the height of the circular arc and c varies from 0 to 0.5. The efficiency of the OLSF and the Taubin method is visually indistinguishable from that of Pratt (the central square in Fig. 3), so we did not include them here.

We see that the AF performs significantly worse than the Pratt method for all arcs and most values of c (i.e., σ). Pratt's efficiency is close to 100%; its lowest point is 89% for 50° arcs and c = 0.5 (the top right corner of the central square barely gets grey). The AF's efficiency is below 10% for all c ≥ 0.2 and almost zero for c ≥ 0.4. Still, the AF remains efficient as σ → 0 (as the tiny white strip at the bottom of the left square shows), but its efficiency can only be counted on when σ is extremely small.

Our analysis demonstrates that the choice of the weights $w_i$ in the weighted algebraic fit (1.3) should be made according to our theorem in Section 3, and, in addition, one should avoid singularities in the domain of parameters.

Appendix A.

Here we prove the theorem of linear algebra stated in Section 2. For the sake of clarity, we divide our proof into small lemmas.

² Note that $|A| = 1/2R$; hence the estimation of $A$ is equivalent to that of the curvature, an important geometric parameter of the arc.


Lemma 1. The matrix B is indeed nonsingular.

Proof. If $Bz = 0$ for some nonzero vector $z \in \mathbb{R}^k$, then $0 = z^T B z = \sum_{i=1}^{n} (\mathbf{v}_i^T z)^2 / \|\mathbf{u}_i\|^2$, hence $\mathbf{v}_i^T z = 0$ for all $1 \le i \le n$, a contradiction.

Lemma 2. If a set of $n$ matrices $A_1, \dots, A_n$ is proper, then $\mathrm{rank}(A_i) \le 1$. Furthermore, each $A_i$ is given by $A_i = \mathbf{z}_i \mathbf{u}_i^T$ for some vector $\mathbf{z}_i \in \mathbb{R}^k$, and the vectors $\mathbf{z}_1, \dots, \mathbf{z}_n$ satisfy $\sum_{i=1}^{n} \mathbf{z}_i \mathbf{v}_i^T = -I$, where $I$ is the $k \times k$ identity matrix. The converse is also true.

Proof. Let vectors $\mathbf{w}_1, \dots, \mathbf{w}_n$ and $\mathbf{r}$ satisfy the requirements (2.9) and (2.10) of the theorem. Consider the orthogonal decomposition $\mathbf{w}_i = c_i \mathbf{u}_i + \mathbf{w}_i^\perp$, where $\mathbf{w}_i^\perp$ is perpendicular to $\mathbf{u}_i$, i.e. $\mathbf{u}_i^T \mathbf{w}_i^\perp = 0$. Then the constraint (2.10) can be rewritten as

$$c_i = -\frac{\mathbf{v}_i^T \mathbf{r}}{\mathbf{u}_i^T \mathbf{u}_i} \qquad (\mathrm{A.1})$$

for all $i = 1, \dots, n$, and (2.9) takes the form

$$\sum_{i=1}^{n} c_i A_i \mathbf{u}_i + \sum_{i=1}^{n} A_i \mathbf{w}_i^\perp = \mathbf{r}. \qquad (\mathrm{A.2})$$

We conclude that $A_i \mathbf{w}_i^\perp = 0$ for every vector $\mathbf{w}_i^\perp$ orthogonal to $\mathbf{u}_i$, hence the kernel of $A_i$ contains the $(m-1)$-dimensional subspace orthogonal to $\mathbf{u}_i$, so indeed its rank is zero or one. If we denote $\mathbf{z}_i = A_i \mathbf{u}_i / \|\mathbf{u}_i\|^2$, we obtain $A_i = \mathbf{z}_i \mathbf{u}_i^T$. Combining this with (A.1)-(A.2) gives

$$\mathbf{r} = -\sum_{i=1}^{n} (\mathbf{v}_i^T \mathbf{r})\, \mathbf{z}_i = -\Bigl(\sum_{i=1}^{n} \mathbf{z}_i \mathbf{v}_i^T\Bigr) \mathbf{r}.$$

Since this identity holds for any vector $\mathbf{r} \in \mathbb{R}^k$, the expression within the parentheses is $-I$. The converse is obtained by straightforward calculations. The lemma is proved.

Corollary. Let $\mathbf{n}_i = \mathbf{u}_i / \|\mathbf{u}_i\|$. Then $A_i \mathbf{n}_i \mathbf{n}_i^T A_i^T = A_i A_i^T$ for each $i$.

This corollary implies our lemma stated in Section 2. We now continue the proof of the theorem.

Lemma 3. The sets of proper matrices make a linear variety, in the following sense. Let $A_1', \dots, A_n'$ and $A_1'', \dots, A_n''$ be two proper sets of matrices; then the set $A_1, \dots, A_n$ defined by $A_i = A_i' + c(A_i'' - A_i')$ is proper for every $c \in \mathbb{R}$.

Proof. According to the previous lemma, $A_i' = \mathbf{z}_i' \mathbf{u}_i^T$ and $A_i'' = \mathbf{z}_i'' \mathbf{u}_i^T$ for some vectors $\mathbf{z}_i', \mathbf{z}_i''$, $1 \le i \le n$. Therefore, $A_i = \mathbf{z}_i \mathbf{u}_i^T$ with $\mathbf{z}_i = \mathbf{z}_i' + c(\mathbf{z}_i'' - \mathbf{z}_i')$. Lastly,

$$\sum_{i=1}^{n} \mathbf{z}_i \mathbf{v}_i^T = \sum_{i=1}^{n} \mathbf{z}_i' \mathbf{v}_i^T + c \sum_{i=1}^{n} \mathbf{z}_i'' \mathbf{v}_i^T - c \sum_{i=1}^{n} \mathbf{z}_i' \mathbf{v}_i^T = -I.$$

The lemma is proved.

Lemma 4. If a set of $n$ matrices $A_1, \dots, A_n$ is proper, then $\sum_{i=1}^{n} A_i X_i^T = -I$, where $I$ is the $k \times k$ identity matrix.


Proof. By Lemma 2, $\sum_{i=1}^{n} A_i X_i^T = \sum_{i=1}^{n} \mathbf{z}_i \mathbf{v}_i^T = -I$. The lemma is proved.

Lemma 5. We have indeed $D \ge B^{-1}$.

Proof. For each $i = 1, \dots, n$ consider the $2k \times m$ matrix

$$Y_i = \begin{pmatrix} A_i \\ X_i \end{pmatrix}.$$

Using the previous lemma gives

$$\sum_{i=1}^{n} Y_i Y_i^T = \begin{pmatrix} D & -I \\ -I & B \end{pmatrix}.$$

By construction, this matrix is positive semidefinite. Hence, the following matrix is also positive semidefinite:

$$\begin{pmatrix} I & B^{-1} \\ 0 & B^{-1} \end{pmatrix} \begin{pmatrix} D & -I \\ -I & B \end{pmatrix} \begin{pmatrix} I & 0 \\ B^{-1} & B^{-1} \end{pmatrix} = \begin{pmatrix} D - B^{-1} & 0 \\ 0 & B^{-1} \end{pmatrix}.$$

By Sylvester's theorem, the matrix $D - B^{-1}$ is positive semidefinite.

Lemma 6. The set of matrices $A_i^\circ = -B^{-1} X_i$ is proper, and for this set we have $D = B^{-1}$.

Proof. Straightforward calculation.

Lemma 7. If $D = B^{-1}$ for some proper set of matrices $A_1, \dots, A_n$, then $A_i = A_i^\circ$ for all $1 \le i \le n$.

Proof. Assume that there is a proper set of matrices $A_1', \dots, A_n'$, different from $A_1^\circ, \dots, A_n^\circ$, for which $D = B^{-1}$. Denote $\delta A_i = A_i' - A_i^\circ$. By Lemma 3, the set of matrices $A_i(\lambda) = A_i^\circ + \lambda(\delta A_i)$ is proper for every real $\lambda$. Consider the variable matrix

$$D(\lambda) = \sum_{i=1}^{n} [A_i(\lambda)][A_i(\lambda)]^T = \sum_{i=1}^{n} A_i^\circ (A_i^\circ)^T + \lambda \Bigl(\sum_{i=1}^{n} A_i^\circ (\delta A_i)^T + \sum_{i=1}^{n} (\delta A_i)(A_i^\circ)^T\Bigr) + \lambda^2 \sum_{i=1}^{n} (\delta A_i)(\delta A_i)^T.$$

Note that the matrix $R = \sum_{i=1}^{n} A_i^\circ (\delta A_i)^T + \sum_{i=1}^{n} (\delta A_i)(A_i^\circ)^T$ is symmetric. By Lemma 5 we have $D(\lambda) \ge B^{-1}$ for all $\lambda$, and by Lemma 6 we have $D(0) = B^{-1}$. It is then easy to derive that $R = 0$. Next, the matrix $S = \sum_{i=1}^{n} (\delta A_i)(\delta A_i)^T$ is symmetric positive semidefinite. Since we assumed that $D(1) = D(0) = B^{-1}$, it is easy to derive that $S = 0$ as well. Therefore, $\delta A_i = 0$ for every $i = 1, \dots, n$. The theorem is proved.


References

Agin, G.J., 1981. Fitting Ellipses and General Second-Order Curves. Carnegie Mellon University, Robotics Institute, Technical Report 81-5.
Ahn, S.J., Rauh, W., Warnecke, H.J., 2001. Least-squares orthogonal distances fitting of circle, sphere, ellipse, hyperbola, and parabola. Pattern Recog. 34, 2283-2303.
Anderson, D.A., 1981. The circular structural model. J. Roy. Statist. Soc. Ser. B 27, 131-141.
Berman, M., 1989. Large sample bias in least squares estimators of a circular arc center and its radius. Comput. Vision Graphics Image Process. 45, 126-128.
Berman, M., Culpin, D., 1986. The statistical behaviour of some least squares estimators of the centre and radius of a circle. J. Roy. Statist. Soc. Ser. B 48, 183-196.
Chan, N.N., 1965. On circular functional relationships. J. Roy. Statist. Soc. Ser. B 27, 45-56.
Chan, Y.T., Thomas, S.M., 1995. Cramer-Rao lower bounds for estimation of a circular arc center and its radius. Graph. Models Image Process. 57, 527-532.
Chernov, N., Lesort, C., 2002. Fitting circles and lines by least squares: theory and experiment. Preprint, available at http://www.math.uab.edu/cl/cl1.
Chernov, N.I., Ososkov, G.A., 1984. Effective algorithms for circle fitting. Comput. Phys. Comm. 33, 329-333.
Chojnacki, W., Brooks, M.J., van den Hengel, A., 2001. Rationalising the renormalisation method of Kanatani. J. Math. Imag. Vision 14, 21-38.
Gander, W., Golub, G.H., Strebel, R., 1994. Least squares fitting of circles and ellipses. BIT 34, 558-578.
van Huffel, S. (Ed.), 1997. Recent Advances in Total Least Squares Techniques and Errors-in-Variables Modeling. SIAM, Philadelphia.
Kanatani, K., 1996. Statistical Optimization for Geometric Computation: Theory and Practice. Elsevier, Amsterdam.
Kanatani, K., 1998. Cramer-Rao lower bounds for curve fitting. Graph. Models Image Process. 60, 93-99.
Landau, U.M., 1987. Estimation of a circular arc center and its radius. Comput. Vision Graphics Image Process. 38, 317-326.
Leedan, Y., Meer, P., 2000. Heteroscedastic regression in computer vision: problems with bilinear constraint. Internat. J. Comput. Vision 37, 127-150.
Pratt, V., 1987. Direct least-squares fitting of algebraic surfaces. Comput. Graph. 21, 145-152.
Spath, H., 1996. Least-squares fitting by circles. Computing 57, 179-185.
Spath, H., 1997. Orthogonal least squares fitting by conic sections. In: van Huffel, S. (Ed.), Recent Advances in Total Least Squares Techniques and Errors-in-Variables Modeling. SIAM, Philadelphia, pp. 259-264.
Taubin, G., 1991. Estimation of planar curves, surfaces and nonplanar space curves defined by implicit equations, with applications to edge and range image segmentation. IEEE Trans. Pattern Anal. Machine Intell. 13, 1115-1138.
Turner, K., 1974. Computer perception of curved objects using a television camera. Ph.D. Thesis, Department of Machine Intelligence, University of Edinburgh.