

http://www.elsevier.com/locate/jco

Journal of Complexity 19 (2003) 474–510

Probabilistic analysis of a differential equation for linear programming

Asa Ben-Hur,a,b Joshua Feinberg,c,d,* Shmuel Fishman,d,e and Hava T. Siegelmannf

a Biochemistry Department, Stanford University, Stanford, CA 94305, USA
b Faculty of Industrial Engineering and Management, Technion, Haifa 32000, Israel
c Physics Department, University of Haifa at Oranim, Tivon 36006, Israel
d Physics Department, Technion, Israel Institute of Technology, Haifa 32000, Israel
e Institute for Theoretical Physics, University of California, Santa Barbara, CA 93106, USA
f Laboratory of Bio-computation, Department of Computer Science, University of Massachusetts at Amherst, Amherst, MA 01003, USA

Received 29 October 2001; revised 1 August 2002; accepted 12 March 2003

Abstract

In this paper we address the complexity of solving linear programming problems with a set of differential equations that converge to a fixed point that represents the optimal solution. Assuming a probabilistic model, where the inputs are i.i.d. Gaussian variables, we compute the distribution of the convergence rate to the attracting fixed point. Using the framework of Random Matrix Theory, we derive a simple expression for this distribution in the asymptotic limit of large problem size. In this limit, we find the surprising result that the distribution of the convergence rate is a scaling function of a single variable. This scaling variable combines the convergence rate with the problem size (i.e., the number of variables and the number of constraints). We also estimate numerically the distribution of the computation time to an approximate solution, which is the time required to reach a vicinity of the attracting fixed point. We find that it is also a scaling function. Using the problem size dependence of the distribution functions, we derive high probability bounds on the convergence rates and on the computation times to the approximate solution.

© 2003 Elsevier Science (USA). All rights reserved.

Keywords: Theory of Analog Computation; Dynamical systems; Linear programming; Scaling; Random Matrix Theory


* Corresponding author. Physics Department, University of Haifa at Oranim, 36006 Tivon, Israel.
E-mail addresses: [email protected] (A. Ben-Hur), [email protected] (J. Feinberg), [email protected] (S. Fishman), [email protected] (H.T. Siegelmann).

0885-064X/03/$ - see front matter © 2003 Elsevier Science (USA). All rights reserved.
doi:10.1016/S0885-064X(03)00032-3


1. Introduction

In recent years scientists have developed new approaches to computation, some of them based on continuous-time analog systems. Analog VLSI devices, which are often described by differential equations, have applications in the fields of signal processing and optimization. Many of these devices are implementations of neural networks [12,19,20] or of so-called neuromorphic systems [21], which are hardware devices whose structure is directly motivated by the workings of the brain. In addition, there is an increasing number of algorithms based on differential equations that solve problems such as sorting [10], linear programming [14], and algebraic problems such as singular value decomposition and finding eigenvectors (see [18] and references therein). On a more theoretical level, differential equations are known to simulate Turing machines [8]. The standard theory of computation and computational complexity [24] deals with computation in discrete time and in a discrete configuration space, and is inadequate for the description of such systems. This work may prove useful in the analysis and comparison of analog computational devices (see e.g. [11,20]).

In a recent paper we proposed a framework of analog computation based on

ODEs that converge exponentially to fixed points [5]. In such systems it is natural to consider the attracting fixed point as the output. The input can be modeled in various ways. One possible choice is the initial condition. This is appropriate when the aim of the computation is to decide to which attractor, out of many possible ones, the system flows (see [28]). The main problem within this approach is related to initial conditions in the vicinity of basin boundaries. The flow in the vicinity of the boundary is slow, resulting in very long computation times. Here, as in [5], the parameters on which the vector field depends are the input, and the initial condition is part of the algorithm. This modeling is natural for optimization problems, where one wishes to find extrema of some function E(x), e.g. by a gradient flow ẋ = grad E(x). An instance of the optimization problem is specified by the parameters of E(x), i.e. by the parameters of the vector field.

The basic entity in our model of analog computation is a set of ODEs

dx/dt = F(x),   (1)

where x is an n-dimensional vector and F is an n-dimensional smooth vector field which converges exponentially to a fixed point. Eq. (1) solves a computational problem as follows: given an instance of the problem, the parameters of the vector field F are set, and the system is started from some pre-determined initial condition. The result of the computation is then deduced from the fixed point that the system approaches.

Even though the computation takes place in a real configuration space, this model

can be considered either as a model with real inputs, as for example the BSS model [7], or as a model with integer or rational inputs, depending on what types of values the initial conditions are given. In [5] it was argued that the time complexity in a large class of ODEs is the physical time, that is, the time parameter of the system. The



initial condition there was assumed to be integer or rational. In the present paper, on the other hand, we consider real inputs. More specifically, we will analyze the complexity of a flow for linear programming (LP) introduced in [14]. In the real number model the complexity of solving LP with interior point methods is unbounded [31], and a similar phenomenon occurs for the flow we analyze here. To obtain finite computation times, one can either measure the computation time in terms of a condition number, as in [25], or impose a distribution over the set of LP instances. Many of the probabilistic models used to study the performance of the simplex algorithm and of interior point methods assume a Gaussian distribution of the data [27,29,30], and we adopt this assumption for our model. Recall that the worst-case bound for the simplex algorithm is exponential, whereas some of the probabilistic bounds are quadratic [27].

Two types of probabilistic analysis were carried out in the LP literature: average case and ''high probability'' behavior [4,33,34]. A high probability analysis provides a bound on the computation time that holds with probability 1 as the problem size goes to infinity [34]. In a worst-case analysis, interior point methods generally require O(√n |log ε|) iterations to compute the cost function with ε-precision, where n is the number of variables [33]. The high probability analysis essentially sets a limit on the required precision and yields O(√n log n) behavior [34]. However, the number of iterations has to be multiplied by the complexity of each iteration, which is O(n³), resulting in an overall complexity O(n^3.5 log n) in the high probability model [33]. The same factor per iteration appears in the average case analysis as well [4].

In contrast, in our model of analog computation, the computation time is the physical time required by a hardware implementation of the vector field F(x) to converge to the attracting fixed point. We need neither to follow the flow step-wise nor to calculate the vector field F(x), since it is assumed to be realized in hardware and does not require repetitive digital approximations. As a result, the complexity of analog processes does not include the O(n³) term above, and in particular it is lower than the digital complexity of interior point methods. In this set-up we conjecture, based on numerical calculations, that the flow analyzed in this paper has complexity O(n log n) on average and with high probability. This is higher than the number of iterations of state-of-the-art interior point methods, but lower than the overall complexity O(n^3.5 log n) of the high probability estimate mentioned above, which includes the complexity of an individual operation.

In this paper we consider a flow for linear programming proposed by Faybusovich [14], for which F(x) is given by (4). Substituting (4) into the general equation (1) we obtain (5), which realizes the Faybusovich algorithm for LP. We consider real inputs drawn from a Gaussian probability distribution. For any feasible instance of the LP problem, the flow converges to the solution. We consider the question: given the probability distribution of LP instances, what is the probability distribution of the convergence rates to the solution? The convergence rate measures the asymptotic computation time: the time to reach an ε-vicinity of the attractor, where ε is arbitrarily small. The main result of this paper, as stated in Theorem 4.1, is that with high probability and on the average, the asymptotic computation time is


Page 4: Probabilisticanalysisofadifferentialequation ...1. Introduction Inrecentyearsscientistshavedevelopednewapproachestocomputation,someof thembasedoncontinuoustimeanalogsystems.AnalogVLSIdevices,thatareoften

O(√n |log ε|), where n is the problem size and ε is the required precision (see also Corollary 5.1).

In practice, the solution to arbitrary precision is not always required, and one may

need to know only whether the flow (1) or (5) has reached the vicinity of the optimal vertex, or which vertex out of a given set of vertices will be approached by the system. Thus, the non-asymptotic behavior of the flow needs to be considered [5]. In this case, only a heuristic estimate of the computation time is presented, and in Section 6 we conjecture that the associated complexity is O(n log n), as mentioned above.

The rest of the paper is organized as follows. In Section 2, the Faybusovich flow is

presented along with an expression for its convergence rate. The probabilistic ensemble of LP instances is presented in Section 3. The distribution of the convergence rate of this flow is calculated analytically in the framework of random matrix theory (RMT) in Section 4. In Section 5, we introduce the concept of ''high-probability behavior'' and use the results of Section 4 to quantify the high-probability behavior of our probabilistic model. In Section 6, we provide measures of complexity for the case when asymptotic precision in ε is not required. Some of the results in Sections 6.2–8 are heuristic, supported by numerical evidence. The structure of the distribution functions of the parameters that control the convergence is described in Section 7, and its numerical verification is presented in Section 8. Finally, the results of this work and their possible implications are discussed in Section 9. Some technical details are relegated to the appendices. Appendix A contains more details of the Faybusovich flow. Appendix B exposes the details of the analytical calculation of the results presented in Section 4, and Appendix C contains the necessary details of random matrix theory relevant for that calculation.

2. A flow for linear programming

We begin with the definition of the linear programming problem (LP) and a vector field for solving it, introduced by Faybusovich in [14]. The standard form of LP is to find

max{c^T x : x ∈ R^n, Ax = b, x ≥ 0},   (2)

where c ∈ R^n, b ∈ R^m, A ∈ R^{m×n} and m ≤ n. The set generated by the constraints in (2) is a polyhedron. If a bounded optimal solution exists, it is obtained at one of its vertices. Let B ⊂ {1,…,n}, |B| = m, and N = {1,…,n}\B, and denote by x_B the coordinates with indices from B, and by A_B the m×m matrix whose columns are the columns of A with indices from B. A vertex of the LP problem is defined by a set of indices B, which is called a basic set, if

x_B = A_B^{-1} b ≥ 0.   (3)

The components of a vertex are the x_B that satisfy (3), together with x_N = 0. The set N is then called a non-basic set. Given a vector field that converges to an optimal solution represented by basic and non-basic sets B and N, its solution x(t) can be



decomposed as (x_N(t), x_B(t)), where x_N(t) converges to 0 and x_B(t) converges to A_B^{-1} b.
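As a small illustration of this decomposition, the sketch below builds a random instance, picks a basic set B, and solves A_B x_B = b for the candidate vertex. This is a minimal NumPy sketch of our own; the variable names and the specific instance are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 6

# A random instance of the kind studied later: i.i.d. Gaussian entries.
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Choose a basic set B of m indices; here the last m columns, as in the text.
basic = list(range(n - m, n))
A_B = A[:, basic]

# The candidate vertex solves A_B x_B = b, with x_N = 0 on the non-basic set.
x_B = np.linalg.solve(A_B, b)

# Assemble the full n-vector and verify it satisfies the equality constraints.
x = np.zeros(n)
x[basic] = x_B
assert np.allclose(A @ x, b)

# The partition defines a feasible vertex of the polyhedron only if x_B >= 0.
is_vertex = bool(np.all(x_B >= 0))
```

By the sign symmetry of the Gaussian ensemble, a candidate vertex lands in the positive orthant only a fraction of the time, which is why the analysis below singles out the optimal partition.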

In the following we consider the non-basic set N = {1,…,n−m}, and for notational convenience denote the m×m matrix A_B by B and denote A_N by N, i.e. A = (N, B). The Faybusovich vector field is a projection of the gradient of the linear cost

function onto the constraint set, relative to a Riemannian metric which enforces the positivity constraints x ≥ 0 [14]. Let h(x) = c^T x. We denote this projection by grad h. The explicit form of the gradient is:

grad h(x) = [X − X A^T (A X A^T)^{-1} A X] c,   (4)

where X is the diagonal matrix Diag(x_1,…,x_n). It is clear from (4) that

A grad h(x) = 0.
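This identity can be checked directly in floating point. The following minimal NumPy sketch (our own illustration, not from the paper) implements (4) and also verifies that the field is an ascent direction for h(x) = c^T x, as befits a gradient flow for a maximization problem.

```python
import numpy as np

def faybusovich_field(x, A, c):
    """Vector field (4): grad h(x) = [X - X A^T (A X A^T)^{-1} A X] c."""
    X = np.diag(x)
    AX = A @ X
    return (X - X @ A.T @ np.linalg.solve(AX @ A.T, AX)) @ c

rng = np.random.default_rng(1)
m, n = 3, 7
A = rng.standard_normal((m, n))
c = rng.standard_normal(n)
x = rng.uniform(0.5, 1.5, n)          # a strictly positive interior point

v = faybusovich_field(x, A, c)

# A grad h(x) = 0: the flow preserves the constraint Ax = b.
assert np.allclose(A @ v, 0.0, atol=1e-8)

# dh/dt = c^T grad h >= 0: h is non-decreasing along trajectories.
assert c @ v >= 0.0
```

The second assertion holds because, for x > 0, c^T grad h(x) equals the squared norm of the projected gradient in the metric X^{-1}, which is non-negative.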

Thus, the dynamics

dx/dt = grad h(x)   (5)

preserves the constraint Ax = b in (2). Consequently, the faces of the polyhedron are invariant sets of the dynamics induced by grad h. Furthermore, it is shown in [14] that the fixed points of grad h coincide with the vertices of the polyhedron, and that the dynamics converges exponentially to the maximal vertex of the LP problem. Since the formal solution of the Faybusovich vector field is the basis of our analysis, we give its derivation in Appendix A.

Solving (5) requires an appropriate initial condition, an interior point in this case.

This can be addressed either by using the ''big-M'' method [26], which has essentially the same convergence rate, or by solving an auxiliary linear programming problem [34]. We stress that here the initial interior point is not an input to the computation, but rather part of the algorithm. In an analog implementation the initial point should be found by the same device used to solve the LP problem.

The linear programming problem (2) has n − m independent variables. The formal

solution shown below describes the time evolution of the n − m variables x_N(t) in terms of the variables x_B(t). When N is the non-basic set of an optimal vertex of the LP problem, x_N(t) converges to 0, and x_B(t) converges to A_B^{-1} b. Denote by e_1,…,e_n the standard basis of R^n, and define the n − m vectors

μ_i = e_i + Σ_{j=1}^m α_{ji} e_{j+n−m},   (6)

where

α_{ji} = −(B^{-1} N)_{ji}   (7)

is an m × (n − m) matrix. The vectors μ_i are perpendicular to the rows of A and are parallel to the faces of the polyhedron defined by the constraints. In this notation the



analytical solution is (see Appendix A):

x_i(t) = x_i(0) exp( −Δ_i t − Σ_{j=1}^m α_{ji} log( x_{j+n−m}(t) / x_{j+n−m}(0) ) ),   i ∈ N = {1,…,n−m},   (8)

where x_i(0) and x_{j+n−m}(0) are components of the initial condition, x_{j+n−m}(t) are the x_B components of the solution, and

Δ_i = −⟨μ_i, c⟩ = −c_i − Σ_{j=1}^m c_{j+n−m} α_{ji}   (9)

(where ⟨·,·⟩ is the Euclidean inner product).

An important property, which relates the signs of the Δ_i to the optimality of the partition of A (into (B, N)) relative to which they were computed, is now stated:

Lemma 2.1 (Faybusovich [14]). For a polyhedron with {n−m+1,…,n} a basic set of a maximum vertex,

Δ_i ≥ 0,   i = 1,…,n−m.

The converse statement does not necessarily hold. The Δ_i are independent of b. Thus it may happen that all the Δ_i are positive, and yet the constraint set is empty.

Remark 2.1. Note that the analytical solution is only a formal one, and does not provide an answer to the LP instance, since the Δ_i depend on the partition of A, and only relative to a partition corresponding to a maximum vertex are all the Δ_i positive.
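The structure of the formal solution can be probed numerically: since μ_i is orthogonal to the rows of A, the combination ⟨μ_i, log x(t)⟩ drifts linearly at rate ⟨μ_i, c⟩ = −Δ_i along the flow, for any partition with invertible B. Below is a rough check of this with explicit Euler steps; it is our own sketch, and the instance, step size and tolerance are arbitrary illustrative choices.

```python
import numpy as np

def faybusovich_field(x, A, c):
    X = np.diag(x)
    AX = A @ X
    return (X - X @ A.T @ np.linalg.solve(AX @ A.T, AX)) @ c

rng = np.random.default_rng(2)
m, n = 3, 7
A = rng.standard_normal((m, n))
c = rng.standard_normal(n)

x0 = rng.uniform(0.5, 1.5, n)
b = A @ x0                      # make x0 a strictly positive feasible point

# Partition A = (N, B) with B the last m columns; alpha_{ji} = -(B^{-1} N)_{ji}.
N, B = A[:, : n - m], A[:, n - m :]
alpha = -np.linalg.solve(B, N)              # m x (n-m)

# Columns are the mu_i of (6); Delta_i = -<mu_i, c>, cf. (9).
mu = np.vstack([np.eye(n - m), alpha])
Delta = -(mu.T @ c)

# Integrate the flow for a short time with small Euler steps.
x, dt, steps = x0.copy(), 1e-4, 500
for _ in range(steps):
    x = x + dt * faybusovich_field(x, A, c)
assert np.all(x > 0) and np.allclose(A @ x, b, atol=1e-6)

# Per (8): <mu_i, log x(t)> - <mu_i, log x(0)> = -Delta_i * t.
t = dt * steps
drift = mu.T @ (np.log(x) - np.log(x0))
assert np.allclose(drift, -Delta * t, atol=5e-3)
```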

The quantities Δ_i are the convergence rates of the Faybusovich flow, and thus measure the time required to reach the ε-vicinity of the optimal vertex, where ε is arbitrarily small:

T_ε ∼ |log ε| / Δ_min,   (10)

where

Δ_min = min_i Δ_i.   (11)

Therefore, if the optimal vertex is required with arbitrary precision ε, then the computation time (or complexity) is O(Δ_min^{-1} |log ε|).
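Concretely, the rates for a given instance and partition follow from (9), and (10) then gives the time scale T_ε. A minimal NumPy sketch of our own (the instance, names and the rejection loop are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 2, 5

# Rejection-sample until the chosen partition has all Delta_i > 0, which by
# the sign symmetry of the ensemble happens with probability 1/2^(n-m).
while True:
    A = rng.standard_normal((m, n))
    c = rng.standard_normal(n)
    N, B = A[:, : n - m], A[:, n - m :]
    c_N, c_B = c[: n - m], c[n - m :]
    # Eq. (9) in matrix form: Delta = -c_N + N^T B^{-T} c_B.
    Delta = -c_N + N.T @ np.linalg.solve(B.T, c_B)
    if np.all(Delta > 0):
        break

# Eq. (10): time to reach an eps-vicinity of the vertex.
Delta_min = Delta.min()
eps = 1e-6
T_eps = abs(np.log(eps)) / Delta_min
```

A small Δ_min translates directly into a long computation time, which is the phenomenon quantified probabilistically in the following sections.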

In summary, if the Δ_i are small, then large computation times will be required. The Δ_i can be arbitrarily small when the inputs are real numbers, resulting in an unbounded computation time. However, we will show that in the probabilistic



model, which we define in the next section, ''bad'' instances are rare, and the flow performs well ''with high probability'' (see Theorem 4.1 and Corollary 5.1).

3. The probabilistic model

We now define the ensemble of LP problems for which we analyze the complexity of the Faybusovich flow. Denote by N(0, σ²) the Gaussian distribution with mean 0 and variance σ². Consider an ensemble in which the components of (A, b, c) are i.i.d. (independent, identically distributed) random variables with the distribution N(0, σ²). The model will consist of the following set of problems:

LPM = {(A, b, c) | (A, b, c) are i.i.d. variables with the distribution N(0, σ²) and the LP problem has a bounded optimal solution}.   (12)

Therefore, we use matrices with a distribution N(0, σ²):

f(A) = (1/Z_A) exp( −(1/(2σ²)) tr A^T A )   (13)

with normalization

Z_A = ∫ d^{mn}A exp( −(1/(2σ²)) tr A^T A ) = (2πσ²)^{mn/2}.   (14)

Ensemble (13) factorizes into mn i.i.d. Gaussian random variables, one for each of the components of A. The distributions of the vectors c and b are defined by

f(c) = (1/Z_c) exp( −(1/(2σ²)) c^T c )   (15)

with normalization

Z_c = ∫ d^n c exp( −(1/(2σ²)) c^T c ) = (2πσ²)^{n/2},   (16)

and

f(b) = (1/Z_b) exp( −(1/(2σ²)) b^T b )   (17)

with normalization

Z_b = ∫ d^m b exp( −(1/(2σ²)) b^T b ) = (2πσ²)^{m/2}.   (18)

With the introduction of a probabilistic model of LP instances, Δ_min becomes a random variable. We wish to compute the probability distribution of Δ_min for instances with a bounded solution, for which Δ_min > 0. We reduce this problem to the



simpler task of computing P(Δ_min > Δ | Δ_min > 0), in which the condition Δ_min > 0 is much easier to impose than the condition that an instance produces an LP problem with a bounded solution. This reduction is justified by the following lemma:

Lemma 3.1.

P(Δ_min > Δ | LP instance has a bounded maximum vertex) = P(Δ_min > Δ | Δ_min > 0).   (19)

Proof. Let (A, b, c) be an LP instance chosen according to the probability distributions (13), (15) and (17). There is a unique orthant (out of the 2^n orthants) where the constraint set Ax = b defines a nonempty polyhedron. This orthant is not necessarily the positive orthant, as in the standard formulation of LP.

Let us now consider any vertex of this polyhedron, with basic and non-basic sets B and N. Its m non-vanishing coordinates x_B are given by solving A_B x_B = b. The matrix A_B is full rank with probability 1; also, the components of x_B are non-zero and finite with probability 1. Therefore, in the probabilistic analysis we can assume that x_B is well defined and non-zero. With this vertex we associate the n − m quantities Δ_i = −(c_N)_i + (c_B^T A_B^{-1} A_N)_i, from (9).

We now show that there is a set of 2^m equiprobable instances, containing the instance (A, b, c), that shares the same vector b and the same values of {Δ_i}, when computed according to the given partition. This set contains a unique instance with x_B in the positive orthant. Thus, if Δ_min > 0, the latter instance will be the unique member of the set which has a bounded optimal solution.

To this end, consider the set R(x_B) of the 2^m reflections Q_l x_B of x_B, where Q_l is an m×m diagonal matrix with diagonal entries ±1 and l = 1, 2,…,2^m. Given the instance (A, b, c) and a particular partition into basic and non-basic sets, we split A columnwise into (A_B, A_N) and c into (c_B, c_N). Let S be the set of 2^m instances ((A_B Q_l, A_N), b, (Q_l c_B, c_N)), where l = 1,…,2^m. The vertices Q_l x_B of these instances, which correspond to the prescribed partition, comprise the set R(x_B), since (A_B Q_l)(Q_l x_B) = b. Furthermore, all elements in R(x_B) (each of which corresponds to a different instance) have the same set of Δ's, since Δ_i = −(c_N)_i + [(Q_l c_B)^T (A_B Q_l)^{-1} A_N]_i. Because of the symmetry of the ensemble under the reflections Q_l, the probability of all instances in S is the same.

All the vertices belonging to R(x_B) have the same Δ_i's with the same probability, and exactly one is in the positive orthant. Thus, if Δ_min > 0, the latter vertex is the unique element from S which is the optimal vertex of an LP problem with a bounded solution. Consequently, the probability of having any prescribed set of Δ_i's, and in particular the probability distribution of the Δ_i's given Δ_min > 0, is not affected by the event that the LP instance has a bounded optimal solution (i.e., that the vertex is in the positive orthant). In other words, these are independent events. Integrating over all instances, thereby taking into account all possible sets S, while imposing the requirement {Δ_min > Δ | Δ_min > 0}, results in (19). □
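The key invariance used in the proof, namely that every reflected instance ((A_B Q_l, A_N), b, (Q_l c_B, c_N)) has the same Δ's, is easy to confirm numerically, since Q_l^{-1} = Q_l cancels inside the product. A minimal NumPy sketch (our illustration; instance and names are arbitrary):

```python
import numpy as np

def deltas(A_B, A_N, c_B, c_N):
    # Eq. (9) relative to the given partition: Delta = -c_N + A_N^T A_B^{-T} c_B.
    return -c_N + A_N.T @ np.linalg.solve(A_B.T, c_B)

rng = np.random.default_rng(4)
m, n = 3, 7
A = rng.standard_normal((m, n))
c = rng.standard_normal(n)
A_N, A_B = A[:, : n - m], A[:, n - m :]
c_N, c_B = c[: n - m], c[n - m :]

base = deltas(A_B, A_N, c_B, c_N)

# All 2^m reflected instances share the same Delta's: Q_l is its own inverse,
# so it cancels between (Q_l c_B)^T and (A_B Q_l)^{-1}.
for bits in range(2 ** m):
    q = np.array([1.0 if (bits >> k) & 1 else -1.0 for k in range(m)])
    Q = np.diag(q)
    assert np.allclose(deltas(A_B @ Q, A_N, Q @ c_B, c_N), base)
```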



The event Δ_min > 0 corresponds to a specific partition of A into basic and non-basic sets B and N, respectively. It turns out that it is much easier to analytically calculate the probability distribution of Δ_min for a given partition of the matrix A. It will be shown in what follows that, in the probabilistic model we defined, P(Δ_min > Δ | Δ_min > 0) is proportional to the probability that Δ_min > Δ for a fixed partition. Let W_j be the event that a partition j of the matrix A is an optimal partition, i.e., all Δ_i are positive (j is an index with range 1,…,(n choose m)). Let the index 1 stand for the partition where B is taken from the last m columns of A. We now show:

Lemma 3.2. Let Δ > 0. Then

P(Δ_min > Δ | Δ_min > 0) = P(Δ_min > Δ | W_1).

Proof. Given that Δ_min > 0, there is a unique optimal partition, since a non-unique optimal partition occurs only if c is orthogonal to some face of the polyhedron, in which case Δ_i = 0 for some i. Thus we can write:

P(Δ_min > Δ | Δ_min > 0) = Σ_j P(Δ_min > Δ | Δ_min > 0, W_j) P(W_j)   (20)
                         = Σ_j P(Δ_min > Δ | W_j) P(W_j),   (21)

where the second equality holds since the event W_j is contained in the event that Δ_min > 0. The probability distribution of (A, c) is invariant under permutations of columns of A and c, and under permutations of rows of A. Therefore the probabilities P(W_j) are all equal, and so are the P(Δ_min > Δ | Δ_min > 0, W_j), and the result follows. □

We define

Δ_min1 = min{Δ_i | Δ_i are computed relative to the partition 1}.   (22)

Note that the definition of Δ_min in Eq. (11) is relative to the optimal partition. To show that all computations can be carried out for a fixed partition of A we need the next lemma:

Lemma 3.3. Let Δ > 0. Then

P(Δ_min > Δ | Δ_min > 0) = P(Δ_min1 > Δ) / P(Δ_min1 > 0).

Proof. The result follows from

P(Δ_min > Δ | W_1) = P(Δ_min1 > Δ | Δ_min1 > 0),   (23)

combined with the result of the previous lemma and the definition of conditional probability. □



In view of the symmetry of the joint probability distribution (j.p.d.) of Δ_1,…,Δ_{n−m}, given by (28) and (32), the normalization constant P(Δ_min1 > 0) satisfies:

P(Δ_min1 > 0) = 1/2^{n−m}.   (24)
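This normalization can be checked by direct Monte Carlo sampling of the ensemble: for each draw, compute the Δ's of partition 1 and count how often they are all positive. A sketch of ours (sample size, seed and tolerance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, trials = 2, 5, 40000          # n - m = 3, so the prediction is 1/8

hits = 0
for _ in range(trials):
    A = rng.standard_normal((m, n))
    c = rng.standard_normal(n)
    N, B = A[:, : n - m], A[:, n - m :]
    # Delta relative to partition 1, in the form Delta = -c_N + N^T B^{-T} c_B.
    Delta = -c[: n - m] + N.T @ np.linalg.solve(B.T, c[n - m :])
    hits += bool(np.all(Delta > 0))

p_hat = hits / trials
assert abs(p_hat - 0.5 ** (n - m)) < 0.01   # 1/2^{n-m} = 0.125
```

The result also follows directly: conditioned on (B, z), the Δ_p below are i.i.d. symmetric Gaussians, so each is positive with probability 1/2 independently.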

Remark 3.1. Note that we assume throughout this work that the optimal vertex is unique, i.e., given a partition (N, B) of A that corresponds to an optimal vertex, the basic components are all non-zero. The reason is that if one of the components of the optimal vertex vanishes, all of its permutations with the n − m components of the non-basic set result in the same value of c^T x. Vanishing of one of the components of the optimal vertex requires that b be a linear combination of m − 1 columns of A, which is an event of zero measure in our probabilistic ensemble. Therefore, this case will not be considered in the present work.

4. Computing the distributions of Δ_min1 and of Δ_min

In the following we first compute the distribution of Δ_min1 and use it to obtain the distribution of Δ_min via Lemma 3.3. We denote the first n − m components of c by y, and its last m components by z. In this notation, equation (9) for the Δ_i takes the form:

Δ_p = −y_p + (z^T B^{-1} N)_p,   p = 1,…,n−m.   (25)

Our notation will be such that indices i, j, k,… range over 1, 2,…,m, and p, q,… range over 1, 2,…,n−m.

In this notation, ensembles (13) and (15) may be written as

f(A) = f(N, B) = (1/Z_A) exp[ −(1/(2σ²)) ( Σ_{ij} B_{ij}² + Σ_{ip} N_{ip}² ) ],

f(c) = f(y, z) = (1/Z_c) exp[ −(1/(2σ²)) ( Σ_i z_i² + Σ_p y_p² ) ].   (26)

We first compute the joint probability distribution (j.p.d.) of Δ_1,…,Δ_{n−m} relative to the partition 1. This is denoted by f_1(Δ_1,…,Δ_{n−m}). Using (25), we write

f_1(Δ_1,…,Δ_{n−m}) = ∫ d^{m²}B d^{m(n−m)}N d^m z d^{n−m}y f(N, B) f(y, z) Π_{q=1}^{n−m} δ( Δ_q + y_q − Σ_{i,j=1}^m z_j (B^{-1})_{ji} N_{iq} ),   (27)



where δ(x) is the Dirac delta function. We note that this j.p.d. is not only completely symmetric under permutations of the Δ_p's, but is also independent of the partition relative to which it is computed.

We would now like to perform the integrals in (27) and obtain a more explicit expression for f_1(Δ_1,…,Δ_{n−m}). It turns out that direct integration over the y_q's, using the δ functions, is not the most efficient way to proceed. Instead, we represent each of the δ functions as a Fourier integral. Thus,

f_1(Δ_1,…,Δ_{n−m}) = ∫ d^{m²}B d^{m(n−m)}N d^m z d^{n−m}y ( d^{n−m}λ / (2π)^{n−m} ) f(N, B) f(y, z)
    × exp[ i Σ_q λ_q ( Δ_q + y_q − Σ_{i,j=1}^m z_j (B^{-1})_{ji} N_{iq} ) ].

Integration over N_{ip}, λ_q and y_p is straightforward and yields

f_1(Δ_1,…,Δ_{n−m}) = (1/(2πσ²))^{(m²+n)/2} ∫ d^{m²}B d^m z [ z^T(B^T B)^{-1}z + 1 ]^{−(n−m)/2}
    × exp[ −(1/(2σ²)) ( Σ_{ij} B_{ij}² + Σ_i z_i² + (Σ_p Δ_p²)/(z^T(B^T B)^{-1}z + 1) ) ].   (28)

Here the complete symmetry of f_1(Δ_1,…,Δ_{n−m}) under permutations of the Δ_p's is explicit, since it is a function of Σ_p Δ_p².

The integrand in (28) contains the combination

u(B, z) = 1 / ( z^T(B^T B)^{-1}z + 1 ).   (29)

Obviously, 0 ≤ u(B, z) ≤ 1. It will turn out to be very useful to consider the distribution function P(u) of the random variable u = u(B, z), namely,

P(u) = (1/(2πσ²))^{(m²+m)/2} ∫ d^{m²}B d^m z e^{−(1/(2σ²))(tr B^T B + z^T z)} · δ( u − 1/(z^T(B^T B)^{-1}z + 1) ).   (30)

Note from (29) that u(λB, λz) = u(B, z). Thus, in fact, P(u) is independent of the (common) variance σ of the Gaussian variables B and z, and we might as well rewrite (30) as

P(u) = (λ/π)^{(m²+m)/2} ∫ d^{m²}B d^m z e^{−λ(tr B^T B + z^T z)} · δ( u − 1/(z^T(B^T B)^{-1}z + 1) ),   (31)

with λ > 0 an arbitrary parameter.
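The scale invariance u(λB, λz) = u(B, z), which underlies the σ-independence of P(u), can be verified directly: the factor λ² in z^T z cancels against λ^{-2} in (B^T B)^{-1}. A minimal NumPy sketch (illustrative, our own):

```python
import numpy as np

def u_of(B, z):
    # Eq. (29): u(B, z) = 1 / (z^T (B^T B)^{-1} z + 1).
    w = np.linalg.solve(B.T @ B, z)
    return 1.0 / (z @ w + 1.0)

rng = np.random.default_rng(6)
m = 4
B = rng.standard_normal((m, m))
z = rng.standard_normal(m)

u = u_of(B, z)
assert 0.0 < u < 1.0                       # cf. the bound 0 <= u <= 1

# u(lam*B, lam*z) = u(B, z) for any lam != 0, hence P(u) cannot depend on
# the common variance sigma of B and z.
for lam in (0.1, 3.0, 25.0):
    assert np.isclose(u_of(lam * B, lam * z), u)
```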



Thus, if we could calculate P(u) explicitly, we would be able to express the j.p.d. f_1(Δ_1,…,Δ_{n−m}) in (28) in terms of the one-dimensional integral

f_1(Δ_1,…,Δ_{n−m}) = (1/(2πσ²))^{(n−m)/2} ∫_0^∞ du P(u) u^{(n−m)/2} exp[ −(u/(2σ²)) Σ_{p=1}^{n−m} Δ_p² ],   (32)

as can be seen by comparing (28) and (30).

In this paper we are interested mainly in the minimal Δ. Thus, we need f_min1(Δ),

the probability density of Dmin 1: Due to the symmetry of f1ðD1;y;DnmÞ; which isexplicit in (32), we can express fmin 1ðDÞ simply as

fmin 1ðDÞ ¼ ðn mÞZ

N

DdD2 ydDnm f1ðD;D2;y;DnmÞ: ð33Þ

It will be more convenient to consider the complementary cumulative distribution (c.c.d.)

$$
Q(\Delta) = P(\Delta_{\min,1} > \Delta) = \int_\Delta^\infty f_1^{\min}(u)\, du, \qquad (34)
$$

in terms of which

$$
f_1^{\min}(\Delta) = -\frac{\partial}{\partial\Delta} Q(\Delta). \qquad (35)
$$

The c.c.d. $Q(\Delta)$ may be expressed as a symmetric integral

$$
Q(\Delta) = \int_\Delta^\infty d\Delta_1 \cdots d\Delta_{n-m}\; f_1(\Delta_1, \Delta_2, \ldots, \Delta_{n-m}) \qquad (36)
$$

over the $\Delta$'s, and thus it is computationally a more convenient object to consider than $f_1^{\min}(\Delta)$. From (36) and (32) we obtain that

$$
Q(\Delta) = \left(\frac{1}{2\pi\sigma^2}\right)^{\frac{n-m}{2}} \int_0^\infty du\, P(u) \left[ \sqrt{u} \int_\Delta^\infty dv\; e^{-\frac{u v^2}{2\sigma^2}} \right]^{n-m}, \qquad (37)
$$

and from (37) one readily finds that

$$
Q(0) = \frac{1}{2^{n-m}} \qquad (38)
$$

(as well as $Q(-\infty) = 1$, by definition of $Q$). Then, use of the integral representation

$$
1 - \operatorname{erf}(x) = \operatorname{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_x^\infty dv\, e^{-v^2} \quad (x > 0), \qquad (39)
$$


and (38) leads (for $\Delta > 0$) to

$$
Q(\Delta) = Q(0) \int_0^\infty du\, P(u) \left[ \operatorname{erfc}\!\left( \Delta \sqrt{\frac{u}{2\sigma^2}} \right) \right]^{n-m}. \qquad (40)
$$

This expression is an exact integral representation of $Q(\Delta)$ (in terms of the yet undetermined probability distribution $P(u)$).

In order to proceed, we have to determine $P(u)$. Determining $P(u)$ for any pair of integers $(n,m)$ in (31) in closed form is a difficult task. However, since we are interested mainly in the asymptotic behavior of computation times, we will content ourselves with analyzing the behavior of $P(u)$ as $n, m \to \infty$, with

$$
r \equiv m/n < 1 \qquad (41)
$$

held fixed. We were able to determine the large $n,m$ behavior of $P(u)$ (and thus of $f_1(\Delta_1, \Delta_2, \ldots, \Delta_{n-m})$ and $Q(\Delta)$) using standard methods of random matrix theory [9,22] (for papers that treat random real rectangular matrices, such as the matrices relevant for this work, see [3]; for earlier work see G.M. Cicuta et al.). This calculation is presented in detail in Appendix B. We show there (see

Eq. (B.26)) that the leading asymptotic behavior of $P(u)$ is

$$
P(u) = \sqrt{\frac{m}{2\pi u}}\; e^{-\frac{m u}{2}}, \qquad (42)
$$

namely, $\sqrt{u}$ is simply a Gaussian variable, with width proportional to $1/\sqrt{m}$.
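The asymptotic density (42) can be checked by elementary numerical quadrature. In the following stdlib-only sketch, the substitution $u = y^2$ removes the integrable singularity at $u = 0$, after which a plain trapezoidal rule verifies that (42) integrates to 1 and that $E[u] = 1/m$, consistent with $\sqrt{u}$ being Gaussian of standard deviation $1/\sqrt{m}$. The value $m = 50$ and the grid parameters are illustrative.

```python
# Numerical check of P(u) = sqrt(m/(2*pi*u)) * exp(-m*u/2), Eq. (42):
# with u = y^2, du = 2y dy, the integrand becomes the smooth function
# 2*sqrt(m/(2*pi)) * exp(-m*y^2/2), suitable for the trapezoidal rule.
import math

def moments(m, ymax=10.0, steps=50000):
    h = ymax / steps
    norm = mean = 0.0
    for k in range(steps + 1):
        y = k * h
        wgt = 0.5 if k in (0, steps) else 1.0
        f = 2.0 * math.sqrt(m / (2.0 * math.pi)) * math.exp(-0.5 * m * y * y)
        norm += wgt * f * h          # total probability
        mean += wgt * f * y * y * h  # E[u] = E[y^2]
    return norm, mean

norm, mean = moments(50)
print(norm, mean * 50)   # both close to 1: P integrates to 1 and E[u] = 1/m
```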

Note that (42) is independent of the width $\sigma$, which is consistent with the remark preceding (31). Substituting (42) in (32), we obtain, with the help of the integral representation

$$
\Gamma(z) = \int_0^\infty t^{z-1} e^{-t}\, dt \qquad (43)
$$

of the $\Gamma$ function, the large $n,m$ behavior of the j.p.d. $f_1(\Delta_1,\ldots,\Delta_{n-m})$ as

$$
f_1(\Delta_1,\ldots,\Delta_{n-m}) = \sqrt{m}\,\sigma\, \Gamma\!\left(\frac{n-m+1}{2}\right) \left[ \frac{1}{\pi \left( m\sigma^2 + \sum_p \Delta_p^2 \right)} \right]^{\frac{n-m+1}{2}}. \qquad (44)
$$

Thus, the $\Delta$'s follow asymptotically a multi-dimensional Cauchy distribution. It can be checked that (44) is properly normalized to 1.

Similarly, by substituting (42) in (40), and changing the variable to $y = \sqrt{mu/2}$, we obtain the large $n,m$ behavior of $Q(\Delta)$ as

$$
Q(\Delta) = \frac{2 Q(0)}{\sqrt{\pi}} \int_0^\infty dy\; e^{-y^2} \left[ \operatorname{erfc}\!\left( \frac{\Delta y}{\sqrt{m}\,\sigma} \right) \right]^{n-m}. \qquad (45)
$$

As a consistency check of our large $n,m$ asymptotic expressions, we have verified, with the help of (43), that substituting (44) into (36) leads to (40), with $P(u)$ there given by the asymptotic expression (42).

We are interested in the scaling behavior of $Q(\Delta)$ in (45) in the limit $n, m \to \infty$. In this large $n,m$ limit, the factor $\left[\operatorname{erfc}\!\left(\frac{\Delta y}{\sqrt{m}\,\sigma}\right)\right]^{n-m}$ in (45) decays rapidly to zero. Thus,


the integral in (45) will be appreciably different from zero only in a small region around $\Delta = 0$, where the erfc function is very close to 1. More precisely, using $\operatorname{erfc} x = 1 - \frac{2x}{\sqrt{\pi}} + O(x^2)$, we may expand the erfc term in (45) as

$$
\left[ \operatorname{erfc}\!\left( \frac{\Delta y}{\sqrt{m}\,\sigma} \right) \right]^{n-m} = \left[ 1 - \frac{2 y \Delta}{\sqrt{\pi m \sigma^2}} + \cdots \right]^{n-m} \qquad (46)
$$

(due to the Gaussian damping factor in (45), this expansion is uniform in $y$). Thus, we see that $Q(\Delta)/Q(0)$ will be appreciably different from zero only for values of $\Delta/\sigma$ of order up to $1/\sqrt{m}$, for which (46) exponentiates into a quantity of $O(1)$, and thus

$$
Q(\Delta) \simeq \frac{2 Q(0)}{\sqrt{\pi}} \int_0^\infty dy\; e^{-y^2} \exp\!\left[ -\frac{2}{\sqrt{\pi}} \left( \frac{n}{m} - 1 \right) y\, \delta \right], \qquad (47)
$$

where

$$
\delta = \frac{\sqrt{m}\, \Delta}{\sigma} \qquad (48)
$$

is $O(m^0)$. Note that $m/n$ is kept finite and fixed. The integral in (47) can be done, and thus we arrive at the explicit scaling behavior of the c.c.d.

$$
Q(\Delta) = Q(0)\, e^{x_\Delta^2}\, \operatorname{erfc}(x_\Delta), \qquad (49)
$$

where

$$
x_\Delta = Z_\Delta(n,m)\, \Delta, \qquad (50)
$$

with

$$
Z_\Delta(n,m) = \frac{1}{\sqrt{\pi}} \left( \frac{n}{m} - 1 \right) \frac{\sqrt{m}}{\sigma}. \qquad (51)
$$

The c.c.d. $Q(\Delta)$ depends, in principle, on all three variables $n$, $m$ and $\Delta$. The result (49) demonstrates that in the limit $(n,m) \to \infty$ (with $r = m/n$ held finite and fixed), $Q(\Delta)$ is a function of only one scaling variable: the $x_\Delta$ defined in (50). We have compared (49) and (50) against results of numerical simulations, for various values of $n/m$. The results are shown in Figs. 2 and 3 in Section 8.

Establishing the explicit scaling expression of the probability distribution of the convergence rate constitutes the main result of our paper, which we summarize in the following theorem:

Theorem 4.1. Assume that LP problems of the form (2), with the instances distributed according to (13)–(18), are solved by the Faybusovich algorithm (5). Then, in the asymptotic limit $n \to \infty$, $m \to \infty$ with $0 < r = m/n < 1$ kept fixed, the convergence rate $\Delta_{\min}$ defined by (11) is distributed according to

$$
P(\Delta_{\min} > \Delta \mid \text{bounded optimal solution}) = e^{x_\Delta^2}\, \operatorname{erfc}(x_\Delta), \qquad (52)
$$

where $x_\Delta$ is given by (50).


Proof. $Q(\Delta) = P(\Delta_{\min,1} > \Delta)$ by (34). Therefore, use of (24) and (38), namely,

$$
P(\Delta_{\min,1} > 0) = Q(0) = 1/2^{n-m},
$$

and of (49) implies

$$
P(\Delta_{\min,1} > \Delta) = \frac{1}{2^{n-m}}\, e^{x_\Delta^2}\, \operatorname{erfc}(x_\Delta); \qquad (53)
$$

but according to Lemma 3.3,

$$
P(\Delta_{\min} > \Delta \mid \Delta_{\min} > 0) = \frac{P(\Delta_{\min,1} > \Delta)}{P(\Delta_{\min,1} > 0)}.
$$

Finally, substituting (53) and (24) in the last equation, and use of Lemma 3.1, leads to the statement of the theorem. □
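The scaling combination $e^{x^2}\operatorname{erfc}(x)$ appearing in Theorem 4.1 involves only standard functions and can be examined directly; the following stdlib-only sketch evaluates it at a few illustrative points (our choices, not the paper's).

```python
# The scaling c.c.d. of Eq. (49)/(52): Q(x)/Q(0) = exp(x^2) * erfc(x).
import math

def ccd(x):
    """exp(x^2) * erfc(x); the naive product is safe for moderate x
    (it overflows once x^2 exceeds ~700)."""
    return math.exp(x * x) * math.erfc(x)

xs = [0.0, 0.5, 1.0, 2.0, 5.0, 10.0]
vals = [ccd(x) for x in xs]
print(vals)
# ccd(0) = 1, ccd decreases monotonically, and for large x it behaves
# as 1/(sqrt(pi)*x) -- the slow tail responsible for x_Delta (and hence
# 1/Delta_min) having no mean.
```

For very large arguments a scaled implementation (e.g., `scipy.special.erfcx`, which computes $e^{x^2}\operatorname{erfc}(x)$ directly) avoids the overflow of the naive product.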

From (49) and (50), we can obtain the probability density $f_1^{\min}(\Delta)$ of $\Delta_{\min,1}$, using (35). In particular, we find

$$
f_1^{\min}(0) = \frac{2\sqrt{m}}{\pi\sigma} \left( \frac{n}{m} - 1 \right) Q(0), \qquad (54)
$$

which coincides with the expression one obtains for $f_1^{\min}(0)$ by directly substituting the large $(n,m)$ expression (45) into (35), without first going to the scaling regime $\Delta \sim 1/\sqrt{m}$, where (49) holds.

5. High-probability behavior

In this paper we show that the Faybusovich vector field performs well with high probability, a term that is explained in what follows. Such an analysis was carried out for interior point methods, e.g., in [23,34]. When the inputs of an algorithm have a probability distribution, $\Delta_{\min}$ becomes a random variable. High-probability behavior is defined as follows:

Definition 5.1. Let $T_n$ be a random variable associated with problems of size $n$. We say that $T(n)$ is a high-probability bound on $T_n$ if, as $n \to \infty$, $T_n \le T(n)$ with probability one.

To show that $1/\Delta_{\min} < Z(m)$ with high probability is the same as showing that $\Delta_{\min} > 1/Z(m)$ with high probability. Let $f^{(m)}_{\min}(\Delta \mid \Delta_{\min} > 0)$ denote the probability density of $\Delta_{\min}$ given $\Delta_{\min} > 0$. The $m$ superscript is a mnemonic for its dependence on the problem size. We make the following observation:

Lemma 5.1. Let $P(\Delta_{\min} > x \mid \Delta_{\min} > 0)$ be analytic in $x$ around $x = 0$. Then $\Delta_{\min} > \left[ f^{(m)}_{\min}(0 \mid \Delta_{\min} > 0)\, g(m) \right]^{-1}$ with high probability, where $g(m)$ is any function such that $\lim_{m\to\infty} g(m) = \infty$.


Proof. For very small $x$ we have

$$
P(\Delta_{\min} > x \mid \Delta_{\min} > 0) \approx 1 - f^{(m)}_{\min}(0 \mid \Delta_{\min} > 0)\, x. \qquad (55)
$$

We look for $x = x(m)$ such that $P(\Delta_{\min} > x(m) \mid \Delta_{\min} > 0) = 1$ with high probability. For this it is sufficient that

$$
\lim_{m\to\infty} f^{(m)}_{\min}(0 \mid \Delta_{\min} > 0)\, x(m) = 0. \qquad (56)
$$

This holds if

$$
x(m) = \left[ f^{(m)}_{\min}(0 \mid \Delta_{\min} > 0)\, g(m) \right]^{-1}, \qquad (57)
$$

where $g(m)$ is any function such that $\lim_{m\to\infty} g(m) = \infty$. □
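The mechanism of Lemma 5.1 can be illustrated with the scaling law of Theorem 4.1: with the cutoff $\Delta(m) = [f_{\min}(0)\, g(m)]^{-1}$ and any $g(m) \to \infty$ (here $g(m) = \log m$), the survival probability $P(\Delta_{\min} > \Delta(m))$ tends to 1 as $m$ grows. In the sketch below, $\sigma = 1$ and $r = m/n = 1/2$ (so $Z_\Delta = \sqrt{m/\pi}$) are our illustrative parameter choices.

```python
# Illustration of Lemma 5.1 using the scaling distribution of Theorem 4.1.
import math

def survival(x):
    # P(Delta_min > Delta | Delta_min > 0) = exp(x^2) * erfc(x), x = Z_Delta * Delta
    return math.exp(x * x) * math.erfc(x)

def prob_at_cutoff(m):
    z = math.sqrt(m / math.pi)          # Z_Delta for n = 2m, sigma = 1
    f0 = 2.0 * z / math.sqrt(math.pi)   # density of Delta at 0: slope 2/sqrt(pi) in x
    cutoff = 1.0 / (f0 * math.log(m))   # Lemma 5.1 with g(m) = log m
    return survival(z * cutoff)

for m in (10, 100, 10000):
    print(m, prob_at_cutoff(m))   # approaches 1 as m grows
```

Note that $z \cdot \text{cutoff} = \sqrt{\pi}/(2 \log m)$ is independent of the prefactor of $Z_\Delta$, so the convergence to 1 is driven entirely by the divergence of $g(m)$.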

The growth of $g(m)$ can be arbitrarily slow, so from this point on we will ignore this factor. As a corollary to Theorem 4.1 and (54) we now obtain:

Corollary 5.1. Let $(A, b, c)$ be linear programming instances distributed according to (12). Then

$$
\frac{1}{\Delta_{\min}} = O(m^{1/2}) \quad \text{and} \quad T_\varepsilon = O(m^{1/2}) \qquad (58)
$$

with high probability.

Proof. According to the results of Section 4 (and, more explicitly, from the derivation of (86) in Section 7), $f^{(m)}_{\min}(0 \mid \Delta_{\min} > 0) \sim m^{1/2}$, and the result follows from Lemma 5.1 and the definition of $T_\varepsilon$ (Eq. (10)). □

Remark 5.1. Note that bounds obtained in this manner are tight, since they are based on the actual distribution of the data.

Remark 5.2. Note that $f^{(m)}_{\min}(0 \mid \Delta_{\min} > 0) \ne 0$. Therefore, the $\langle 1/\Delta \rangle$ moment of the probability density function $f^{(m)}_{\min}(\Delta \mid \Delta_{\min} > 0)$ does not exist.

6. Measures of complexity in the non-asymptotic regime

In some situations one wants to identify the optimal vertex with limited precision. The term

$$
b_i(t) = -\sum_{j=1}^{m} a_{ji} \log \frac{x_{j+n-m}(t)}{x_{j+n-m}(0)} \qquad (59)
$$

in (8), when it is positive, is a kind of "barrier": $\Delta_i t$ in Eq. (8) must be larger than the barrier before $x_i$ can decrease to zero.


In this section we discuss heuristically the behavior of the barrier $b_i(t)$ as the dynamical system flows to the optimal vertex. To this end, we first discuss in the following subsection some relevant probabilistic properties of the vertices of the polyhedra in our ensemble.

6.1. The typical magnitude of the coordinates of vertices

Flow (5) conserves the constraint $Ax = b$ in (2). Let us split these equations according to the basic and non-basic sets corresponding to an arbitrary vertex as

$$
A_B x_B + A_N x_N = b. \qquad (60)
$$

Precisely at the vertex in question $x_N = 0$, of course. However, we may be interested in the vicinity of that vertex, and thus leave $x_N$ arbitrary at this point. We may consider (60) as a system of equations in the unknowns $x_B$ with parameters $x_N$, with coefficients $A_B$, $A_N$ and $b$ drawn from the equivariant Gaussian ensembles (13), (14), (17) and (18). Thus, the components of $x_B$ (e.g., the $x_{j+n-m}(t)$'s in (59) if we are considering the optimal vertex) are random variables. The joint probability density for the $m$ random variables $x_B$ is given by Theorem 4.2 of [15] (applied to the particular Gaussian ensembles (13), (14), (17) and (18)) as

$$
P(x_B; x_N) = \frac{\Gamma\!\left(\frac{m+1}{2}\right)}{\pi^{\frac{m+1}{2}}}\; \frac{\lambda}{\left( \lambda^2 + x_B^T x_B \right)^{\frac{m+1}{2}}}, \qquad (61)
$$

where

$$
\lambda = \sqrt{1 + x_N^T x_N}. \qquad (62)
$$

(Strictly speaking, we should constrain $x_B$ to lie in the positive orthant, and thus multiply (61) by a factor $2^m$ to keep it normalized. However, since these details do not affect our discussion below, we avoid introducing them.)

It follows from (61) that the components of $x_B$ are identically distributed, with the probability density of any one component $x_{Bj} = z$ given by

$$
p(z; x_N) = \frac{1}{\pi}\, \frac{\lambda}{\lambda^2 + z^2}, \qquad (63)
$$

in accordance with a general theorem due to Girko [16].

The main object of the discussion in this subsection is to estimate the typical magnitude of the $m$ components of $x_B$. One could argue that typically all $m$ components satisfy $|x_{Bj}| < \lambda$, since the Cauchy distribution (63) has width $\lambda$. However, from (63) we have $\mathrm{Prob}(|z| > \lambda) = 1/2$; namely, $|x_{Bj}| < \lambda$ and $|x_{Bj}| > \lambda$ occur with equal probability. Thus, one has to be more careful, and the answer lies in the probability density function of $R = \sqrt{x_B^T x_B}$.


From (61), we find that the probability density function of $R = \sqrt{x_B^T x_B}$ takes the form

$$
P(|x_B| = R) = \frac{2}{\sqrt{\pi}}\, \frac{\Gamma\!\left(\frac{m+1}{2}\right)}{\Gamma\!\left(\frac{m}{2}\right)}\, \frac{1}{\lambda}\, \frac{(R/\lambda)^{m-1}}{\left[ 1 + (R/\lambda)^2 \right]^{\frac{m+1}{2}}}. \qquad (64)
$$

For a finite fixed value of $m$, this expression vanishes as $(R/\lambda)^{m-1}$ for $R \ll \lambda$, attains its maximum at

$$
\left( \frac{R}{\lambda} \right)^2 = \frac{m-1}{2}, \qquad (65)
$$

and then decays like $\lambda/R^2$ for $R \gg \lambda$. Thus, like the even Cauchy distribution (63), it does not have a second moment.

In order to make (64) more transparent, we introduce the angle $\theta$ defined by

$$
\tan \theta(R) = \frac{R}{\lambda}, \qquad (66)
$$

where $0 \le \theta \le \pi/2$. In terms of $\theta$ we have

$$
P(|x_B| = R) = \frac{2}{\sqrt{\pi}}\, \frac{\Gamma\!\left(\frac{m+1}{2}\right)}{\Gamma\!\left(\frac{m}{2}\right)}\, \frac{1}{\lambda}\, \cos^2\theta\, \sin^{m-1}\theta. \qquad (67)
$$

(In order to obtain the probability density for $\theta$ we have to multiply the latter expression by the factor $dR/d\theta = \lambda/\cos^2\theta$.)

Let us now concentrate on the asymptotic behavior of (67) (or (64)) in the limit

$m \to \infty$. Using Stirling's formula

$$
\Gamma(x) \sim \sqrt{\frac{2\pi}{x}}\, x^x\, e^{-x} \qquad (68)
$$

for the large-$x$ asymptotic behavior of the Gamma functions, we obtain for $m \to \infty$

$$
P(|x_B| = R) \sim \sqrt{\frac{2m}{\pi \lambda^2}}\, \cos^2\theta\, \sin^{m-1}\theta. \qquad (69)
$$

Clearly, (69) is exponentially small in $m$, unless $\sin\theta \simeq 1$, which implies

$$
\theta = \frac{\pi}{2} - \delta \qquad (70)
$$

with $\delta \sim 1/\sqrt{m}$. Thus, writing

$$
\delta = \sqrt{\frac{2u}{m}} \qquad (71)
$$

(with $u \ll m$), we obtain, for $m \to \infty$,

$$
P(|x_B| = R) \sim \sqrt{\frac{8}{\pi m \lambda^2}}\; u\, e^{-u}. \qquad (72)
$$

In this regime

$$
\frac{R}{\lambda} = \tan\theta \simeq \sqrt{\frac{m}{2u}} \gg 1. \qquad (73)
$$


The function on the r.h.s. of (72) has its maximum at $u = 1$, i.e., at $R/\lambda = \sqrt{m/2}$ (in accordance with (65)), and has width of $O(1)$ around that maximum. However, this is not enough to deduce the typical behavior of $R/\lambda$, since, as we have already commented following (65), $P(|x_B| = R)$ has long tails and decays like $\lambda/R^2$ past its maximum. Thus, we have to calculate the probability that $R > R_0 = \lambda \tan\theta_0$, given $R_0$. The calculation is straightforward: using (69) and (66) we obtain

$$
\mathrm{Prob}(R > R_0) = \int_{R_0}^\infty P(R)\, dR = \sqrt{\frac{2m}{\pi}} \int_{\theta_0}^{\pi/2} \sin^{m-1}\theta\; d\theta. \qquad (74)
$$

Due to the fact that in the limit $m \to \infty$, $\sin^{m-1}\theta$ may be approximated by a Gaussian centered around $\theta = \pi/2$ with variance $1/m$, it is clear that

$$
\mathrm{Prob}(R > R_0) = \mathrm{Prob}(\theta > \theta_0) \simeq 1,
$$

unless $\delta_0 = \pi/2 - \theta_0 \sim \sqrt{2u_0/m}$, with $u_0 \ll m$. Thus, using (70) and (71) we obtain

$$
\mathrm{Prob}(R > R_0) = \sqrt{\frac{2m}{\pi}} \int_0^{\delta_0} \cos^{m-1}\delta\; d\delta \sim \frac{1}{\sqrt{\pi}} \int_0^{u_0} e^{-u}\, \frac{du}{\sqrt{u}} = \operatorname{erf}(\sqrt{u_0}). \qquad (75)
$$

Finally, using the definitions of $u_0$, $\theta_0$ and $R_0$, we rewrite (75) as

$$
\mathrm{Prob}(R > R_0) = \operatorname{erf}\!\left( \sqrt{\frac{m}{2}}\, \arctan \frac{\lambda}{R_0} \right). \qquad (76)
$$

From the asymptotic behavior $\operatorname{erf}(x) \sim 1 - \frac{e^{-x^2}}{x\sqrt{\pi}}$ at large $x$, we see that $\mathrm{Prob}(R > R_0)$ saturates at 1 exponentially fast as $R_0$ decreases. Consequently, $1 - \mathrm{Prob}(R > R_0) \sim O(m^0)$ is non-negligible only if $R_0/\lambda$ is large enough, namely $\sqrt{\frac{m}{2}} \arctan\!\left(\frac{\lambda}{R_0}\right) \le 1$, i.e., $R_0/\lambda \ge \sqrt{m/2}$. If $R_0/\lambda$ is very large, namely $R_0/\lambda \gg \sqrt{m/2}$, which corresponds to a small argument of the error function in (76), we clearly have $\mathrm{Prob}(R > R_0) \simeq \sqrt{2m/\pi}\, (\lambda/R_0) \ll 1$. From these properties of (76) it thus follows that typically

$$
\frac{R}{\lambda} \sim O(\sqrt{m}). \qquad (77)
$$

Up to this point, we have left the parameters $x_N$ unspecified. At this point we select the prescribed vertex of the polyhedron. At the vertex itself, $x_N = 0$; therefore, from (62), we see that $\lambda = 1$. Thus, according to (77), at the vertex, typically

$$
R_{\mathrm{vertex}} \sim O(\sqrt{m}). \qquad (78)
$$

This result obviously holds for any vertex of the polyhedron: any partition (60) of the system of equations $Ax = b$ into basic and non-basic sets leads to the same distribution function (61), and at each vertex we have $x_N = 0$. Clearly, this means that the whole polyhedron is typically bounded inside an $n$-dimensional sphere of radius $R \sim O(\sqrt{m})$ centered at the origin.

Thus, from (78) and from the rotational symmetry of (61), we conclude that any component of $x_B$ at the optimal vertex, or at any other vertex (with its appropriate basic set $B$), is typically of $O(R_{\mathrm{vertex}}/\sqrt{m}) = O(1)$ (and, of course, positive). Points on the polyhedron other than the vertices are weighted linear combinations of the vertices, with positive weights smaller than unity, and as such also have their individual components typically of $O(1)$.
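The prediction (78) is easy to probe numerically: at a vertex, $x_N = 0$ and $\lambda = 1$, so the basic coordinates solve $A_B x_B = b$ with Gaussian $A_B$ and $b$. The sketch below (our illustrative sizes and sample count, assuming unit-variance entries) checks that the median of $R = |x_B|$ is of order $\sqrt{m}$.

```python
# Monte Carlo sketch of the typical vertex magnitude, Eq. (78):
# solve A_B x_B = b for Gaussian A_B (m x m) and b, and record R = |x_B|.
import math
import random

def solve(a, rhs):
    """Gaussian elimination with partial pivoting for a square system."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

rng = random.Random(1)
m = 20
radii = []
for _ in range(300):
    A_B = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(m)]
    b = [rng.gauss(0.0, 1.0) for _ in range(m)]
    x_B = solve(A_B, b)
    radii.append(math.sqrt(sum(v * v for v in x_B)))
radii.sort()
median_R = radii[len(radii) // 2]
print(median_R / math.sqrt(m))   # an O(1) number: R is typically O(sqrt(m))
```

The median is used rather than the mean because, as noted after (65), the distribution (64) has long tails and no second moment.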

6.2. Non-asymptotic complexity measures from $b_i$

Applying the results of the previous subsection to the optimal vertex, we expect the components of $x_B(t)$ (i.e., the $x_{j+n-m}(t)$'s in (59)) to be typically of the same order of magnitude as their asymptotic values $\lim_{t\to\infty} x_B(t)$ at the optimal vertex, and as a result, we expect the barrier $b_i(t)$ to be of the same order of magnitude as its asymptotic value $\lim_{t\to\infty} b_i(t)$.

Note that, for this reason, in order to determine how the $x_i(t)$ in (8) tend to zero, to leading order, we can safely replace all the $x_{j+n-m}(t)$ by their asymptotic values in $x_B^*$. Thus, in the following we approximate the barrier (59) by its asymptotic value

$$
b_i = -\lim_{t\to\infty} \sum_{j=1}^{m} a_{ji} \log x_{j+n-m}(t) = -\sum_{j=1}^{m} a_{ji} \log x^*_{j+n-m}, \qquad (79)
$$

where we have also ignored the contribution of the initial condition.

where we have also ignored the contribution of the initial condition.We now consider the convergence time of the solution xðtÞ of (5) to the optimal

vertex. In order for xðtÞ to be close to the maximum vertex we must have xiðtÞoe fori ¼ 1;y; n m for some small positive e: The time parameter t must then satisfy:

expðDit þ biÞoe for i ¼ 1;y; n m: ð80Þ

Solving for t; we find an estimate for the time required to flow to the vicinity of theoptimal vertex as

t4bi

Di

þ jlog ejDi

for all i ¼ 1;y;m: ð81Þ

We define

$$
T = \max_i \left( \frac{b_i}{\Delta_i} + \frac{|\log \varepsilon|}{\Delta_i} \right), \qquad (82)
$$

which we consider as the computation time. We denote

$$
b_{\max} = \max_i b_i. \qquad (83)
$$
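The definition (82) reduces to a one-line maximization once the decay rates and barriers are known. The sketch below uses entirely hypothetical values of $\Delta_i$ and $b_i$ (they are not taken from any instance in the paper) to illustrate the two regimes discussed next: for moderate $\varepsilon$ the barrier term can dominate, while for $\varepsilon \to 0$ one gets $T \approx |\log\varepsilon|/\Delta_{\min}$.

```python
# Toy evaluation of the computation-time estimate (81)-(83).
import math

def computation_time(deltas, barriers, eps):
    """T = max_i (b_i + |log eps|) / Delta_i for Delta_i > 0."""
    return max((b + abs(math.log(eps))) / d for d, b in zip(deltas, barriers))

deltas = [0.8, 0.15, 0.4]    # hypothetical decay rates Delta_i > 0
barriers = [0.5, 2.0, 0.0]   # hypothetical barriers b_i (only positive ones matter)
print(computation_time(deltas, barriers, eps=1e-3))
# As eps -> 0 the |log eps| term dominates and T ~ |log eps| / Delta_min,
# so the following ratio approaches 1:
print(computation_time(deltas, barriers, eps=1e-30) * min(deltas) / abs(math.log(1e-30)))
```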

In the limit of asymptotically small $\varepsilon$, the first term in (82) is irrelevant, and the distribution of computation times is determined by the distribution of the $\Delta_i$'s as stated in Theorem 4.1. If asymptotic precision is not required, the first term in (82) may be dominant. To bound this term in the expression for the computation time we can use the quotient $b_{\max}/\Delta_{\min}$, where $\Delta_{\min}$ is defined in (11).

In the probabilistic ensemble used in this work, $b_{\max}$ and $b_{\max}/\Delta_{\min}$ are random variables, as is $\Delta_{\min}$. Unfortunately, we could not find the probability distributions of $b_{\max}$ and $b_{\max}/\Delta_{\min}$ analytically, as we did for $\Delta_{\min}$. In the following section, a conjecture concerning these distributions, based on numerical evidence, will be formulated.

7. Scaling functions

In Section 4 it was shown that in the limit of large $(n,m)$ the probability $P(\Delta_{\min} > \Delta \mid \Delta_{\min} > 0)$ is given by (52). Consequently, $P(\Delta_{\min} < \Delta \mid \Delta_{\min} > 0) \equiv F^{(n,m)}(\Delta)$ is of the scaling form

$$
F^{(n,m)}(\Delta) = 1 - e^{x_\Delta^2}\, \operatorname{erfc}(x_\Delta) \equiv F(x_\Delta). \qquad (84)
$$

Such a scaling form is very useful and informative, as we will demonstrate in what follows. The scaling function $F$ contains all the asymptotic information on $\Delta$. In particular, one can extract the problem-size dependence of $f^{(m)}_{\min}(0 \mid \Delta_{\min} > 0)$, which is required for obtaining a high-probability bound using Lemma 5.1. (This has already been used in Corollary 5.1.) We use the scaling form, Eq. (84), leading to

$$
f^{(m)}_{\min}(0 \mid \Delta_{\min} > 0) = \left. \frac{dF^{(n,m)}(\Delta)}{d\Delta} \right|_{\Delta=0} = Z_\Delta(n,m) \left. \frac{dF(x_\Delta)}{dx_\Delta} \right|_{x_\Delta=0}. \qquad (85)
$$

This is just $f_1^{\min}(0)/Q(0)$. With the help of Lemma 5.1, leading to (58), and our finding that $Z_\Delta(n,m) \sim \sqrt{m}$, we conclude that with high probability

$$
\frac{1}{\Delta_{\min}} = O(\sqrt{m}). \qquad (86)
$$

The next observation is that the distribution $F(x_\Delta)$ is very wide. For large $x_\Delta$ it behaves as $1 - \frac{1}{\sqrt{\pi}\, x_\Delta}$, as is clear from the asymptotic behavior of the erfc function; therefore $x_\Delta$ does not have a mean. Since the slope $dF/dx_\Delta|_{x_\Delta=0}$ does not vanish, $1/x_\Delta$ does not have a mean either (see Remark 5.2).

We would like to derive scaling functions like (84) also for the barrier $b_{\max}$, that is,

the maximum of the $b_i$ defined by (79), and for the computation time $T$ defined by (82). The analytic derivation of such scaling functions is difficult and is therefore left for further studies. Their existence is verified numerically in the next section. In particular, for fixed $r = m/n$, we found that

$$
P\!\left( \frac{1}{b_{\max}} < \frac{1}{b} \right) \equiv F^{(n,m)}_{1/b_{\max}}\!\left( \frac{1}{b} \right) = F_{1/b}(x_b) \qquad (87)
$$

and

$$
P\!\left( \frac{1}{T} < \frac{1}{t} \right) \equiv F^{(n,m)}_{1/T}\!\left( \frac{1}{t} \right) = F_{1/T}(x_T), \qquad (88)
$$

where $b_{\max}$ and $T$ are the maximal barrier and the computation time, respectively. The scaling variables are

$$
x_b = Z_b(n,m)\, \frac{1}{b} \qquad (89)
$$

and

$$
x_T = Z_T(n,m)\, \frac{1}{t}. \qquad (90)
$$

The asymptotic behavior of the scaling variables was determined numerically to be

$$
Z_b(n,m) \sim m \qquad (91)
$$

and

$$
Z_T(n,m) \sim m \log m. \qquad (92)
$$

This was found for constant $r$; the precise $r$ dependence could not be determined numerically. The resulting high-probability behavior for the barrier and the computation time is therefore

$$
b_{\max} = O(m), \quad T = O(m \log m). \qquad (93)
$$

Note that scaling functions such as these immediately provide the average behavior as well (if it exists). Here, in the calculation of the distribution of computation times, it was assumed that these are dominated by the barriers rather than by $|\log \varepsilon|$ in (82). The results (87), (88) and (93) are conjectures supported by the numerical calculations of the next section.

8. Numerical simulations

In this section the results of numerical simulations of the distributions for LP problems are presented. For this purpose we generated full LP instances $(A, b, c)$ with the distribution (12). For each instance the LP problem was solved using the linear programming solver of the IMSL C library. Only instances with a bounded optimal solution were kept; $\Delta_{\min}$ was computed relative to the optimal partition, and optimality was verified by checking that $\Delta_{\min} > 0$. Using the sampled instances we obtain an estimate of $F^{(n,m)}(\Delta) = P(\Delta_{\min} < \Delta \mid \Delta_{\min} > 0)$, and of the corresponding cumulative distribution functions of the barrier $b_{\max}$ and the computation time.

As a consistency verification of the calculations we compared $P(\Delta_{\min} < \Delta \mid \Delta_{\min} > 0)$ to $P(\Delta_{\min,1} < \Delta \mid \Delta_{\min,1} > 0)$, which was directly estimated from the distribution of matrices. For this purpose, we generated a sample of $A$ and $c$ according to the probability distributions (13), (15) with $\sigma = 1$, and computed for each instance the value of $\Delta_{\min,1}$ (the minimum over the $\Delta_i$) for a fixed partition of $A$ into $(N, B)$. We kept only the positive values (note that the definition of $\Delta_{\min,1}$ does not require $b$). The two distributions are compared in Fig. 1, with excellent agreement.

Note that estimation of $P(\Delta_{\min,1} < \Delta \mid \Delta_{\min,1} > 0)$ by sampling from a fixed partition is infeasible for large $m$ and $n$, since for any partition of $A$ the probability that $\Delta_{\min,1}$ is positive is $2^{-(n-m)}$ (Eq. (24)). Therefore, the equivalence between the probability distributions of $\Delta_{\min}$ and $\Delta_{\min,1}$ cannot be exploited to produce numerical estimates of the probability distribution of $\Delta_{\min}$. Thus we proceed by generating full LP instances and solving the LP problem as described above.

The problem-size dependence was explored while keeping the ratio $n/m$ fixed, or while keeping $m$ fixed and varying $n$. In Fig. 2, we plot the numerical estimates of $F^{(n,m)}(\Delta)$ for varying problem sizes with $n/m = 2$ and compare them with the analytical result, Eq. (84). The agreement with the analytical result improves as $m$ is increased, since it is an asymptotic result; the simulations show that the asymptotic result holds well even for $m = 20$. As in the analytical result, in the large-$m$ limit we observe that $F^{(n,m)}(\Delta)$ is not a general function of $n$, $m$ and $\Delta$, but a scaling function of the form $F^{(n,m)}(\Delta) = F(x_\Delta)$, as predicted theoretically in Section 7 (see (84) there), with the scaling variable $x_\Delta(m)$ given by (50). Indeed, Fig. 3 demonstrates that $F^{(n,m)}$ has this form, as predicted by Eq. (84) with the scaling variable $x_\Delta$.

For the cumulative distribution functions of the barrier $b_{\max}$ and of the

computation time $T$ we do not have analytical formulas. These distribution functions are denoted by $F^{(n,m)}_{1/b_{\max}}$ and $F^{(n,m)}_{1/T}$, respectively. Their behavior near zero enables one to obtain high-probability bounds on $b_{\max}$ and $T$, since for this purpose we need to bound the tails of their distributions, or, alternatively, estimate the density of $1/b_{\max}$ and $1/T$ at 0. In the numerical estimate of the barrier we collected only positive values, since only these contribute to prolonging the computation time. From Fig. 4 we find that $F^{(n,m)}_{1/b_{\max}}$ is indeed a scaling function of the form (87) with the scaling variable $x_b$ of (89). The behavior of the computation time is extracted from Fig. 5: the cumulative function $F^{(n,m)}_{1/T}$ is found to be a scaling function of the form (88) with the scaling variable $x_T$ of (90). The scaling variables $x_b$ and $x_T$ were found

Fig. 1. Comparison of $P(\Delta_{\min,1} < \Delta \mid \Delta_{\min,1} > 0)$ and $P(\Delta_{\min} < \Delta \mid \Delta_{\min} > 0)$ for $m = 2$, $n = 4$.


numerically by the requirement that in the asymptotic limit the cumulative distribution $F$ approaches a scaling form. Such a fitting is possible only if a scaling form exists. We were unable to determine the dependence of the scaling variables $x_b$ and $x_T$ on $n/m$.

Fig. 2. $F^{(n,m)}(\Delta)$ for $m = 4, 20, 40, 80, 120$; $n = 2m$. The number of instances was $10^5$, $10^5$, 40 000, 15 000 and 5800, respectively. There is very good agreement with the analytical results, improving as $m$ increases.

Fig. 3. $F(x_\Delta)$ plotted against the variable $x_\Delta$, for the same data as Fig. 2.


9. Summary and discussion

In this paper we computed the problem-size dependence of the distributions of parameters that govern the convergence of a differential equation (Eq. (5)) that solves the linear programming problem [14]. To the best of our knowledge, this is the first time such distributions have been computed. In particular, knowledge of the

Fig. 4. $F_{1/b}(x_b)$ as a function of the variable $x_b = m/b_{\max}$, for the same instances as Fig. 2.

Fig. 5. $F_{1/T}(x_T)$ as a function of the variable $x_T = m \log m / T$, for the same instances as Fig. 2.


distribution functions enables one to obtain the high-probability behavior (for example, (86) and (93)), and the moments (if these exist).

The main result of the present work is that the distribution functions of the convergence rate $\Delta_{\min}$, the barrier $b_{\max}$ and the computation time $T$ are scaling functions; i.e., in the asymptotic limit of large $(n,m)$, each depends on the problem size only through a scaling variable. These functions are presented in Section 7. The scaling functions obtained here provide all the relevant information about the distribution in the large-$(n,m)$ limit. Such functions, even if known only numerically, can be useful for understanding the behavior at large values of $(n,m)$ that are beyond the limits of numerical simulations. In particular, the distribution function of $\Delta_{\min}$ was calculated analytically and stated as Theorem 4.1; it turns out to be a very simple function (see (84)). The relevance of the asymptotic theorem for finite and relatively small problem sizes $(n,m)$ was demonstrated numerically. The scaling form of the distributions of $b_{\max}$ and of $T$ was conjectured on the basis of numerical simulations.

The Faybusovich flow [6] studied in the present work is defined by a system of differential equations, and it can be considered as an example of the analysis of convergence to fixed points for differential equations. One should note, however, that the present system has a formal solution (8), and is therefore not typical.

If we require knowledge of the attracting fixed points with arbitrarily high precision (i.e., $\varepsilon$ of (80) and (82) can be made arbitrarily small), the convergence time to an $\varepsilon$-vicinity of the fixed point is dominated by the convergence rate $\Delta_{\min}$. The barrier, which describes the state-space "landscape" on the way to fixed points, is irrelevant in this case. Thus, in this limit, the complexity is determined by (86). This point of view is taken in [28]. However, for the solution of some problems (like the one studied in the present work), such high precision is usually not required, and the non-asymptotic behavior (in $\varepsilon$) of the vector field, as represented by the barrier, also makes an important contribution to the complexity of computing the fixed point.

For computational models defined over the real numbers, worst-case behavior can be ill defined and lead to infinite computation times, in particular for interior point methods for linear programming [7]. Therefore, we compute the distribution of computation times for a probabilistic model of linear programming instances rather than an upper bound. Such probabilistic models can be useful in giving a general picture also for traditional discrete problem solving, where the continuum theory can be viewed as an approximation.

A question of fundamental importance is how general the existence of scaling distributions is. Their existence would be analogous to the central limit theorem [17] and to scaling in critical phenomena [32] and in Anderson localization [1,2]. Typically, such functions are universal. In the case of the central limit theorem, for example, under some very general conditions one obtains a Gaussian distribution, irrespective of the original probability distributions; moreover, it depends on the random variable and the number of the original variables via a specific combination. The Gaussian distribution is a well-known example of the so-called stable probability

ARTICLE IN PRESSA. Ben-Hur et al. / Journal of Complexity 19 (2003) 474–510 499

Page 27: Probabilisticanalysisofadifferentialequation ...1. Introduction Inrecentyearsscientistshavedevelopednewapproachestocomputation,someof thembasedoncontinuoustimeanalogsystems.AnalogVLSIdevices,thatareoften

distributions. In the physical problems mentioned above scaling and universalityreflect the fact that the systems becomes scale invariant.A specific challenging problem still left unsolved in the present work is the

rigorous calculation of the distributions of 1=bmax and of 1=T ; that is proving theconjectures concerning these distributions. This will be attempted in the near future.

Acknowledgments

It is our great pleasure to thank Arkadi Nemirovski, Eduardo Sontag and Ofer Zeitouni for stimulating and informative discussions. This research was supported in part by the US–Israel Binational Science Foundation (BSF), by the Israeli Science Foundation Grant Number 307/98 (090-903), by the US National Science Foundation under Grant No. PHY99-07949, by the Minerva Center of Nonlinear Physics of Complex Systems and by the Fund for Promotion of Research at the Technion.

Appendix A. The Faybusovich vector field

In the following we consider the inner product $\langle \xi, \eta \rangle_{X^{-1}} = \xi^T X^{-1} \eta$. This inner product is defined on the positive orthant $\mathbb{R}^n_+ = \{ x \in \mathbb{R}^n : x_i > 0,\ i = 1, \dots, n \}$, where it defines a Riemannian metric. In the following we denote by $a_i$, $i = 1, \dots, m$, the rows of $A$. The Faybusovich vector field is the gradient of $h$ relative to this metric, projected onto the constraint set [14]. It can be expressed as

$$\operatorname{grad} h = Xc - \sum_{i=1}^m z_i(x)\, X a_i, \tag{A.1}$$

where $z_1(x), \dots, z_m(x)$ make the gradient perpendicular to the constraint vectors, i.e., $A \operatorname{grad} h = 0$, so that $Ax = b$ is maintained by the dynamics. The resulting flow is

$$\frac{dx}{dt} = F(x) = \operatorname{grad} h. \tag{A.2}$$

Consider the functions

$$C_i(x) = \log(x_i) + \sum_{j=1}^m a_{ij} \log(x_{j+n-m}), \quad i = 1, \dots, n-m. \tag{A.3}$$

The $C_i$ are defined such that their equations of motion are easily integrated. This gives $n-m$ equations, which correspond to the $n-m$ independent variables of the LP problem. To compute the time derivative of $C_i$ we first find

$$\nabla C_i = \frac{1}{x_i}\, e_i + \sum_{j=1}^m \frac{a_{ij}}{x_{j+n-m}}\, e_{j+n-m}, \tag{A.4}$$



and note that the vectors $\mu_i$ defined in Eq. (6) have the following property:

$$\langle \mu_i, a_j \rangle = 0, \quad i = 1, \dots, n-m, \quad j = 1, \dots, m.$$

Therefore,

$$\dot{C}_i(x) = \langle \nabla C_i(x), \dot{x} \rangle = \langle \nabla C_i(x), \operatorname{grad} h \rangle = \left\langle \mu_i,\ c - \sum_{j=1}^m z_j(x)\, a_j \right\rangle = \langle \mu_i, c \rangle \equiv \Delta_i. \tag{A.5}$$

This equation is integrated to yield:

$$x_i(t) = x_i(0) \exp\!\left( \Delta_i t - \sum_{j=1}^m a_{ij} \log \frac{x_{j+n-m}(t)}{x_{j+n-m}(0)} \right). \tag{A.6}$$
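The flow (A.1)–(A.2) is easy to experiment with numerically. The sketch below (the toy problem data, step size, and integration horizon are our own arbitrary choices, not from the paper) Euler-integrates $\dot{x} = X(c - A^T z)$ with $z = (AXA^T)^{-1} A X c$, which is (A.1) written with $X = \operatorname{diag}(x)$ and the $z_i(x)$ solved from $A \operatorname{grad} h = 0$. It checks that $Ax = b$ is conserved by the dynamics while the objective $c^T x$ grows toward the optimal vertex.

```python
import numpy as np

def faybusovich_field(x, A, c):
    # grad h = X (c - A^T z), with z chosen so that A grad h = 0  (Eq. (A.1))
    X = np.diag(x)
    z = np.linalg.solve(A @ X @ A.T, A @ X @ c)
    return x * (c - A.T @ z)

# toy LP (hypothetical data): maximize c^T x subject to A x = b, x > 0
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 0.5, 0.1])

x = np.array([1 / 3, 1 / 3, 1 / 3])   # strictly feasible interior start
obj0 = c @ x
dt = 0.01
for _ in range(3000):                 # crude forward-Euler integration of (A.2)
    x = x + dt * faybusovich_field(x, A, c)

# Ax = b stays satisfied, x stays positive, and c^T x grows toward the vertex e_1
print(np.allclose(A @ x, b), c @ x > obj0, x[0])
```

With $m = 1$ this flow reduces to replicator-type dynamics on the simplex, so the trajectory converges to the vertex corresponding to the largest $c_i$, at a rate set by the gaps $\Delta_i$, in line with (A.6).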

Appendix B. The probability distribution $P(u)$

In this appendix, we study the probability distribution function

$$P(u) = \left(\frac{\lambda}{\pi}\right)^{\frac{m^2+m}{2}} \int d^{m^2}\!B\; d^m z\; e^{-\lambda \left( \operatorname{tr} B^T B + z^T z \right)}\; \delta\!\left( u - \frac{1}{z^T (B^T B)^{-1} z + 1} \right),$$

defined in (30) and (31), and calculate it explicitly in the large $n, m$ limit. We will reconstruct $P(u)$ from its moments. The $N$th moment

$$k_N = \int_0^\infty du\, P(u)\, u^N = \left(\frac{\lambda}{\pi}\right)^{\frac{m^2+m}{2}} \int d^{m^2}\!B\; d^m z\; e^{-\lambda \left( \operatorname{tr} B^T B + z^T z \right)} \left( \frac{1}{z^T (B^T B)^{-1} z + 1} \right)^{\!N} \tag{B.1}$$

of $P(u)$ may be conveniently represented as

$$k_N = \left(\frac{\lambda}{\pi}\right)^{\frac{m^2+m}{2}} \frac{1}{\Gamma(N)} \int_0^\infty t^{N-1} e^{-t}\, dt \int d^{m^2}\!B\; e^{-\lambda \operatorname{tr} B^T B} \int d^m z\; e^{-z^T \left( \lambda + t\, (B^T B)^{-1} \right) z}
= \left(\frac{\lambda}{\pi}\right)^{\frac{m^2}{2}} \frac{1}{\Gamma(N)} \int_0^\infty t^{N-1} e^{-t}\, dt \int d^{m^2}\!B\; \frac{e^{-\lambda \operatorname{tr} B^T B}}{\sqrt{\det\!\left( 1 + \frac{t}{\lambda} (B^T B)^{-1} \right)}}, \tag{B.2}$$

where in the last step we have performed the Gaussian integration over $z$.

Recall that $P(u)$ is independent of the arbitrary parameter $\lambda$ (see the remark preceding (31)). Thus, its $N$th moment $k_N$ must also be independent of $\lambda$, which is manifest in (B.2). Therefore, with no loss of generality, and for later convenience, we



will henceforth set $\lambda = m$ (since we have in mind taking the large $m$ limit). Thus,

$$k_N = \left(\frac{m}{\pi}\right)^{\frac{m^2}{2}} \frac{1}{\Gamma(N)} \int_0^\infty t^{N-1} e^{-t}\, dt \int d^{m^2}\!B\; \frac{e^{-m \operatorname{tr} B^T B}}{\sqrt{\det\!\left( 1 + \frac{t}{m} (B^T B)^{-1} \right)}} = \frac{1}{\Gamma(N)} \int_0^\infty t^{N-1} e^{-t}\, \psi\!\left( \frac{t}{m} \right) dt, \tag{B.3}$$

where we have introduced the function

$$\psi(y) = \left(\frac{m}{\pi}\right)^{\frac{m^2}{2}} \int d^{m^2}\!B\; \frac{e^{-m \operatorname{tr} B^T B}}{\sqrt{\det\!\left( 1 + y\, (B^T B)^{-1} \right)}}. \tag{B.4}$$

Note that

$$\psi(0) = 1. \tag{B.5}$$

The function $\psi(y)$ is well defined for $y \ge 0$, where it clearly decreases monotonically:

$$\psi'(y) < 0. \tag{B.6}$$

We would now like to integrate over the rotational degrees of freedom in $dB$. Any real $m \times m$ matrix $B$ may be decomposed as [3,22]

$$B = O_1^T \Omega O_2, \tag{B.7}$$

where $O_{1,2} \in O(m)$, the group of $m \times m$ orthogonal matrices, and $\Omega = \operatorname{Diag}(\omega_1, \dots, \omega_m)$, where $\omega_1, \dots, \omega_m$ are the singular values of $B$. Under this decomposition we may write the measure $dB$ as [3,22]

$$dB = d\mu(O_1)\, d\mu(O_2) \prod_{i<j} \left| \omega_i^2 - \omega_j^2 \right| d^m\omega, \tag{B.8}$$

where $d\mu(O_{1,2})$ are Haar measures over the appropriate group manifolds. The measure $dB$ is manifestly invariant under actions of the orthogonal group $O(m)$:

$$dB = d(BO) = d(O'B), \quad O, O' \in O(m), \tag{B.9}$$

as should have been expected to begin with.

Remark B.1. Note that the decomposition (B.7) is not unique, since $O_1 D$ and $D O_2$, with $D$ being any of the $2^m$ diagonal matrices $\operatorname{Diag}(\pm 1, \dots, \pm 1)$, is an equally good pair of orthogonal matrices to be used in (B.7). Thus, as $O_1$ and $O_2$ sweep independently over the group $O(m)$, the measure (B.8) overcounts $B$ matrices. This problem can easily be rectified by appropriately normalizing the volume $V_m = \int d\mu(O_1)\, d\mu(O_2)$. One can show that the correct normalization of the volume is

$$V_m = \frac{\pi^{\frac{m(m+1)}{2}}}{2^m \prod_{j=1}^m \Gamma\!\left( 1 + \frac{j}{2} \right) \Gamma\!\left( \frac{j}{2} \right)}. \tag{B.10}$$



One simple way to establish (B.10) is to calculate

$$\int dB\, \exp\!\left( -\tfrac{1}{2} \operatorname{tr} B^T B \right) = (2\pi)^{\frac{m^2}{2}} = V_m \int_{-\infty}^{\infty} d^m\omega \prod_{i<j} \left| \omega_i^2 - \omega_j^2 \right| \exp\!\left( -\tfrac{1}{2} \sum_i \omega_i^2 \right).$$

The last integral is a known Selberg-type integral [22].

The integrand in (B.4) depends on $B$ only through the combination $B^T B = O_2^T \Omega^2 O_2$. Thus, the integrations over $O_1$ and $O_2$ in (B.4) factor out trivially, and we end up with

$$\psi(y) = V_m \left(\frac{m}{\pi}\right)^{\frac{m^2}{2}} \int_{-\infty}^{\infty} \frac{\prod_{i<j} \left| \omega_i^2 - \omega_j^2 \right| d^m\omega}{\sqrt{\det\!\left( 1 + y\, \Omega^{-2} \right)}}\; e^{-m \operatorname{tr} \Omega^2}. \tag{B.11}$$

It is a straightforward exercise to check that (B.10) is consistent with $\psi(0) = 1$. Note that in deriving (B.11) we have made no approximations; up to this point, all our considerations in this appendix were exact. We are interested in the large $n, m$ asymptotic behavior of $P(u)$ and of its moments (recall that $m$ and $n$ tend to infinity with the ratio $r = m/n$ of (41) kept finite). Thus, we will now evaluate the large $m$ behavior of $\psi(y)$ (which is why we have chosen $\lambda = m$ in (B.3)). This asymptotic behavior is determined by the saddle point dominating the integral over the $m$ singular values $\omega_i$ in (B.11) as $m \to \infty$. To obtain this asymptotic behavior we rewrite the integrand in (B.11) as

$$\frac{e^{-S}}{\sqrt{\det\!\left( 1 + y\, \Omega^{-2} \right)}},$$

where

$$S = m \sum_{i=1}^m \omega_i^2 - \frac{1}{2} \sum_{i<j} \log\!\left( \omega_i^2 - \omega_j^2 \right)^2. \tag{B.12}$$

In physical terms, $S$ is the energy (or the action) of the so-called "Dyson gas" of eigenvalues, familiar from the theory of random matrices. We look for a saddle point of the integral in (B.11) in which all the $\omega_i$ are of $O(1)$. In such a case, $S$ in (B.12) is of $O(m^2)$, and thus $e^{-S}$ overwhelms the factor

$$\frac{1}{\sqrt{\det\!\left( 1 + y\, \Omega^{-2} \right)}} = e^{-\frac{m}{2} I(y)},$$

where

$$I(y) = \frac{1}{m} \sum_{i=1}^m \log\!\left( 1 + \frac{y}{\omega_i^2} \right) \tag{B.13}$$

is a quantity of $O(m^0)$. For later use, note that

$$I(0) = 0. \tag{B.14}$$



Thus, to leading order in $1/m$, $\psi(y)$ is dominated by the saddle point of $S$, provided it is well defined and stable, which is indeed the case. Simple arguments pertaining to the physics of the Dyson gas make it clear that the

saddle point is stable: the "confining potential" term $\sum_i \omega_i^2$ in (B.12) tends to condense all the $\omega_i$ at zero, while the "Coulomb repulsion" term $-\sum_{i<j} \log\!\left( \omega_i^2 - \omega_j^2 \right)^2$ acts to keep the $|\omega_i|$ apart. Equilibrium must be reached as a compromise, and it must be stable, since the quadratic confining potential would eventually dominate the logarithmic repulsive interaction for $\omega_i$ large enough. The saddle point equations

$$\frac{\partial S}{\partial \omega_i} = 2\omega_i \left[ m - \sum_{j \ne i} \frac{1}{\omega_i^2 - \omega_j^2} \right] = 0 \tag{B.15}$$

are simply the equilibrium conditions between the repulsive and attractive interactions, and thus determine the distribution of the $|\omega_i|$. We will solve (B.15) (using standard techniques of random matrix theory), and thus determine the equilibrium configuration of the molecules of the Dyson gas, in the next appendix, where we show that the $m$ singular values $\omega_i$ condense (non-uniformly) into the finite segment (see Eq. (C.11))

$$0 < \omega_i^2 \le 2$$

(and thus with mean spacing of the order of $1/m$). To summarize, in the large $m$ limit, $\psi(y)$ is determined by the saddle point of the energy $S$ (B.12) of the Dyson gas. Thus, for large $m$, according to (B.11)–(B.13),

$$\psi(y) \simeq V_m \left(\frac{m}{\pi}\right)^{\frac{m^2}{2}} \exp\!\left( -S_* - \frac{m}{2} I_*(y) \right),$$

where $S_*$ is the extremal value of (B.12), and $I_*(y)$ is (B.13) evaluated at that equilibrium configuration of the Dyson gas, namely,

$$I_*(y) = \frac{1}{m} \sum_{i=1}^m \log\!\left( 1 + \frac{y}{\omega_{i*}^2} \right). \tag{B.16}$$

The actual value of $S_*$ (a number of $O(m^2)$) is of no special interest to us here, since from (B.5) and (B.14) we immediately deduce that in the large $m$ limit

$$\psi(y) \simeq e^{-\frac{m}{2} I_*(y)}. \tag{B.17}$$

Substituting (B.17) back into (B.3), we thus obtain the large $(n, m)$ behavior of $k_N$ as

$$k_N \sim \frac{1}{\Gamma(N)} \int_0^\infty t^{N-1} e^{-t - \frac{m}{2} I_*\left( \frac{t}{m} \right)}\, dt. \tag{B.18}$$

The function $I_*(y)$ is evaluated in the next appendix, and is given in Eq. (C.15),

$$I_*(y) = -y + \sqrt{y^2 + 2y} + \log\!\left( y + 1 + \sqrt{y^2 + 2y} \right),$$

which we repeat here for convenience.



The dominant contribution to the integral in (B.18) comes from values of $t \ll m$, since the function

$$f(t) = t + \frac{m}{2} I_*\!\left( \frac{t}{m} \right), \tag{B.19}$$

which appears in the exponent in (B.18), is monotonically increasing, as can be seen from (C.14). Thus, in this range of the variable $t$, using (C.16), we have

$$f(t) = t + \frac{m}{2} I_*\!\left( \frac{t}{m} \right) = \sqrt{2mt} + \frac{t}{2} + O\!\left( \frac{1}{\sqrt{m}} \right). \tag{B.20}$$

Note that the term $t/2$ in (B.20) is beyond the accuracy of our approximation for $I_*$. The reason is that in (C.12) we used the continuum approximation to the density of singular values, which introduced errors of the order of $1/m$. Fortunately, this term is not required. The leading order term in the exponent (B.20) of (B.18) is just $\sqrt{2mt}$. Consequently, to leading order, (B.18) reduces to

$$k_N \equiv \int_0^\infty du\, P(u)\, u^N \simeq \frac{1}{\Gamma(N)} \int_0^\infty t^{N-1} e^{-\sqrt{2mt}}\, dt = \frac{2\, \Gamma(2N)}{(2m)^N\, \Gamma(N)} = \frac{(2N-1)!!}{m^N}. \tag{B.21}$$
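The final equality in (B.21), $2\Gamma(2N)/((2m)^N \Gamma(N)) = (2N-1)!!/m^N$, follows from the Legendre duplication formula. A quick numerical confirmation (the value of $m$ below is an arbitrary choice of ours):

```python
import math

def kN_gamma(N, m):
    # k_N = 2 Gamma(2N) / ((2m)^N Gamma(N)), the middle expression in (B.21)
    return 2 * math.gamma(2 * N) / ((2 * m)**N * math.gamma(N))

def odd_double_factorial(N):
    # (2N-1)!! = 1 * 3 * 5 * ... * (2N-1)
    out = 1
    for k in range(1, 2 * N, 2):
        out *= k
    return out

m = 7.0
for N in range(1, 9):
    lhs, rhs = kN_gamma(N, m), odd_double_factorial(N) / m**N
    assert abs(lhs - rhs) < 1e-12 * rhs
print("(B.21): 2*Gamma(2N)/((2m)^N Gamma(N)) = (2N-1)!!/m^N for N = 1..8")
```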

The moments (B.21) satisfy Carleman's criterion [6,13],

$$\sum_{N=1}^{\infty} k_N^{-\frac{1}{2N}} = \infty, \tag{B.22}$$

which is sufficient to guarantee that these moments define a unique distribution $P(u)$. Had we kept in (B.21) the $O(m^0)$ piece of (B.20), i.e., the term $t/2$, it would have produced a correction factor to (B.21) of the form $1 + O(N^2/m)$. To see this, consider the integral

$$\frac{1}{\Gamma(N)} \int_0^\infty t^{N-1} e^{-\sqrt{2mt} - t/2}\, dt = \frac{2}{(2m)^N\, \Gamma(N)} \int_0^\infty y^{2N-1} e^{-y - y^2/4m}\, dy \simeq \frac{2}{(2m)^N\, \Gamma(N)} \int_0^\infty y^{2N-1} e^{-y} \left( 1 - \frac{y^2}{4m} + \cdots \right) dy.$$

Thus, we can safely trust (B.21) for moments of order $N \ll \sqrt{m}$.

The expression in (B.21) is readily recognized as the $2N$th moment of a Gaussian distribution defined on the positive half-line. Indeed, the moments of the Gaussian distribution

$$g(x; \mu) = \frac{2\mu}{\sqrt{\pi}}\, e^{-\mu^2 x^2}, \quad x \ge 0, \tag{B.23}$$

are

$$\langle x^k \rangle = \frac{\Gamma\!\left( \frac{k+1}{2} \right)}{\sqrt{\pi}\, \mu^k}. \tag{B.24}$$



In particular, the even moments of (B.23) are

$$\langle x^{2N} \rangle = \frac{\Gamma\!\left( N + \frac{1}{2} \right)}{\sqrt{\pi}\, \mu^{2N}} = \frac{(2N-1)!!}{(2\mu^2)^N}, \tag{B.25}$$

which coincide with (B.21) for $2\mu^2 = m$. These are the moments of $u = x^2$ for the distribution $P(u)$ satisfying $P(u)\, du = g\!\left( x; \sqrt{m/2} \right) dx$, as can be seen by comparing (B.21) and (B.25). Thus, we conclude that the leading asymptotic behavior of $P(u)$ as $m$ tends to

PðuÞ ¼ffiffiffiffiffiffiffiffim

2pu

re

mu2 ; ðB:26Þ

the result quoted in (42). As an additional check of this simple determination of $P(u)$ from (B.21), we now sketch how to derive it more formally from the function

sketch how to derive it more formally from the function

$$G(z) = \int_0^\infty \frac{P(u)\, du}{z - u}, \tag{B.27}$$

sometimes known as the Stieltjes transform of $P(u)$ [6]. $G(z)$ is analytic in the complex $z$-plane, cut along the support of $P(u)$ on the real axis. We can then determine $P(u)$ from (B.27), once we have an explicit expression for $G(z)$, using the identity

$$P(u) = \frac{1}{\pi} \operatorname{Im} G(u - i\epsilon). \tag{B.28}$$

For $z$ large and off the real axis, and if all the moments of $P(u)$ exist, we can formally expand $G(z)$ in inverse powers of $z$. Thus,

$$G(z) = \sum_{N=0}^{\infty} \frac{1}{z^{N+1}} \int_0^\infty P(u)\, u^N\, du = \sum_{N=0}^{\infty} \frac{k_N}{z^{N+1}}. \tag{B.29}$$

For the $k_N$'s given by (B.21), the series (B.29) diverges. However, it is Borel summable [6]. Borel resummation of (B.21), making use of

$$\frac{1}{\sqrt{1 - x}} = 1 + \sum_{N=1}^{\infty} \frac{(2N-1)!!}{N!} \left( \frac{x}{2} \right)^{\!N},$$

yields

$$G(z) = \frac{1}{z} \int_0^\infty \frac{e^{-t}\, dt}{\sqrt{1 - \frac{2t}{mz}}}. \tag{B.30}$$

Thus,

$$\frac{1}{\pi} \operatorname{Im} G(u - i\epsilon) = \frac{1}{\pi u} \int_{\frac{mu}{2}}^{\infty} \frac{e^{-t}\, dt}{\sqrt{\frac{2t}{mu} - 1}} = \sqrt{\frac{m}{2\pi u}}\; e^{-\frac{mu}{2}}, \tag{B.31}$$

which coincides with (B.26).
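As one more numerical sanity check (ours, not part of the paper's derivation), integrating (B.26) directly by quadrature reproduces the moments (B.21), $k_N = (2N-1)!!/m^N$:

```python
import numpy as np

def moment_of_P(N, m, umax=50.0, steps=500_000):
    # k_N = int_0^inf u^N sqrt(m/(2 pi u)) exp(-m u / 2) du, with P(u) from (B.26)
    u = np.linspace(1e-12, umax, steps)
    f = u**N * np.sqrt(m / (2 * np.pi * u)) * np.exp(-m * u / 2)
    return float(np.sum((f[1:] + f[:-1]) * np.diff(u) / 2))  # trapezoid rule

m = 4.0
double_fact = {1: 1, 2: 3, 3: 15}   # (2N-1)!!
for N in (1, 2, 3):
    assert abs(moment_of_P(N, m) - double_fact[N] / m**N) < 1e-4
print("moments of (B.26) agree with (B.21) for N = 1, 2, 3")
```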



Appendix C. The saddle point distribution of the $\omega_i$

We present in this appendix the solution of the equilibrium condition (B.15) of the Dyson gas of singular values,

$$\frac{\partial S}{\partial \omega_i^2} = m - \sum_{j \ne i} \frac{1}{\omega_i^2 - \omega_j^2} = 0 \tag{C.1}$$

(which we repeat here for convenience), and then use it to calculate $I_*(y)$, defined in (B.16). We follow standard methods [3,9] of random matrix theory [22]. Let

$$s_i = \omega_i^2, \tag{C.2}$$

and also define

$$F(w) = \frac{1}{m} \left\langle \sum_{i=1}^m \frac{1}{w - s_i} \right\rangle = \frac{1}{m} \left\langle \operatorname{tr} \frac{1}{w - B^T B} \right\rangle, \tag{C.3}$$

where $w$ is a complex variable. Here the angular brackets denote averaging with respect to the $B$ sector of (13). By definition, $F(w)$ behaves asymptotically as

$$F(w) \xrightarrow[w \to \infty]{} \frac{1}{w}. \tag{C.4}$$

It is clear from (C.3) that for $s > 0$ and $\epsilon \to 0^+$ we have

$$F(s - i\epsilon) = \frac{1}{m}\, \mathrm{P.P.} \left\langle \sum_{i=1}^m \frac{1}{s - s_i} \right\rangle + \frac{i\pi}{m} \sum_{i=1}^m \left\langle \delta(s - s_i) \right\rangle, \tag{C.5}$$

where $\mathrm{P.P.}$ stands for the principal part. Therefore (from (C.3)), the average eigenvalue density of $B^T B$ is given by

$$\rho(s) \equiv \frac{1}{m} \sum_{i=1}^m \left\langle \delta(s - s_i) \right\rangle = \frac{1}{\pi} \operatorname{Im} F(s - i\epsilon). \tag{C.6}$$

In the large $m$ limit, the real part of (C.5) is fixed by (C.1), namely, setting $s = s_i$,

$$\operatorname{Re} F(s - i\epsilon) \equiv \frac{1}{m} \left\langle \sum_j \frac{1}{s - s_j} \right\rangle = 1. \tag{C.7}$$

From the discussion of the physical equilibrium of the Dyson gas (see the paragraph preceding (B.15)), we expect the $\{ s_i \}$ to be contained in a single finite segment $0 \le s \le a$, with $a$ yet to be determined. This means that $F(w)$ should have a cut (along the real axis, where the eigenvalues of $B^T B$ are found) connecting $w = 0$ and $a$. Furthermore, $\rho(s)$ must be integrable as $s \to 0^+$, since a macroscopic number (i.e., a finite fraction of $m$) of eigenvalues cannot condense at $s = 0$, due to repulsion. These considerations, together with (C.7), lead [3,9] to the reasonable ansatz

$$F(w) = 1 + \left( \frac{p}{w} + q \right) \sqrt{w (w - a)}, \tag{C.8}$$

with parameters $p$ and $q$. The asymptotic behavior (C.4) then immediately fixes

$$q = 0, \qquad p = -1, \qquad \text{and} \qquad a = 2. \tag{C.9}$$



Thus,

$$F(w) = 1 - \sqrt{\frac{w - 2}{w}}. \tag{C.10}$$

The eigenvalue distribution of $B^T B$ is therefore

$$\rho(s) = \frac{1}{\pi} \operatorname{Im} F(s - i\epsilon) = \frac{1}{\pi} \sqrt{\frac{2 - s}{s}} \tag{C.11}$$

for $0 < s < 2$, and zero elsewhere. As a simple check, note that

$$\int_0^2 \rho(s)\, ds = 1,$$

as guaranteed by the unit numerator in (C.4).
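The density (C.11) can also be confirmed by direct sampling (a numerical sketch; the matrix size and the assertion tolerances are our own choices). Drawing the entries of $B$ i.i.d. from $N(0, 1/(2m))$, which corresponds to the Gaussian weight $e^{-m \operatorname{tr} B^T B}$, the eigenvalues of $B^T B$ should fill the segment $(0, 2)$ with mean $\int_0^2 s\, \rho(s)\, ds = 1/2$:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 200
# i.i.d. N(0, 1/(2m)) entries correspond to the weight exp(-m tr B^T B)
B = rng.normal(scale=1.0 / np.sqrt(2 * m), size=(m, m))
s = np.linalg.eigvalsh(B.T @ B)      # eigenvalues s_i = omega_i^2

assert s.min() > 0                   # positive spectrum
assert s.max() < 2.3                 # edge at 2, plus finite-m fluctuations
assert abs(s.mean() - 0.5) < 0.05    # int_0^2 s rho(s) ds = 1/2
print("spectrum lies in (0, 2) with mean ~ 1/2:", s.min(), s.max(), s.mean())
```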

Thus, as mentioned in the previous appendix, the $\omega_i^2$, the eigenvalues of $B^T B$, are confined in a finite segment $0 < s < 2$. In the limit $m \to \infty$, they form a continuous condensate in this segment, with the non-uniform distribution (C.11). In an obvious manner, we can calculate $S_*$, the extremal value of $S$ in (B.12), by replacing the discrete sums over the $s_i$ by continuous integrals with weight $\rho(s)$ given by (C.11). We do not calculate $S_*$ explicitly, but merely mention the obvious result that it is a number of $O(m^2)$. Similarly, from (B.16) and (C.11) we obtain

$$I_*(y) = \int_0^2 \rho(s) \log\!\left( 1 + \frac{y}{s} \right) ds = \frac{1}{\pi} \int_0^2 \sqrt{\frac{2 - s}{s}}\, \log\!\left( 1 + \frac{y}{s} \right) ds. \tag{C.12}$$

Since the continuum approximation for $\rho(s)$ introduces an error of the order $1/m$, an error of similar order is introduced in $I_*$. It is easier to evaluate $dI_*(y)/dy$, and then integrate back, to obtain $I_*(y)$. We find from (C.12)

$$\frac{dI_*(y)}{dy} = -F(-y) = -1 + \frac{y + 2}{\sqrt{y^2 + 2y}} = -1 + \sqrt{1 + \frac{2}{y}}. \tag{C.13}$$

It is clear from the last equality in (C.13) that

$$\frac{dI_*(y)}{dy} > 0 \tag{C.14}$$

for $y > 0$. Integrating (C.13), and using (B.14), $I_*(0) = 0$, to determine the integration constant, we finally obtain

$$I_*(y) = -y + \sqrt{y^2 + 2y} + \log\!\left( y + 1 + \sqrt{y^2 + 2y} \right). \tag{C.15}$$
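Eq. (C.15) can be checked against the integral representation (C.12) by numerical quadrature (a sketch; the substitution $s = 2\sin^2\theta$, which tames the endpoint singularity of $\rho(s)$, and the test values of $y$ are our own choices):

```python
import numpy as np

def I_star_integral(y, steps=200_000):
    # (C.12) after s = 2 sin^2(th):  (4/pi) int_0^{pi/2} cos^2(th) log(1 + y/(2 sin^2(th))) dth
    h = (np.pi / 2) / steps
    th = (np.arange(steps) + 0.5) * h          # midpoint rule avoids theta = 0
    f = (4 / np.pi) * np.cos(th)**2 * np.log(1 + y / (2 * np.sin(th)**2))
    return float(np.sum(f) * h)

def I_star_closed(y):
    # (C.15)
    r = np.sqrt(y**2 + 2 * y)
    return float(-y + r + np.log(y + 1 + r))

for y in (0.5, 1.0, 3.0):
    assert abs(I_star_integral(y) - I_star_closed(y)) < 1e-3
print("(C.15) agrees with the quadrature of (C.12)")
```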

From (C.15) we obtain the limiting behaviors

$$I_*(y) = 2\sqrt{2y} - y + O(y^{3/2}), \quad 0 \le y \ll 1, \tag{C.16}$$

and

$$I_*(y) = \log(2ey) + O\!\left( \frac{1}{y} \right), \quad y \gg 1. \tag{C.17}$$



Due to (C.13), $I_*(y)$ increases monotonically from $I_*(0) = 0$ to its asymptotic form (C.17). Note that for $y = t/m$ (as required in (B.18)), the second term in (C.16) is $O(1/m)$, and therefore beyond the accuracy of the approximation of this section.

References

[1] E. Abrahams, P.W. Anderson, D.C. Licciardello, T.V. Ramakrishnan, Scaling theory of localization:

absence of quantum diffusion in two dimensions, Phys. Rev. Lett. 42 (1979) 673;

E. Abrahams, P.W. Anderson, D.S. Fisher, D.J. Thouless, New method for a scaling theory of

localization, Phys. Rev. B 22 (1980) 3519;

P.W. Anderson, New method for scaling theory of localization II Multichannel theory of a ‘‘wire’’

and possible extension to higher dimensionality, Phys. Rev. B 23 (1981) 4828.

[2] B.L. Altshuler, V.E. Kravtsov, I.V. Lerner, in: B.L. Altshuler, P.A. Lee, R.A. Webb (Eds.),

Mesoscopic Phenomena in Solids, North-Holland, Amsterdam, 1991.

[3] A. Anderson, R.C. Myers, V. Periwal, Complex random surfaces, Phys. Lett. B 254 (1991) 89;

A. Anderson, R.C. Myers, V. Periwal, Branched polymers from a double scaling limit of matrix

models, Nucl. Phys. B 360 (1991) 463 (Section 3);

J. Feinberg, A. Zee, Renormalizing rectangles and other topics in random matrix theory, J. Statist.

Mech. 87 (1997) 473–504;

G.M. Cicuta, L. Molinari, E. Montaldi, F. Riva, Large rectangular random matrices, J. Math. Phys.

28 (1987) 1716.

[4] K.M. Anstreicher, J. Ji, F.A. Potra, Y. Ye, Probabilistic analysis of an infeasible interior-point

algorithm for linear programming, Math. Oper. Res. 24 (1999) 176–192.

[5] A. Ben-Hur, H.T. Siegelmann, S. Fishman, A theory of complexity for continuous time dynamics, J.

Complexity 18 (2002) 51–86.

[6] C.M. Bender, S.A. Orszag, Advanced Mathematical Methods for Scientists and Engineers, 2nd

Edition, Springer-Verlag, New York, 1999 (Chapter 8).

[7] L. Blum, F. Cucker, M. Shub, S. Smale, Complexity and Real Computation, Springer-Verlag,

London, 1999.

[8] M.S. Branicky, Analog computation with continuous ODEs, in: Proceedings of the IEEE Workshop

on Physics and Computation, Dallas, TX, 1994, pp. 265–274.

[9] E. Brezin, C. Itzykson, G. Parisi, J.-B. Zuber, Planar diagrams, Comm. Math. Phys. 59 (1978) 35.

[10] R.W. Brockett, Dynamical systems that sort lists, diagonalize matrices and solve linear programming

problems, Linear Algebra Appl. 146 (1991) 79–91.

[11] L.O. Chua, G.N. Lin, Nonlinear programming without computation, IEEE Trans. Circuits Systems

31 (2) (1984) 182–188.

[12] A. Cichocki, R. Unbehauen, Neural Networks for Optimization and Signal Processing, John Wiley,

New York, 1993.

[13] R. Durrett, Probability: Theory and Examples, 2nd Edition, Wadsworth Publishing Co., Belmont,

1996 (Chapter 2).

[14] L. Faybusovich, Dynamical systems which solve optimization problems with linear constraints, IMA

J. Math. Control Inform. 8 (1991) 135–149.

[15] J. Feinberg, On the universality of the probability distribution of the product B1X of random

matrices, arXiv:math.PR/0204312, 2002.

[16] V.L. Girko, On the distribution of solutions of systems of linear equations with random coefficients,

Theory Probab. Math. Statist. 2 (1974) 41–44.

[17] B.V. Gnedenko, A.N. Kolmogorov, Limit Distributions for Sums of Independent Random Variables,

Addison-Wesley, Reading, MA, 1954.

[18] U. Helmke, J.B. Moore, Optimization and Dynamical Systems, Springer-Verlag, London, 1994.

[19] J. Hertz, A. Krogh, R. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley,

Redwood City, 1991.



[20] X.B. Liang, J. Wang, A recurrent neural network for nonlinear optimization with a continuously

differentiable objective function and bound constraints, IEEE Trans. Neural Networks 11 (2000)

1251–1262.

[21] C. Mead, Analog VLSI and Neural Systems, Addison-Wesley, Reading, MA, 1989.

[22] M.L. Mehta, Random Matrices, 2nd Edition, Academic Press, Boston, 1991.

[23] S. Mizuno, M.J. Todd, Y. Ye, On adaptive-step primal-dual interior-point algorithms for linear

programming, Math. Oper. Res. 18 (1993) 964–981.

[24] C. Papadimitriou, Computational Complexity, Addison-Wesley, Reading, MA, 1995.

[25] J. Renegar, Incorporating condition measures into the complexity theory of linear programming,

SIAM J. Optim. 5 (3) (1995) 506–524.

[26] R. Saigal, Linear Programming, Kluwer Academic, Dordrecht, 1995.

[27] R. Shamir, The efficiency of the simplex method: a survey, Manage. Sci. 33 (3) (1987) 301–334.

[28] H.T. Siegelmann, S. Fishman, Computation by dynamical systems, Physica D 120 (1998) 214–235.

[29] S. Smale, On the average number of steps in the simplex method of linear programming, Math.

Programming 27 (1983) 241–262.

[30] M.J. Todd, Probabilistic models for linear programming, Math. Oper. Res. 16 (1991) 671–693.

[31] J.F. Traub, H. Wozniakowski, Complexity of linear programming, Oper. Res. Lett. 1 (1982) 59–62.

[32] K.G. Wilson, J. Kogut, The renormalization group and the epsilon expansion, Phys. Rep. 12

(1974) 75;

J. Cardy, Scaling and Renormalization in Statistical Physics, Cambridge University Press,

Cambridge, 1996.

[33] Y. Ye, Interior Point Algorithms: Theory and Analysis, John Wiley and Sons Inc., New York, 1997.

[34] Y. Ye, Toward probabilistic analysis of interior-point algorithms for linear programming, Math.

Oper. Res. 19 (1994) 38–52.
